Validity of Financial Statements: Benford's Law

Author:

Ivaan Shrestha

Last Updated:

9 years ago

License:

Creative Commons CC BY 4.0

%% This is file `elsarticle-template-1-num.tex',
%%
%% Copyright 2009 Elsevier Ltd
%%
%% This file is part of the 'Elsarticle Bundle'.
%% ---------------------------------------------
%%
%% It may be distributed under the conditions of the LaTeX Project Public
%% License, either version 1.2 of this license or (at your option) any
%% later version.  The latest version of this license is in
%%    http://www.latex-project.org/lppl.txt
%% and version 1.2 or later is part of all distributions of LaTeX
%% version 1999/12/01 or later.
%%
%% Template article for Elsevier's document class `elsarticle'
%% with numbered style bibliographic references
%%
%% $Id: elsarticle-template-1-num.tex 149 2009-10-08 05:01:15Z rishi $
%% $URL: http://lenova.river-valley.com/svn/elsbst/trunk/elsarticle-template-1-num.tex $
%%
\documentclass[preprint,12pt]{elsarticle}

%% Use the option review to obtain double line spacing
%% \documentclass[preprint,review,12pt]{elsarticle}

%% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
%% for a journal layout:
%% \documentclass[final,1p,times]{elsarticle}
%% \documentclass[final,1p,times,twocolumn]{elsarticle}
%% \documentclass[final,3p,times]{elsarticle}
%% \documentclass[final,3p,times,twocolumn]{elsarticle}
%% \documentclass[final,5p,times]{elsarticle}
%% \documentclass[final,5p,times,twocolumn]{elsarticle}

%% The graphicx package provides the includegraphics command.
\usepackage{graphicx}
%% The amssymb package provides various useful mathematical symbols
\usepackage{amssymb}
%% The amsthm package provides extended theorem environments
%% \usepackage{amsthm}

%% The lineno packages adds line numbers. Start line numbering with
%% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
%% for the whole article with \linenumbers after \end{frontmatter}.


%% natbib.sty is loaded by default. However, natbib options can be
%% provided with \biboptions{...} command. Following options are
%% valid:

%%   round  -  round parentheses are used (default)
%%   square -  square brackets are used   [option]
%%   curly  -  curly braces are used      {option}
%%   angle  -  angle brackets are used    <option>
%%   semicolon  -  multiple citations separated by semi-colon
%%   colon  - same as semicolon, an earlier confusion
%%   comma  -  separated by comma
%%   numbers-  selects numerical citations
%%   super  -  numerical citations as superscripts
%%   sort   -  sorts multiple citations according to order in ref. list
%%   sort&compress   -  like sort, but also compresses numerical citations
%%   compress - compresses without sorting
%%
%% \biboptions{comma,round}

% \biboptions{}

\journal{Journal Name}

\begin{document}

\begin{frontmatter}

%% Title, authors and addresses

\title{Validity of Financial Statements: Benford's Law}

%% use the tnoteref command within \title for footnotes;
%% use the tnotetext command for the associated footnote;
%% use the fnref command within \author or \address for footnotes;
%% use the fntext command for the associated footnote;
%% use the corref command within \author for corresponding author footnotes;
%% use the cortext command for the associated footnote;
%% use the ead command for the email address,
%% and the form \ead[url] for the home page:
%%
%% \title{Title\tnoteref{label1}}
%% \tnotetext[label1]{}
%% \author{Name\corref{cor1}\fnref{label2}}
%% \ead{email address}
%% \ead[url]{home page}
%% \fntext[label2]{}
%% \cortext[cor1]{}
%% \address{Address\fnref{label3}}
%% \fntext[label3]{}


%% use optional labels to link authors explicitly to addresses:
%% \author[label1,label2]{<author name>}
%% \address[label1]{<address>}
%% \address[label2]{<address>}

\author{Ivaan Shrestha}

\address{Indiana, United States}

\begin{abstract}
%% Text of abstract
Benford's law states that the leading digits in a large set of data are not uniformly distributed but instead follow a decreasing logarithmic distribution, with 1 occurring most often. Many data sets derived from measurements follow this trend, and the law is widely used as a basis for fraud detection and forensic accounting. In this project, the financial statements (cash flow statement, income statement and balance sheet) of 20 tech companies from the Fortune 500 are analyzed. The cash flow statement reports the net amount of cash and cash equivalents moving into and out of a business. The income statement measures a company's financial performance over a specific accounting period. The balance sheet summarizes a company's assets, liabilities and shareholders' equity at a specific point in time. All of the financial statement data were extracted from the Morningstar database and analyzed with a Python program that I wrote. I also wrote a Python program to calculate Benford's second-digit and third-digit probabilities using the formula. I would like to thank Prof. Erin Wagner and Dr. Courtney Taylor for their help with this research project.
\end{abstract}

\end{frontmatter}

%%
%% Start line numbering here if you want
%%

%% main text
\section{Introduction}
\label{S:1}

The first known reference to this logarithmic distribution dates back to 1881, when the American astronomer Simon Newcomb noticed that the first pages of logarithm tables wear out much faster than the last ones \cite{benford1938law}. Some 50 years later, the physicist Frank Benford rediscovered the law and supported it with more than 20,000 observations drawn from sources such as heats of chemical compounds, baseball statistics, papers and newspapers. Some have argued that Benford manipulated round-off errors to obtain a better fit, but even without such manipulation the data were very close to the prediction. The result was published in 1938 and is now known as Benford's law. Numerical data that do not follow this trend include telephone numbers (which start with particular digits), lottery numbers (which are uniformly distributed), heights of human adults, and square root tables of integers \cite{singleton2011understanding}.

\begin{figure}[h]
\centering\includegraphics[width=0.8\linewidth]{Images/image.jpg}
\caption{Benford's original data; reprinted courtesy of the American Philosophical Society}
\end{figure}
A set of numbers is said to satisfy Benford's law if the leading digit $d$ occurs with the following probability \cite{durtschi2004effective}:
\begin{equation}
P(d) = \log_{10}(d+1) - \log_{10}(d) = \log_{10}\left(\frac{d+1}{d}\right) = \log_{10}\left(1+\frac{1}{d}\right), \quad d \in \{1, 2, \ldots, 9\}
\end{equation}

\begin{center}
For example, the leading digit of 65789 is 6, whereas the leading digit of 112489 is 1.
The figure below lists the probability for each leading digit:
\end{center}
\begin{figure}[h]
\centering\includegraphics[width=0.6\linewidth]{Images/firstBenford.jpg}
\caption{A table showing d as Digit and probability P(d) as First Place\cite{benford1938law}}
\end{figure}
Here you can see that the probability of 1 being the first digit is 0.301, the probability of 2 being the first digit is 0.176, and the probability keeps decreasing as the digit increases, until the probability of 9 being the first digit is 0.046. Here is the list of tech companies that were used in my research:
\begin{table}[h]
\centering
\begin{tabular}{l l l}
\hline
\textbf{Company} & \textbf{Company} & \textbf{Company} \\
\hline
Amazon & Apple & Western Digital \\
Cisco Systems & Computer Sciences Corporation & Qualcomm \\
Danaher & eBay & Texas Instruments \\
EMC & Thermo Fisher Scientific & Xerox\\
Google & HPQ & Microsoft\\
IBM & Intel & Oracle\\
Jabil Circuit & Micron Technology & \\
\hline
\end{tabular}
\caption{Companies that were analyzed in this research \cite{Fortune}}
\end{table}
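
The first-digit probabilities quoted above follow directly from $P(d) = \log_{10}(1 + 1/d)$. The short Python sketch below reproduces them. The function name \texttt{benford\_first\_digit\_probs} is chosen here for illustration and is not taken from the program used in this research.

```python
import math


def benford_first_digit_probs():
    """Return {d: P(d)} for d = 1..9, using P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


probs = benford_first_digit_probs()
for digit, p in probs.items():
    # Print each probability rounded to three decimals, as in the table.
    print(f"{digit}: {p:.3f}")
```

Because the terms telescope, the nine probabilities sum to $\log_{10}(10) = 1$.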

\section{Method}
\label{s:2}

The top 20 tech companies from the Fortune 500 are used in this project; the list can be found in the table above. The financial statements of these companies were retrieved from the Morningstar database provided by Investment Research Center. Three kinds of financial statement are included in this project. The cash flow statement shows the net amount of cash and cash equivalents moving into and out of a business. The income statement measures a company's financial performance over a specific accounting period. The balance sheet summarizes a company's assets, liabilities and shareholders' equity at a specific point in time. Each of these statements was exported from the web in Excel format. For each statement, the data analyzed cover 2005 to 2014 and are stored in the columns of the Excel spreadsheet.
Each file holds all the statements for a particular company, and each sheet within a file holds one kind of statement (income statement, balance sheet or cash flow statement) with the data from 2005 to 2014. The Python program reads each cell of each column of a sheet, converts it to a string, and counts the occurrences of each first, second and third digit. These counts are then stored in an Excel sheet, where the first-, second- and third-digit counts of all companies are added together and a series of plots and histograms is made.
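
The digit-counting step described above can be sketched as follows. This is a minimal illustration, not the program used in this research. The helper names \texttt{significant\_digits} and \texttt{count\_digit\_positions} are hypothetical, the cells are assumed to have already been read from the spreadsheet into a Python list, and non-numeric cells are simply skipped.

```python
from collections import Counter


def significant_digits(cell):
    """Return the significant digits of a cell as a string ('' if not numeric)."""
    text = str(cell).strip().replace(",", "")
    try:
        float(text)  # reject headers, blanks and other non-numeric cells
    except ValueError:
        return ""
    # Keep only the mantissa so that an exponent such as 'e+20' is ignored.
    mantissa = text.lower().split("e")[0]
    digits = "".join(ch for ch in mantissa if ch.isdigit())
    # Dropping the sign, the decimal point and leading zeros leaves the
    # significant digits, e.g. '-4.2' -> '42' and '0.052' -> '52'.
    return digits.lstrip("0")


def count_digit_positions(cells, positions=3):
    """Count how often each digit appears in the 1st, 2nd, ... position."""
    counters = [Counter() for _ in range(positions)]
    for cell in cells:
        digits = significant_digits(cell)
        for i, counter in enumerate(counters):
            if i < len(digits):  # short numbers have no 2nd or 3rd digit
                counter[int(digits[i])] += 1
    return counters


# Illustrative values only, not data from the financial statements.
cells = [65789, 112489, -4.2, "n/a", 7, 0.052]
first, second, third = count_digit_positions(cells)
print(first, second, third)
```

In this sketch a number with fewer than two or three significant digits contributes nothing to the second- or third-digit counts, so the totals can differ between digit positions.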



\section{Results}
\label{s:3}
The data of all the financial statements were analyzed by the method described in the method section.
\subsection{First Digit}
First, all the data from all the financial statements of each company were analyzed for the first leading digit. There were 15,903 numbers in total; the first digit of each number was read and the counter for that digit was incremented. If a number was negative, the Python program skipped the minus sign and used the next character as the first digit. The results for the first digit of each number in the statements are shown below.
\hfill \pagebreak
\begin{figure}[h!]
\centering\includegraphics[width=0.8\linewidth]{Images/firstDigit5.jpg}
\caption{Table showing the observations for the leading first digit \cite{Morning}}
\end{figure}
\hfill \linebreak
\begin{figure}[h!]
\centering\includegraphics[width=0.8\linewidth]{Images/firstDigit3.jpg}
\caption{Histogram comparing Benford's probability against actual observation}
\end{figure}
\hfill \pagebreak

\begin{figure}[h!]
\centering\includegraphics[width=0.8\linewidth]{Images/firstDigit4.jpg}
\caption{Graph comparing Benford's probability against actual Observation}
\end{figure}
The graph above shows the differences between the observed frequencies and Benford's law.
\begin{figure}[h!]
\centering\includegraphics[width=0.8\linewidth]{Images/firstDigit1.jpg}
\caption{Histogram comparing uniform distribution against actual observation}
\end{figure}
\hfill \pagebreak
\subsection{Second Digit}
A total of 15,537 numbers were analyzed for the second leading digit across all financial statements of all companies.
\begin{figure}[h!]
\centering\includegraphics[width=0.7\linewidth]{Images/secondDigit4.jpg}
\caption{Table showing the second leading digits observation}
\end{figure}
\begin{figure}[h!]
\centering\includegraphics[width=0.7\linewidth]{Images/secondDigit2.jpg}
\caption{Histogram comparing uniform distribution against actual observation}
\end{figure}
\begin{figure}[h!]
\centering\includegraphics[width=0.7\linewidth]{Images/secondDigit3.jpg}
\caption{Histogram comparing Benford's law against actual observation}
\end{figure}
\begin{figure}[h!]
\centering\includegraphics[width=0.7\linewidth]{Images/secondDigit1.jpg}
\caption{Graph comparing uniform distribution against actual Observation against Benford's law for leading second digit }
\end{figure}

Note that in this section the digit 0 is included both in Benford's law and in my comparison, since 0 can occur as a second digit. The observations closely follow Benford's law.
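
For reference, the first-digit formula is usually generalized to the second digit by summing over all possible first digits $k$. The expression below is the standard statement of the law rather than a formula taken from the program used in this research, and the symbol $d_2$ is introduced here only for illustration.
\begin{equation}
P(d_2 = d) = \sum_{k=1}^{9} \log_{10}\left(1 + \frac{1}{10k + d}\right), \quad d \in \{0, 1, \ldots, 9\}
\end{equation}
For example, $P(d_2 = 0) \approx 0.120$ and $P(d_2 = 9) \approx 0.085$, so the second-digit distribution is much flatter than the first-digit one.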
\hfill \pagebreak

\subsection{Third Digit}
A total of 14,193 numbers were analyzed for the third leading digit across all financial statements of all companies.
\begin{figure}[h!]
\centering\includegraphics[width=0.6\linewidth]{Images/thirdDigit3.jpg}
\caption{Table showing the Benford's probability and Observed frequency for third digit. }
\end{figure}

\begin{figure}[h!]
\centering\includegraphics[width=0.7\linewidth]{Images/thirdDigit1.jpg}
\caption{Graph comparing actual Observation against Benford's law}
\end{figure}
\begin{figure}[h!]
\centering\includegraphics[width=0.7\linewidth]{Images/thirdDigit.jpg}
\caption{Histogram comparing uniform distribution against actual observation for third leading digit}
\end{figure}

\begin{figure}[h!]
\centering\includegraphics[width=0.7\linewidth]{Images/thirdDigit2.jpg}
\caption{Graph comparing uniform distribution against actual Observation against Benford's law for third leading digit}
\end{figure}
The figures above help to closely analyze the differences between the observed frequencies of the third leading digit, the Benford's law probabilities and the uniform distribution.
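
The Benford probabilities for the second and third digits can be computed by summing the first-digit style terms over all possible preceding digits. The sketch below uses the standard generalization of the law. The function name \texttt{benford\_digit\_probs} and its \texttt{position} parameter are assumptions made for this illustration and do not come from the program used in this research.

```python
import math


def benford_digit_probs(position):
    """Return {d: P(digit at `position` == d)} under Benford's law.

    position=1 is the leading digit (d = 1..9); later positions allow d = 0..9.
    """
    if position < 1:
        raise ValueError("position must be 1 or greater")
    if position == 1:
        return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    # k runs over every possible block of preceding digits:
    # 1..9 for the second digit, 10..99 for the third digit, and so on.
    start = 10 ** (position - 2)
    stop = 10 ** (position - 1)
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(start, stop))
        for d in range(10)
    }


second = benford_digit_probs(2)
third = benford_digit_probs(3)
for d in range(10):
    print(f"{d}: second={second[d]:.4f} third={third[d]:.4f}")
```

The third-digit probabilities all lie close to 0.1, which is why the Benford and uniform curves are hard to tell apart for the third digit.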


\hfill \pagebreak
\subsection{Small Data}
I added this section to my research to examine whether Benford's law still holds for the first leading digit as the data set gets smaller and smaller. Here are the results.
\hfill \linebreak
\begin{figure}[h!]
\centering\includegraphics[width=0.7\linewidth]{Images/small.jpg}
\caption{Graph comparing uniform distribution against actual Observation against Benford's law for first leading digits with 2,348 numbers from Amazon, Apple and Cisco}
\end{figure}
\hfill \linebreak
\begin{figure}[h!]
\centering\includegraphics[width=0.7\linewidth]{Images/small1.jpg}
\caption{Graph comparing uniform distribution against actual Observation against Benford's law for first leading digits with 791 numbers from Amazon.}
\end{figure}
\hfill \linebreak
\begin{figure}[h!]
\centering\includegraphics[width=0.7\linewidth]{Images/small3.jpg}
\caption{Graph comparing uniform distribution against actual Observation against Benford's law for first leading digits with 178 numbers from Amazon's income statement }
\end{figure}
\hfill \linebreak
These graphs show that as the amount of data in the data set decreases, the observations stop following Benford's law, yet they do not approach the uniform distribution either. In fact, the values appear to be random.

\hfill \linebreak
\section{Analysis}
\label{s:4}
A chi-square test was performed between the observed values and the Benford's law probabilities for all three leading digits. The chi-square test was also performed for the small data sets. The maximum and minimum absolute differences between each data set and the Benford's law probabilities were also analyzed for all the observations. The null hypothesis and alternative hypothesis for each data set are:
\begin{itemize}
\item $H_0$: The observed data set and Benford's probability are related.
\item $H_a$: The observed data set and Benford's probability are not related.
\end{itemize}
The significance level is 0.5 for all of the chi-square tests.
\subsection{Chi-Square Test}
For the first digit, the chi-square test gave a p value of 0.999987241, which means that the observed data closely followed Benford's law. The p value is greater than 0.5, so I did not reject the null hypothesis: the observed data set and Benford's probability are related.
\hfill \linebreak
For the second digit, the chi-square test gave a p value of 0.999999962, which means that the observed data closely followed Benford's law. The p value is greater than 0.5, so I did not reject the null hypothesis: the observed data set and Benford's probability are related. In fact, the observed data for the leading second digit follows Benford's law better than the observed data for the leading first digit.
\hfill \linebreak
For the third digit, the chi-square test gave a p value of 0.999999924, which means that the observed data closely followed Benford's law. The p value is greater than 0.5, so I did not reject the null hypothesis: the observed data set and Benford's probability are related. The observed data for the leading third digit also follows Benford's law better than the observed data for the leading first digit.
\hfill \linebreak
For the small data sets of 2348, 791 and 178 numbers, the chi-square test on the first digit gave p values of 0.999728129, 0.99628304 and 0.001370532 respectively. The observed data for the first two data sets therefore closely followed Benford's law: the p value is greater than 0.5, so I did not reject the null hypothesis for either data set, and the observed data sets and Benford's probability are related. For the last data set, the p value was less than 0.5, so I rejected the null hypothesis: the observed data set and Benford's probability are not related.
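
A chi-square goodness-of-fit test against Benford's first-digit probabilities can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the calculation used in this research: the function names are hypothetical, the observed counts are made-up values, and the statistic is computed on raw counts (using percentages or proportions instead would give a different statistic and p value). The helper \texttt{chi2\_sf} evaluates the upper tail of the chi-square distribution through the standard recurrence for the regularized incomplete gamma function.

```python
import math


def benford_first_digit_probs():
    """Expected probabilities for leading digits 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi_square_statistic(observed_counts, expected_probs):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all digits."""
    total = sum(observed_counts)
    return sum(
        (obs - total * p) ** 2 / (total * p)
        for obs, p in zip(observed_counts, expected_probs)
    )


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # closed form for df = 1
    # Recurrence Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


# Hypothetical counts for digits 1..9 (illustrative only, not the paper's data).
observed = [301, 176, 125, 97, 79, 67, 58, 51, 46]
expected = benford_first_digit_probs()
stat = chi_square_statistic(observed, expected)
p_value = chi2_sf(stat, df=len(observed) - 1)  # 9 categories -> 8 degrees of freedom
print(f"chi-square = {stat:.4f}, p = {p_value:.4f}")
```

With nine first-digit categories the test has 8 degrees of freedom; for the second and third digits there are ten categories and therefore 9 degrees of freedom.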

\subsection{Absolute Error}
For the leading first digit, the maximum absolute error is 1.409 and the minimum is 0.059. The maximum error occurred at digit 1 and the minimum at digit 9. For the leading second digit, the maximum absolute error is 0.671 and the minimum is 0.006. The maximum error occurred at digit 1 and the minimum at digit 8. For the leading third digit, the maximum absolute error is 0.928 and the minimum is 0.074. The maximum error occurred at digit 3 and the minimum at digit 8.
\hfill \linebreak
For the leading first digit of the small data set with 2348 numbers, the maximum absolute error is 1.863 and the minimum is 0.077. The maximum error occurred at digit 1 and the minimum at digit 7. For the data set with 791 numbers, the maximum absolute error is 3.424 and the minimum is 0.0003. The maximum error occurred at digit 1 and the minimum at digit 6. For the data set with 178 numbers, the maximum absolute error is 10.525 and the minimum is 0.605. The maximum error occurred at digit 4 and the minimum at digit 8.
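
The maximum and minimum absolute errors can be found by comparing observed and expected frequencies digit by digit. The sketch below assumes that the errors are measured in percentage points, which is an assumption about the units and not something stated in this research; the function name and the counts are also hypothetical.

```python
import math


def absolute_errors_pct(observed_counts, expected_probs):
    """Absolute difference, in percentage points, for each digit."""
    total = sum(observed_counts)
    return [
        abs(100 * obs / total - 100 * p)
        for obs, p in zip(observed_counts, expected_probs)
    ]


# Benford's expected probabilities for leading digits 1..9.
expected = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Hypothetical counts for digits 1..9 (illustrative only, not the paper's data).
observed = [320, 170, 120, 100, 80, 65, 55, 50, 40]

errors = absolute_errors_pct(observed, expected)
# The list index is 0-based, so add 1 to recover the digit.
max_digit = max(range(len(errors)), key=errors.__getitem__) + 1
min_digit = min(range(len(errors)), key=errors.__getitem__) + 1
print(f"max error {max(errors):.3f} at digit {max_digit}")
print(f"min error {min(errors):.3f} at digit {min_digit}")
```

For second- and third-digit data the same function applies, with the digit recovered from the index without adding 1 because those distributions start at 0.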

\hfill \linebreak
\section{Conclusion}
\label{s:5}
A couple of findings were unexpected. The p value of the chi-square test was closer to 1 for the second and third digits than for the first digit. The peculiar thing I noticed was that the graph in Figure 14 shows a slight but noticeable drop in the frequency of the digit 3, yet this data set had a better p value than the first-digit data. Comparing the graph in Figure 5 with the one in Figure 14, it seems that the leading first digit should have the best p value; however, this is not the case. All the data sets roughly followed Benford's law except the smallest data set, which contained only the income statement of Amazon. The maximum error increased as the data set got smaller. There was no particular pattern in the maximum and minimum error values across the first, second and third leading digits.
\hfill \linebreak
The financial statements did follow Benford's law; therefore I conclude that the financial statements of these companies are authentic. If they were not, there would be a significant change in the error values. By analyzing the error values, we can find out which digit (0, 1, 2, \ldots, 9) has the highest error and then look in each company's file to find out which company has the highest or lowest frequency of that particular digit. In this way fraud can be detected.
\hfill\linebreak
In this research I observed that the digit 1 most often had the maximum error, whereas the minimum error occurred at a different digit in each observation. Lastly, I can also conclude that the more data there are, the more accurate the observation will be: when I decreased the amount of data, the p value decreased and the maximum error increased.



%% The Appendices part is started with the command \appendix;
%% appendix sections are then done as normal sections
%% \appendix

%% \section{}
%% \label{}

%% References
%%
%% Following citation commands can be used in the body text:
%% Usage of \cite is as follows:
%%   \cite{key}          ==>>  [#]
%%   \cite[chap. 2]{key} ==>>  [#, chap. 2]
%%   \citet{key}         ==>>  Author [#]

%% References with bibTeX database:

\section{Bibliography}
\label{s:6}
\bibliographystyle{model1-num-names}
\bibliography{sample}



\end{document}