Topics
Further reading
The Poisson distribution does a good job of describing some observations. But before we can make such a comparison, we have to know what the parameter \(\lambda\)'s true value is. How can we compare our prediction to our data if we don't know \(\lambda\)?
Well, one answer is to just try many values of \(\lambda\) and look for the one that does the best job of fitting the data. Let's look at a classical example.
Here is a classic example of something that seems totally unrelated by can be explained by the Poisson distribution. Notice that in this case, the poisson distribution only fails to explain about 4 percent of the distribution's mass when the rate parameter is picked optimally. That's pretty good, given the "totally random" nature of the process.
In 1898, Russian economist Ladislaus Bortkiewicz published his first statistics book entitled Das Gesetz der keinem Zahlen. In it he investigated the annual deaths by horse kicks in the Prussian Army from 18751984, across 14 different army corps including the "Guard Corps". Have a look at the data:
[ Data : hide , shown as table , shown as CSV shown as QR ]
# number of deaths, number of occurrences
# Annual deaths from horse kicks in the Prussian army (18751894)
# https://books.google.com/books?id=dh24EaSrmBkC&lpg=PA92&dq=prussian%20soldier%20horse%20deaths%20poisson&pg=PA92#v=onepage&q&f=false
# Originally from von Bortkiewicz's 1989 book, The Law of Small Numbers.
# Bortkiewicz rediscovered the Poisson distribution on his own
0,144
1,91
2,32
3,11
4,2
5,0
6,0
7,0
What we're interested in, is the distribution on the number of deaths per corps per year. For that we have 280 observations (14 corps over 20 years). We can "bin" the data (how many 0 deaths? how many 1 death? etc.) by hand or quickly using python:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 

Now we want to fit this to a Poisson distribution, \(\Pr(N=k)=\dfrac{\lambda^ke^{\lambda}}{k!}\). So what's \(\lambda\)? The brute force approach is to try a bunch of \(\lambda\) values and then check the error relative to the frequency of measurements in the data, say with the 1norm \(E_1=\sum_jy^{data}_jy^{pred}_j\) where \(j=0,1,2,3,4\).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 

Statisticians will usually say that a better way to determine \(\lambda\) is by maximumlikelihood: for what value of \(\lambda\) are we mostlikely to observe this data?
\[\mathcal{L}(\vec{x}\lambda) = \prod_{n=0}^{\infty} \left( \frac{\lambda^n e^{\lambda}}{n!} \right)^{x_n}\] \[\ln \mathcal{L} (\vec{x}\lambda) = \sum _ {n=0}^{\infty} x _ n \left( n \ln \lambda  \ln n!  \lambda \right)\]
We could minimize this numerically, without to much effort, but in this problem, we can make more analytic progress. Minimizing by differentiating in \(\lambda\) and setting equal to zero, we find \[\lambda _ {MLE} = \frac{\sum _ n n x _ n}{\sum _ n x _ n} = 0.7\] Note that this is pretty close to the \(0.665\) obtained by minimizing the onenorm but not exactly the same.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 

Either way you do the fitting, the Poisson distribution does a remarkably good job explaining the Prussian army horse kick data. Yet death by horsekick is a rare thing, and it is hard to imagine a single common circumstance accounting for all of these deaths. If there is a particular circumstance related to these deaths, the value of the parameter \(\lambda\) has a direct mechanistic interpretation and the Poisson distribution can tell us something more about the circumstances of the deaths. But this is probably not the case.
The explanation of the data in terms of a Poisson distribution is probably a consequence of the stability of Poisson distribution under the superposition of many independent and evenmorerare circumstances. One of the special properties of the Poisson distribution is that it is additive  if you take two or more circumstances welldescribed by the Poisson distribution and add them together, the new distribution is also a Poisson distribution. The convergence of the binomial distribution to the Poisson distribution is sometimes called "The law of rare events" (Bortkiewicz, 1898), and holds more generally than just for the binomial distribution. The hypergeometric distribution of draws from an urn is also approximately a Poisson distribution when the number of stones is large and the number of draws is small (see exercises). If you shuffle a large shoe of cards well, the number of cards that end up at the same point they started is approximately Poisson distributed. If each event is independent and has its own probability, and those probabilities are all small, then the frequency of the events in a population will converge to the Poisson as the population gets infinitely large. And it turns out that the original circumstances don't even have to be described by the Poisson distribution for the superposition to be a Poisson distribution. Khinchin shows that as independent renewal processes with nice interevent distributions are superimposed, the distribution of events in a finite window converges to a Poisson process (Mathematical Methods in the Theory of Queuing, 1960 in translation).
An example, which seems like it should also be a Poisson distribution, was collected for the rates of accidents among machinists during World War I.
[ Data : hide , shown as table , shown as CSV shown as QR ]
# number of accidents, number of machinists with that many accidents
#
# The number # of accidents over 3 months among 414 machinists during World War
# One. The data on point comes from a 1920 study of Greenwood & Yule.
# http://www.jstor.org/stable/2341080
#
0,296
1,74
2,26
3,8
4,4
5,4
6,1
7,0
8,1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 

[ Data : hide , shown as table , shown as CSV shown as QR ]
#feet,inches,total inches,count
#
# @article{bib:Blakeslee1914,
# author = {Blakeslee, Albert F.},
# title = {{CORN AND MEN}: The Interacting Influence of Heredity and EnvironmentMovements
# for Betterment of Men, or Corn, or Any Other Living Thing
# Onesided Unless They Take Both Factors into Account},
# journal = {Journal of Heredity},
# volume = {5},
# number = {11},
# pages = {511},
# year = {1914},
# doi = {10.1093/oxfordjournals.jhered.a107785},
# URL = {http://dx.doi.org/10.1093/oxfordjournals.jhered.a107785},
# }
#
# This is the data from a famous photograph
# visually showing the distribution of male
# height in at the Connecticut State
# Agricultural College
#
# http://www.sbs.utexas.edu/levin/bio213/genetics/height.dist.gif
#
4,09,57,0
4,10,58,1
4,11,59,0
5,00,60,0
5,01,61,1
5,02,62,5
5,03,63,7
5,04,64,7
5,05,65,22
5,06,66,25
5,07,67,26
5,08,68,27
5,09,69,17
5,10,70,11
5,11,71,17
6,00,72,4
6,01,73,4
6,02,74,1
6,03,75,0
Students at the Connecticut Agricultural research station in 1913 were binned according to their inches of height and asked to stand in order on their sports field. The resulting picture has a strong resemblance to a Normal distribution. However, fitting a Normal distribution to data is a little more complicated than fitting a Poisson distribution  the Normal has two parameters that must be determined  the mean \(\mu\) and the standard deviation \(\sigma\). Again, we can measure the goodness of fit in several different ways, with different merits. If we use likelihood, things are a little different for continuous distributions since the probability of each individual number is infinitessimally small. But if we use the loglikelihood, this infinitesimal part just becomes an additive constant and has no effect on the parameter inference.
\[\ln \mathcal{L}(x  \mu, \sigma) \propto C  n \ln \sigma \frac{1}{2\sigma^2} \sum _ {i=1}^n (x _ i  \mu)^2\]
Using a standard minimization algorithm, we can find the the parameters for which the Normal curve fits the data best. But we don't have to search for the best fit numerically. There are quick formulas to find the parameter values where the data are most likely. If we differentiate the likelihood with respect to the mean and standard deviation of the normal distribution and the solve for the values where the derivative is zero (maximizing the likelihood), we find the maximum likelihood estimates (MLE) \[\mu _ {MLE} = \frac{\sum _ i x _ i}{n}\] and \[\sigma _ {MLE} = \sqrt{ \frac{1}{n} \sum _ i (x _ i  \mu _ {MLE})^2}.\] These are standard results from ANOVA analysis in statistics, and built into most statistical software tools like R and SAS.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 

We can see from the images that the Normal does a good job of representing the data. The fit CDF goes right through the middle of the empirical CDF, and the probability plot is right over the diagonal line.
Another example is a set of measurements of paperthickness performed by students.
[ Data : hide , shown as table , shown as CSV shown as QR ]
# student number, measurement 1, measurement 2, measurement 3, measurement 4
#
# measurements of paper thickness, in millimeters
# Each student made 4 measurements
#
# Data from Youden, Experimentation and Measurement, 1962
# reprinted September, 1991
#
01, .0757, .0762, .0769, .0746
02, .0808, .0793, .0781, .0821
03, .0811, .0772, .0770, .0756
04, .0655, .0683, .0714, .0746
05, .0741, .0710, .0748, .0711
06, .0756, .0772, .0776, .0759
07, .0775, .0785, .0760, .0761
08, .0747, .0765, .0735, .0776
09, .0719, .0762, .0802, .0713
10, .0734, .0833, .0833, .0783
11, .0755, .0740, .0714, .0743
12, .0788, .0817, .0794, .0766
13, .0731, .0716, .0726, .0714
14, .0833, .0794, .0783, .0788
15, .0767, .0775, .0765, .0793
16, .0787, .0798, .0864, .0817
17, .0784, .0799, .0789, .0802
18, .0784, .0820, .0796, .0818
19, .0830, .0796, .0778, .0767
20, .0741, .0680, .0733, .0723
21, .0759, .0766, .0772, .0466
22, .0810, .0812, .0789, .0776
23, .0777, .0759, .0795, .0790
24, .0784, .0786, .0797, .0859
This data is not binned by default. We could bin it or use a kernel estimator, but it can be analyzed using the Normal's cumulative distribution function in its current form. The normal distribution does a moderately good job fitting the data, but there is some systematic error, as revealed by the probability plot.
In a 1page paper in 1946, Clarke described the number of bomb strikes in central London during the World War II. Fit these data with a Poisson distribution, estimating the intensity \(\lambda\).
count  # sites with this many bombs 

0 

1 

2 

3 

4 

5+ 

Is the number of hurricanes reaching Florida each year between 1850 and 2010 Poissondistributed?
In 1975 observations, a subset of Penn State's female students were binned according to their height and asked to stand in order in front of Old Main (see below).
Fit these data with a Normal distribution.
[ Data : hide , shown as table , shown as CSV shown as QR ]
# Data from Figure 1 of Joiner's Living Histograms (1975)
# histogram of women sorted by height
# height (in), number
59,2
60,5
61,7
62,10
63,16
64,23
65,20
66,16
67,9
68,6
69,6
70,3
71,1
72,1
A common problem in rock and mineral mining is estimating the distribution of stone sizes during crushing. Empirical data is collected by dropping crushed stone through mesh sieves of successively smaller hole size, and measuring the fraction that does not fall through each. These numbers can be interpreted as the complementary cumulative distribution function (\(1 F _ {CDF}(x)\)). In 1933, Rosin and Rammler proposed a model for the distribution of particle sizes of ground coal based on a twoparameter Weibull distribution (a member of the FisherTippettGumbel family), which has cumulative density function \[F _ {CDF}(x) = 1  e^{(x/\lambda)^n}.\] A very convenient feature of the Weibull distribution is that its CDF becomes a line under the transformation \(\ln\left(  \ln(1F _ {CDF}(x))\right)\).
Use the transformation above and linear least square to estimate \(\lambda\) and \(n\) for the this data on milled cork particle sizes.
Plot the bestfit complementary CDF and the sieve data on top of each other. How good a job did the Weibull do?
[ Data : hide , shown as table , shown as CSV shown as QR ]
#Min size (mm), max size (mm), Mesh size (mm), Fraction (g), Fraction (%), Cumulative % weight (under), Cumulative % weight (over)
# doi:10.1016/j.matchar.2004.04.007
# table 1, p 161
# PSD analysis of milled cork
0,0.71,0.71,1.00,1.39,1.39,98.61
0.71,1,1.0,1.70,2.36,3.75,96.25
1,1.4,1.4,8.60,11.94,15.69,84.31
1.4,2,2.0,7.20,10.00,25.69,74.31
2,2.8,2.8,17.70,24.58,50.27,49.73
2.8,4,4.0,20.50,28.47,78.74,21.26
4,5.6,5.6,11.40,15.83,94.57,5.43
5.6,8,8.0,2.30,3.19,97.76,2.24
8,11.2,11.2,1.60,2.22,100,0
Below is a data set for maximum rainfalls in Brussels, Belgium over a 50 year period. Two possible models of this data are the Normal distribution (a.k.a. the Normal distribution) and Gumbel's distribution. Gumbel's distribution seems the better choice, apriori, because of its frequent use in risk analysis to describe extreme natural events, but we'd like to test this hypothesis ourselves.
Plot the time series.
Plot the empirical cumulative distribution function (CDF) for the maximum daily rainfall.
Find the Normal distribution under which the data are most likely.
Using scipy's fmin function, find Gumbel's distribution under which the data are most likely.
Draw the empirical CDF, togethor with the bestfit Normal CDF and the bestfit Gumbel CDF together on a single plot. Using this plot, discuss the quality of the distribution fits.
[ Data : hide , shown as table , shown as CSV shown as QR ]
# year, maximum rainfall in one day (mm)
#
# Annual maximum daily rainfall for Uccle (Brussels, Belgium) for
# the 50year period 19341983 measured in millimeters.
#
# @book{demaree1985intensity,
# title={Intensitydurationfrequency Relationship of Point Precipitation at Uccle: Reference Period 19341983},
# author={Demar{\'e}e, Gaston R},
# year={1985},
# publisher={Institut Royal M{\'e}t{\'e}orologique de Belgique}
# url={https://books.google.com/books/about/Intensity_duration_frequency_Relationshi.html?id=5dectgAACAAJ}
# }
1934,23.9
1935,37.6
1936,23.8
1937,29.2
1938,33.9
1939,37.7
1940,60.1
1941,30.9
1942,72.1
1943,51.4
1944,24.2
1945,46.9
1946,26.6
1947,29.3
1948,29.5
1949,19.8
1950,39.0
1951,43.0
1952,51.1
1953,39.2
1954,31.7
1955,23.0
1956,34.6
1957,37.7
1958,42.9
1959,30.8
1960,48.0
1961,32.2
1962,58.6
1963,78.3
1964,43.0
1965,48.4
1966,39.3
1967,23.9
1968,26.0
1969,52.9
1970,29.2
1971,45.7
1972,28.6
1973,30.0
1974,30.7
1975,25.8
1976,33.0
1977,26.3
1978,49.2
1979,33.6
1980,38.2
1981,56.8
1982,50.1
1983,24.2
The Boeing Airplane Company makes large use of aluminum in airplane construction, so it was important to them to know the failure distribution aluminum parts. As an experiment, they subjected 101 aluminum strips to periodic loading of up to 21,000 pounds per square inch 18 times per second. The number (in thousands of cycles) until failure are given below.
Find the Normal distribution under which the data are most likely.
Using scipy's fmin function, find Gumbel's distribution under which the data are most likely.
[ Data : hide , shown as table , shown as CSV shown as QR ]
# Thousands of cycles until rupture of rectangular strips of 6061T6 aluminum
# sheeting, subjected to periodic loading with maximum stress of 21,000 psi
#
# Birnbaum, Z. W. and Saunders, S. C., A Statistical Model for LifeLength of
# Materials, Journal of the American Statistical Association, 53(281) p 151160. 1958.
# http://www.itl.nist.gov/div898/handbook/eda/section4/eda4291.htm
#
370
706
716
746
785
797
844
855
858
886
886
930
960
988
990
1000
1010
1016
1018
1020
1055
1085
1102
1102
1108
1115
1120
1134
1140
1199
1200
1200
1203
1222
1235
1238
1252
1258
1262
1269
1270
1290
1293
1300
1310
1313
1315
1330
1355
1390
1416
1419
1420
1420
1450
1452
1475
1478
1481
1485
1502
1505
1513
1522
1522
1530
1540
1560
1567
1578
1594
1602
1604
1608
1630
1642
1674
1730
1750
1750
1763
1768
1781
1782
1792
1820
1868
1881
1890
1893
1895
1910
1923
1940
1945
2023
2100
2130
2215
2268
2440
There are more data sets like these that you can explore, out on the internet, like at the journal of statistical education and UFL and github.