Lecture 13: Distributions as models




Tom Apostol gives a nice, brief introduction to the history of probability in the second volume of his calculus textbook.

A gambler's dispute in 1654 led to the creation of a mathematical theory of probability by two famous French mathematicians, Blaise Pascal and Pierre de Fermat. Antoine Gombaud, Chevalier de Méré, a French nobleman with an interest in gaming and gambling questions, called Pascal's attention to an apparent contradiction concerning a popular dice game. The game consisted in throwing a pair of dice 24 times; the problem was to decide whether or not to bet even money on the occurrence of at least one "double six" during the 24 throws. A seemingly well-established gambling rule led de Méré to believe that betting on a double six in 24 throws would be profitable, but his own calculations indicated just the opposite.

One way we can address Mr. Gombaud's question today is to experiment. We run a trial, and then we know whether we'll win or lose. Suppose we try it -- we throw 2 dice 24 times and come up without any double-6's, so we say to ourselves, "I'm going to lose my money if I pay to play this game", and we don't. But somebody else has done that same experiment and rolled a double six, and so they think, "This is a game I can win, at least most of the time. I'd be happy to pay fifty cents to play it." And when they actually play, they might win or they might lose. That one experiment didn't tell them (or us) very much about what was going to happen -- this is a game of chance, and the outcome can be different each time it is played.

And so, we've reached a great philosophical bifurcation point -- how do we model and make sense of things in the world that do not follow simple constant laws -- things that may change for no easily observable reason? In simple societies, unusual events are often cause for invoking superstition, supernatural spirits, and ancestors. In ancient Greece, Aesop invoked the goddess Tyche (the Roman Fortuna) in his fables as the reason for outcomes that had no recognizable cause, suggesting that people could use religious offerings to improve their luck. This idea also resonates with the Islamic doctrine of occasionalism, promoted by al-Ghazali in the 11th century. Occasionalism says that rather than interpreting predictable cause and effect as laws of nature, all events should really be seen, like chance events, as direct acts of God. This suggests that the laws of nature are actually acts of God, and that the study of theology should replace the study of nature. In popular culture, as in Thomas Hardy's 1878 novel "The Return of the Native", fate and irony are invoked as agents of luck. (See Roll the Bones: The History of Gambling for more history.)

The birth of probability theory arose from an alternative perspective that was emerging in France and Italy in the 1500's and 1600's. While no individual trial of a gambling game can be understood deterministically, patterns can be found when large numbers of games are considered, whether played by many different people or played by the same people repeatedly. So, instead of considering just one trial, we should consider many trials. And instead of talking about whether a trial will be a win or a loss, we can talk about how often a trial will be a win or a loss. This new language has allowed us to greatly expand and explore our description of the world.

Once people began describing the world with probabilities, whole new areas of description and experimentation opened up. You might find some of the following interesting...

Today, we can use our computers to quickly perform randomizing experiments involving many trials. To do this, though, we have to make an assumption about our die and how it rolls. One idea (perhaps the simplest, but by no means the only one) is that each time you roll the die, each of its six faces is equally likely to come up, and that past rolls tell you nothing about future rolls.

from numpy.random import rand

"""
rand is a procedure provided in many languages that returns a random floating
point number between 0 and 1, or a full array of such floating point numbers.
It can be used to create many kinds random numbers.  In this case, by
multiplying by 6, adding 1 and then truncating to an integer, we can simulate
the roll of a cubic die with 6 sides.  The value returned by rand() may be
zero, but is never 1, so we don't have to worry about the rare possibility of
actually getting 7 instead of 6.
"""

def dieroll():
    x = int(rand()*6)+1
    return x

def trial():
    numrolls = 24
    outcomes = [ (dieroll(), dieroll()) for i in range(numrolls) ]
    iswin = ((6,6) in outcomes)
    return iswin

numtrials = 100000
A = [ trial() for i in range(numtrials) ]
numwins = A.count(True)
fracwins = float(numwins) / float(numtrials)
print "Of %d trials, you won a fraction %f of them"%(numtrials,fracwins)

Of 100,000 trials, you might win a fraction 0.491040 of them. That would suggest you shouldn't pay more than about 49 cents to play the game (when a win pays a dollar) if you want to come out ahead. But this number comes up slightly different every time we calculate it. If we run the simulation 5 more times, we get

Of 100000 trials, you won a fraction 0.493010 of them
Of 100000 trials, you won a fraction 0.492310 of them
Of 100000 trials, you won a fraction 0.489920 of them
Of 100000 trials, you won a fraction 0.490850 of them
Of 100000 trials, you won a fraction 0.491080 of them

The numbers are only a little different, but they are still different. There is a fundamental limitation on all Monte Carlo methods like the one above -- the error of the estimate shrinks only in proportion to the inverse square root of the number of trials. Thus, to get an answer that is twice as accurate, we have to quadruple the number of trials. The gamblers of the 1600's did not realize this precisely, but they knew it was hard to tell exactly how fair a game was by trial and error, and hoped that mathematical calculation would be a better way. Pascal, Fermat, and Laplace ultimately obliged them with the development of probability theory and the concept of independence.
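
To see this scaling concretely, here is a minimal sketch (assuming numpy's Generator API is available) that repeats the estimate at several sample sizes; the average error times the square root of the number of trials stays roughly constant.

import numpy as np

# Estimate the win probability at several sample sizes and watch the average
# error shrink roughly like 1/sqrt(numtrials).
rng = np.random.default_rng(0)
p_win = 1 - (35/36)**24                     # exact answer, for comparison

def estimate(numtrials):
    # each trial: 24 throws of a pair of dice; a win is at least one double six
    rolls = rng.integers(1, 7, size=(numtrials, 24, 2))
    wins = (rolls == 6).all(axis=2).any(axis=1)
    return wins.mean()

for numtrials in [100, 1000, 10000, 100000]:
    errors = [abs(estimate(numtrials) - p_win) for _ in range(20)]
    print(numtrials, np.mean(errors), np.mean(errors) * np.sqrt(numtrials))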

If we assume that all outcomes are equally likely (the equipartition postulate), then there are \(36\) possible outcomes every time we roll a pair of dice. Of these, \(35\) are losers. The chance of losing on each roll is then \(35/36\). The chance of losing all \(24\) rolls is \((35/36)^{24}\), so the chance of winning at least once is \[1 - \left(\frac{35}{36}\right)^{24} = 0.4914...\] If you pay less than 49 cents each time you play (for a game that pays a dollar when you win), then on average over a long time, you are very likely to make more money than you lose. This calculated value is centered among our simulation estimates, and thus passes the smell-test.
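
The closed-form number is easy to check directly; this small sketch (independent of the simulation code above) reproduces the 0.4914... figure.

# probability of at least one double six in 24 throws of a pair of dice
p_loss_per_throw = 35 / 36
p_win = 1 - p_loss_per_throw**24
print("%.6f" % p_win)        # about 0.4914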

This is not a course in probability theory -- I will assume the basic idea of a probability is familiar to you, and that common sense and careful reasoning will lead us through any brambles we may stumble across. What we will address now is how we can use combinatorics and probability to model the possible outcomes of processes where chance is prominent.

Measures - Geometry for Probability

As in geometry, where we look for formulas describing the shapes of objects, in probability theory we look for formulas describing the "shape" of chance over the various events under consideration. In this case, the shape is a probability measure (a.k.a. density, distribution) specifying how often each event can be expected to occur. For the Gombaud-Pascal problem, double-sixes may not appear at all, may appear once, twice, perhaps as many as 24 times. We will practically never observe 24 double-sixes in a row, although we will often observe no double-sixes at all. Here is a simulation that shows how often the various outcomes occur in experiments, and compares them with the frequencies predicted by the binomial distribution.

from numpy import arange, array
from numpy.random import rand
from scipy.stats import binom
import matplotlib.pyplot as mpl

"""
Simulate a number of independent Bernoulli trials
and bin them to generate a histogram that should
be closely approximated by the Binomial distribution.
"""

numdraws = 24
p = 1./36
def run():
    outcome = [ (p > rand()) for i in range(numdraws) ]
    return outcome.count(True)

numreps = int(1e6)

A = [ run() for i in range(numreps) ]
binbnds = arange(numdraws+2) - 0.5
n, bins, patches = mpl.hist(A, binbnds, density=True, facecolor='green',
                            alpha=0.5, label='Observed histogram')
x = arange(numdraws+1)
y = binom.pmf(x, numdraws, p)
error = sum(abs(array(n) - y))
print(n)
print(y)
mpl.plot(x, y, 'bo-', label='Binomial prediction')
mpl.legend(loc='upper right')
mpl.xlim(-0.5, numdraws+0.5)
mpl.title('Experimental test of the Binomial measure', fontsize=20)
mpl.xlabel('Number of successes', fontsize=18)
mpl.ylabel('Frequency of occurrence', fontsize=18)
mpl.text(12, .23, "Probability $p=1/36$", fontsize=18)
mpl.text(12, .2, "Draws $n=%d$"%numdraws, fontsize=18)
mpl.text(12, .17, "%d Replications"%numreps, fontsize=18)
mpl.text(12, .14, "%.1f %% error"%(100 * error), fontsize=18)
mpl.savefig('binomial_experiment.png')
mpl.savefig('binomial_experiment.pdf')

Here is a list of a few common and important discrete probability distributions. For reference, given a random variable \(x\) that takes natural-number values with \(\Pr(x = n) = p _ n\), the probability generating function for \(x\) is \[\hat{x}(s) = E[s^{x}] = \sum _ {n=0}^{\infty} p _ n s^n,\] while the cumulant generating function is \[\tilde{x}(t) = \ln E[e^{t x}] = \ln \hat{x}(e^{t}).\]
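
As a reminder of why these generating functions are convenient, moments and cumulants can be read off by differentiation: \[\hat{x}'(1) = E[x], \qquad \tilde{x}'(0) = E[x], \qquad \tilde{x}''(0) = \operatorname{Var}[x],\] so the mean and variance of each measure below follow directly from the formulas given.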

Bernoulli measure

A random variable which takes the value 1 (success) with probability \(p\) and 0 (failure) with probability \(1-p\).

\[\hat{x}(s) = 1-p + p s\] \[\tilde{x}(t) = \ln( 1-p + p e^t)\]
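
As a small sanity check, a few lines of sympy (assuming it is installed) recover the familiar Bernoulli mean \(p\) and variance \(p(1-p)\) from the cumulant generating function above.

import sympy as sp

# Differentiate the Bernoulli cumulant generating function at t = 0
# to recover the mean and variance.
p, t = sp.symbols('p t', positive=True)
cgf = sp.log(1 - p + p*sp.exp(t))

mean = sp.simplify(sp.diff(cgf, t).subs(t, 0))
variance = sp.simplify(sp.diff(cgf, t, 2).subs(t, 0))
print(mean)       # p
print(variance)   # p*(1 - p), possibly displayed as p - p**2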

Binomial measure

What is the probability of having \(k\) successes among \(n\) trials when each trial is independent and identical? The answer is the binomial probability distribution, with \(p\) being the probability of an individual success. The binomial probability generating function is \[\hat{x}(s) = \sum_{k=0}^{n} \Pr(x = k) s^k = \sum_{k=0}^{n} {}_nC_k \, p^k(1-p)^{n-k} s^k = (1-p+ps)^n,\] and the cumulant generating function is \[\tilde{x}(t) = n \ln( 1-p + p e^t ).\]
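
Here is a quick numerical check (a sketch assuming numpy and scipy are available) that scipy's binomial pmf matches the \({}_nC_k \, p^k(1-p)^{n-k}\) terms, and that summing the generating function term by term agrees with the closed form \((1-p+ps)^n\).

import numpy as np
from scipy.stats import binom
from scipy.special import comb

n, p = 24, 1/36
k = np.arange(n + 1)
pmf = comb(n, k) * p**k * (1 - p)**(n - k)      # nCk p^k (1-p)^(n-k)
assert np.allclose(pmf, binom.pmf(k, n, p))

s = 0.7                                          # arbitrary test point
print(np.sum(pmf * s**k))                        # pgf evaluated term by term
print((1 - p + p*s)**n)                          # closed form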

Geometric measure

What is the probability of having \(k\) successes before your first failure? The answer is the geometric measure with success probability \(p\). \[\hat{x}(s) = \sum _ {k=0}^{\infty} (1-p) p^k s^k = \frac{1-p}{1-ps}\] \[\tilde{x}(t) = \ln\left(\frac{1-p}{1-p e^{t}}\right)\]
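
A short simulation sketch (again assuming numpy) makes this concrete: count how many successes occur before the first failure, many times over, and compare the observed frequencies with \((1-p)p^k\).

import numpy as np

rng = np.random.default_rng(1)
p, numreps = 1/3, 100000

def successes_before_failure():
    # keep drawing while the trials succeed (probability p each time)
    count = 0
    while rng.random() < p:
        count += 1
    return count

draws = np.array([successes_before_failure() for _ in range(numreps)])
for k in range(6):
    print(k, round(np.mean(draws == k), 4), round((1 - p) * p**k, 4))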

Pascal measure

What is the probability of having \(k\) successes before \(r\) failures, when each trial is independent and identical? The answer is the Pascal probability measure (a special case of the negative binomial probability measure). \[\hat{x}(s) = \sum _ {k=0}^{\infty} {} _ {k+r-1}C _ {k} \, p^k(1-p)^r s^k = \left(\frac{1-p}{1-ps}\right)^{r}\] \[\tilde{x}(t) = r\ln\left(\frac{1-p}{1-p e^{t}}\right)\]
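
scipy's negative binomial counts failures before the n-th success, so the Pascal measure above (successes before the r-th failure, with success probability p) corresponds to nbinom.pmf(k, r, 1-p). A short check, with arbitrarily chosen p and r:

import numpy as np
from scipy.stats import nbinom
from scipy.special import comb

# Pascal pmf: C(k+r-1, k) p^k (1-p)^r, for k successes before r failures
p, r = 1/6, 4
k = np.arange(10)
pascal = comb(k + r - 1, k) * p**k * (1 - p)**r
print(np.allclose(pascal, nbinom.pmf(k, r, 1 - p)))   # expect True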

The Pascal measure also appears when events are Poisson distributed but the Poisson mean varies across a heterogeneous population according to a Gamma distribution.
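
A minimal simulation sketch (with arbitrarily chosen Gamma shape and scale) illustrates the claim: drawing each individual's Poisson mean from a Gamma distribution produces counts that match a negative binomial whose first parameter is the Gamma shape.

import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(2)
shape, scale = 3.0, 2.0                      # hypothetical heterogeneity parameters
lam = rng.gamma(shape, scale, size=200000)   # one Poisson mean per individual
counts = rng.poisson(lam)                    # one count per individual

# Compare observed count frequencies with nbinom(r=shape, prob=1/(1+scale)),
# i.e. a Pascal-type measure with document-style success probability scale/(1+scale).
k = np.arange(15)
observed = np.bincount(counts, minlength=k.size)[:k.size] / counts.size
predicted = nbinom.pmf(k, shape, 1/(1 + scale))
print(np.round(observed, 4))
print(np.round(predicted, 4))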

Power law measures

Another family of commonly used measures are power laws. These are particularly important because how quickly a measure's tail thins out can have important consequences in many theories.

\[\hat{x}(s) = \frac{1}{\zeta(n)} \sum _ {x=1}^{\infty} \frac{s^x}{x^n},\] where \(\zeta(n)\) is the Riemann zeta-function.
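
The practical importance of that slowly thinning tail is easy to see with scipy's implementation of this zeta measure; here is a sketch comparing its tail with that of a geometric measure matched to the same mean.

import numpy as np
from scipy.stats import zipf, geom

n = 2.5                          # power-law exponent (must exceed 2 for a finite mean)
q = 1 / zipf.mean(n)             # geometric measure on {1, 2, ...} with the same mean
for k in [1, 10, 100, 1000]:
    # survival functions Pr(x > k): the power law decays like a power of k,
    # while the geometric decays exponentially
    print(k, zipf.sf(k, n), geom.sf(k, q))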