Simulations

( previous, home, next )


Outline:

Other:

Further reading


ancient roman dice
ancient roman dice

Simulating events involving chance

People have long struggled to make sense of unexpected events. In some communities, unusual events are a cause for invoking superstition and supernatural spirits and ancestors. The ancient Greek story teller Aesop recognized the goddess Tyke (Fortuna) in his fables as the reason for outcomes that had no recognizable cause and suggested that people could use religious offerings to improve their luck. The 1663 and 1755 New England earthquakes were widely interpreted as signs of divine displeasure and inspired brief religious revivals. This idea also resonates with Islamic idea of occasionalism, promoted by Abu al-Ghazali in the 11th century. Occasionalism says that rather than interpreting predictable cause and effect as laws of got, all events like should really be seen like chance events as direct acts of God’s. This suggests that laws of nature are actually acts of God and that the study of theology should replace the study of nature. In Thomas Hardy’s 1878 novel “Return of the native”, fate and irony are invoked as agents of luck. (see Roll the Bones: The History of Gambling for more history)

Gombaud’s bet

In the first book on the analysis of chance events (1657), Dutch scientist Christian Huygens tells how a gambling argument lead Blaise Pascal and Pierre de Fermat develop the description of chance events as probabilities. As the story goes, Antoine Gombaud, Chevalier de Méré, a French nobleman, had discovered a puzzle in a dice game. The game was to throw a pair of dice several times and to try and get a double-six. Experienced gamblers believed that to win more often than not, one needed at least 25 throws. But Gombaud had calculated that one would make a profit eventually even with only 24 rolls. His reasoning was based on an ancient rule-of-thumb that the critical number of rolls needed to break even was inversely proportional to the chance of success. When the chance of success was 1 in 6, a player needed 4 rolls to win more often than not, so if the chance of success was 1 in 36 (6 times smaller), the number of rolls needed was \(6 \times 4 = 24\). Gombaud wrote to Pascal to see if he could explain the disagreement.

Imagine new gamblers’ experiences with Gombaud’s bet. If we throw 2 dice 24 times and come up without double-6’s, we say to ourselves, “I’m going to lose my money if I pay to play this game”, so we don’t. But somebody else has done that same experiment, and gets a pair of double sixes, and so they think, “This is a game I can win, at least more often than not. I’d be happy to pay fifty cents for a chance to win a dollar.” And when they actually play, they might win or they might lose. That one experiment didn’t tell them (or us) what was going to happen very well – this is a game of chance, and the outcome can be different each time it is played. And so, we’ve reached a philosophical conundrum – how do we make sense of things in the world that can give different outputs each time, even when the inputs are the same.

One way of making sense of the unpredictable is to imagine repeating the game not just once or twice, but many many times. If we keep repeating the this process over and over, we no longer care about the outcome of any one game. Rather, what fraction of the time do we expect to win? Antoine had initially calculated that a double-6 would occur about once every 36 throws, so if he threw the pairs of dice 24 times, he should win about 24/36 = 2/3rds \(\approx 0.66\) of the time. Yet, gamblers using the rule always seemed to lose eventually.

We can easily repeat Gombaud’s experience with a pair of dice in hand. With a little more work, we can replicate the bet much more quickly than we can do by hand using a computer program. The key feature of this program is a function that imitates the outcome of rolling a die.

[Show code]
from scipy import rand

"""
rand is a procedure provided in many languages that returns a random floating
point number between 0 and 1, or a full array of such floating point numbers.
It can be used to create many kinds random numbers.  In this case, by
multiplying by 6, adding 1 and then truncating to an integer, we can simulate
the roll of a cubic die with 6 sides.  The value returned by rand() may be
zero, but is never 1, so we don't have to worry about the rare possibility of
actually getting 7 instead of 6.
"""

def dieroll():
    return int(rand()*6)+1

def pairroll():
    return (dieroll(), dieroll())

def iswin(state):
    if (6,6) in state:
        return True
    else:
        return False

def game():
    numrollsperrun = 24
    state = list()
    for i in range(numrollsperrun):
        state.append( pairroll() )
    return state

for b in range(0,7,2):
    numgames = 10**b
    numwins = 0
    for i in range(numgames):
        if iswin(game()):
            numwins += 1
    fracwins = float(numwins) / float(numgames)
    print("Of %d bets, you won a fraction %f of them"%(numgames,fracwins))

This script is called a “simulation”. A simulation is an algorithm that we use to explore the possible outcomes of a series of events. Each of our attempts to go through the simulation events from beginning to end is called a “run”. Many simulations are done with computer programs, but simulations have also often done with pencil, paper, and dice (GURPS, D&D, board games, table-top military exercises…). H. G. Wells, for example, published a rule-book for simulating combat with toy soldiers in 1913. Electric computers offer the advantage of quickly performing many simulation runs using complicated rules without error.

If we execute the script, we get output like this (though not EXACTLY this).

Of         1 bets, you won a fraction 0.00000 of them
Of       100 bets, you won a fraction 0.55000 of them
Of     10000 bets, you won a fraction 0.48870 of them
Of   1000000 bets, you won a fraction 0.49190 of them

Despite its simplicity, our simulation of Gombaud’s bet contains the three core ingredients of most simulations: a set of variables that record the “state” of the game, rules for the state “transitions” – how the state changes from one event to the next, and an analysis routine that extracts the information we care about from the simulation’s state. In this simulation, the state of the game is represented by a list of the outcomes of all the dice rolls performed so far. It starts off as an empty list. The transition rule is to “roll” from the current state is to “roll” a pair of dice, and to update the state by appending the outcome of the current roll. This is a particularly simple transition rule because it does not depend at-all on the current state; in the language of probability, we say the update is “independent” of the current state. We update the state 24 times, and then we stop. The final testing routine checks to see if the player wins by checking if any of the rolls in the final state was a double-6.

One core element of this script is it’s call to rand(). rand() is part of what we call a “pseudo-random” number generator. It returns a different floating point number between 0 and 1 each time it is called. In this case, the results of rand are converted into integers 1 through 6 in equal amount to simulate dice rolling. These numbers are not random – they are calculated using a formula. However, the pattern of numbers generated by this formula is so complicated that we (hopefully) have a very hard time telling the difference between them and other “random” numbers. Thus, we say that rand() produces “pseudo-random” numbers.

There are a number of ways we could make this simulation execute more efficiently, so that it takes less time and energy to perform a million runs. For example, instead of simulating all 24 rolls each time, we could test as we go and stop immediately if we get a double-6. These kinds of optimizations can be very valuable, but we will not address them directly here.

Our simulation converges slowly, but the chance of winning in 24 rolls seems to be nearly \(0.491\). While our estimate is uncertain, it’s clear that the chances of winning are smaller than the 2/3rds that Gombaud was expecting. The expert gamblers seem to have been right – the dice-thrower is expected to lose just a little less often than they win over the long-term.

Thanks to Fermat, Pascal, Huygens, and their successors, we know a more efficient way to use the laws of probability and the “principle of indifference” to estimate the chance of rolling double sixes atleast once in 24 rolls of a pair of dice. If we assume that all outcomes are equally likely (the principle of indifference), then there are 36 possible outcomes every time we roll a pair of dice. Of these, there are 35 that are losers. The chance of losing each roll is then 35/36. The chance of losing all the rolls is (35/36)\(^{24}\), so the chance of winning at least one is \[1 - \left(\frac{35}{36}\right)^{24} = 0.491403.\] So our simulation runs and our direct probability calculations give the same answer to 4 decimal places (if we round).

But keep in mind that neither our simulation experiment nor the direct calculation is the same as testing Gombaud’s problem in the physical world. There are many complications that we did not account for in writing our simulation (are the dice imperfectly weighted, are the dice thrown well, …) that could effect the outcomes, and that are different between now and the 1600’s. Pragmatically, it is impossible to assign a precise number to the chances of winning Gombaud’s bet.

Chocolate quality control

Here is another example of simulation modelling. Imagine you have recently come into possession of a truck load of cacao beans useful for making chocolate (\(4\times 4 \times 8 = 128\) crates). You’d like to export them to the US, but there is a catch. Most of the crates come from plantations that grow high-grade cacao varieties, but about 25 of the crates are CCN-51 cacao, a miserable variety used to make most of the chocolate in the United States (see here).

When your truck goes to market, you know that the testers will not check every crate. Instead, they will pick 5 crates at random (each different), and pay you depending on quality of those crates. Since 25 crates is about a fifth of 128, we might expect 1 of the five tested to be found to be CCN-51, but you might get lucky and they miss all of them, or you might get unlucky. How often, in a truck load of this size, do the testers find none, one, two, three, four, or five crates of low-quality chocolate? Well, it depends on how they test and maybe you can game the tests, but if the testing is totally random, we can write a simple python program to simulate the process and see what happens.

[Show code]
from numpy.random import shuffle

N = 4*4*8 # total number of crates
K = 25 # number of crates with CCN-51 chocolate
n = 5 # number of crates tested

num_checks = 100000

truck = [ 0 ]*(N-K) + [ 1 ]*K
counts = [ 0 ]*(n+1)
for t in range(num_checks):
    # we shuffle the truck crates, and then check the first n
    shuffle(truck)
    counts[sum(truck[0:n])] += 1

frequency = [ x/float(num_checks) for x in counts ]
print("# count, frequency")
print("# %d checks"%num_checks)
for x in range(n+1):
    print("%d, %f"%(x, frequency[x]))

Your results are likely to be similar to what follows …

# count, frequency
# 100000 checks
0 0.33068
1 0.41902
2 0.2011
3 0.04398
4 0.005
5 0.00022

Of course, if the selection of crates is not random, but somewhat known ahead of time, you could arrange the crates on the truck to your benefit and do better than this. The numbers above are a worst-case scenario.

The Siege of Australia

World map in the boargame Risk
World map in the boargame Risk

If you have played the boardgame “Risk” a few times, you’ve probably noticed that Australia plays an unusually important strategic role – the games often pivot in the end on defence of the Australian archipelago, and the strait of Malacca in particular. If you can break through across the straight into Indonesia, you can often take Australia, but defenders can mass forces in Indonesia, making a breakthrough very difficult. The question then becomes, how many armies do you need to break through and concur the subcontinent.

Question: Given your and your opponent’s counts of armies in Siam and Indonesia respectively, how frequently do you break through the defensive line?

To answer this, lets perform some simulations of the dice-rolls that might play out, given the rules of the game and knowing how many armies are on each side.

[Show code]
from scipy import rand


def dieroll():
    r = int(rand()*6)+1
    assert r > 0 and r < 7
    return r

def randbattle(num_mine, num_yours):
    d = dieroll
    while True:
        if num_mine == 1 or num_yours == 0:
            return num_mine, num_yours
        assert num_mine > 0
        (x, y) = min(3, num_mine-1), min(2, num_yours)
        me, you = [ d() for i in range(x) ], [d() for i in range(y)]
        me.sort()
        you.sort()
        while len(me) > 0 and len(you) > 0:
            if me.pop() > you.pop():
                num_yours -= 1
            else:
                num_mine -= 1

def main():
    numtrials = 10**4
    num_mine = int(input("How many armies do you have to attack? "))
    num_yours = int(input("How many armies are defending? "))
    outcomes = {}
    for x in range(-num_yours, num_mine+1):
        outcomes[x] = 0
    numwins = 0
    for t in range(numtrials):
        result = randbattle(num_mine, num_yours)
        assert result[0] == 1 or result[1] == 0
        result = result[0] - result[1]
        outcomes[result] += 1
        if result > 1:
            numwins += 1

    print("I have run %d battles."%numtrials)
    print("Of those trials, you won %f percent of them."%(100*float(numwins)/numtrials))
    # c = 0.
    # for x in range(num_mine, -num_yours-1, -1):
    #     f = outcomes[x]/float(numtrials)
    #     c += f
    #     print "%d,%.8f,%.4f"%(x, f, c)


main()

It would be helpful to know how the frequency of winning changes as the initial number of armies on each side changes. This can be displayed with an image plot, after some recoding.

[Show code]
from scipy import rand, zeros, flipud
from matplotlib.pyplot import *


def dieroll():
    r = int(rand()*6)+1
    assert r > 0 and r < 7
    return r

def battle(num_mine, num_yours):
    d = dieroll
    while True:
        if num_mine == 1 or num_yours == 0:
            return num_mine, num_yours
        assert num_mine > 0
        (x, y) = min(3, num_mine-1), min(2, num_yours)
        me, you = [ d() for i in range(x) ], [d() for i in range(y)]
        me.sort()
        you.sort()
        while len(me) > 0 and len(you) > 0:
            if me.pop() > you.pop():
                num_yours -= 1
            else:
                num_mine -= 1

def winchance(num_mine, num_yours, numtrials = 10**4):
    numwins = 0
    for t in range(numtrials):
        result = battle(num_mine, num_yours)
        assert result[0] == 1 or result[1] == 0
        result = result[0] - result[1]
        if result > 1:
            numwins += 1

    return numwins/float(numtrials)

def main():
    max_num_armies = 30
    z = zeros( (max_num_armies, max_num_armies) )
    for x in range(1,z.shape[0]):
        for y in range(1,z.shape[1]):
            z[x,y] = winchance(x,y, numtrials=3*10**2)

    z = flipud(z[1:,1:].T)
    rect = [1, max_num_armies, 1, max_num_armies]

    imshow(z, extent=rect, aspect='auto', interpolation='none')
    title("Probability of a Breakthrough",fontsize=18)
    xlabel("Initial number of attacking armies",fontsize=18)
    ylabel("Initial number of defending armies",fontsize=18)
    colorbar()
    savefig('risk_siege_surf.png')

main()

A Baseball Inning

Now, let’s consider something a little more complex. Can we make a simulation for predicting the outcomes of a baseball game? The sport of baseball is particularly convenient simulation because the games progress in well-defined discrete time-steps – pitch by pitch. Other popular sports like football and basketball have some discrete time steps (downs and possessions) but also a fair amount of continuous game play, while other sports like soccer are almost entirely continuous in time.

To see how we build the model, let’s imagine we are totally naive of the rules of baseball, and learning as we watch our first baseball game. At the most basic level, baseball is a game between pitcher throwning a ball and batters trying to hit that ball. The progression of the game depends on the pitches – each pitch moves the game forward.

Let’s suppose the game proceeds as follows. The pitcher throws the first pitch. The batter swings and misses. The pitcher throws another pitch. The batter swings and misses. The pitcher throws another pitch. The batter swings and misses. Then the batter sits down and a new batter comes up. Why? Nothing different happened between the first, second, and third pitches - each time, the events seemed to be the same. But everybody agreed that it was time for the first batter to sit down. Of course, reason is that the rules of the game is that you get 3 tries to hit the ball, and after 3 tries, somebody else gets a turn.

In order to represent this, we introduce the auxillary concept of strikes. The count of strikes helps us represent the state of the game right now without having to repeat all the events leading to now. A player gets 3 swings at the ball – once they’ve had their 3 swings, if they haven’t hit the ball, they’re done.

But what if the batter doesn’t swing at the pitch? Well, then it depends on whether the pitch was a good pitch or not. If it was a good pitch, then batter should have swung, even though they didn’t – we’ll count that as a strike also. But if it was a bad pitch, that’s the pitcher’s fault, and we also want to penalize the pitcher for throwing bad pitches. They call bad pitches balls, and we count those also. If a pitcher throws four balls, then the player gets to walk to first base, even though they didn’t hit anything, because the pitcher isn’t giving him a good chance.

The pair of balls and strikes is called the “count” in baseball, and represents the batter’s state as time progresses. The count is not a physical/tangible thing of the process – it is a abstract concept we create as a convenient way of keeping track of the history of the at-bat and the game so far. And it doesn’t keep track of the complete history of the at-bat so far. For example, the count can not tell us what order the pitches have been thrown in so far. A count of 2 balls and 1 strike could have had the strike be the first pitch or the third pich. Either way, we end up with the same (2,1) count.

This is a trick that is generally very useful – we can enlarge the state-space with extra abstract variables to account for the past history as well as the current situation.

Parameterization

For our very simple 1/2-inning model, the only source of randomness is how each pitch comes accross the plate, and how the batter responds to the pitch. For the way we’ve broken things down, this is represented by two discrete probability distribuitons. The first distribution gives the chances for outcomes of the pitch – ball, strike, foul, or the batter making solid contact. The second distribution gives chances of various outcomes when the batter makes contact – pop-out, single, double, triple, or home run.

These two distributions we need are not “conventional” statistics used when describing baseball games. But they are available in crude forms if you do a little search. With a little searching, I found the following. In the major leagues, about 1/3rd of pitches are balls, 1/4 are strikes, 1/4 are fouls, and 1/6th are hit into the field. Of those pitches hit, about 70% are outs, 20% are base hits, 6% are doubles, 1% are triples, and 3% are home runs.

history of hitting in the MLB
history of hitting in the MLB

Implementation

[Show code]
#!/usr/bin/env python2

import scipy


class RandVariable():
    def __init__(self, weights):
        s = float(sum(weights))
        self.w = [ w/s for w in weights]
        self.n = len(self.w)
    def rand(self):
        rv = scipy.rand()
        i = self.n-1
        while rv > self.w[i] and 0 <= i:
            rv -= self.w[i]
            i -= 1
        assert 0 <= i
        return i

# ball, strike, contact, foul ball
# http://research.sabr.org/journals/study-of-the-count-yields-fascinating-data
pitchDistro = RandVariable([0.38, 0.154+0.087, 0.144+0.066, 0.169])

# use 0 to represent getting out
# " 1 is a single
# " 2 is a double
# " 3 is a triple
# " 4 is a home run
# http://www.highheatstats.com/2013/01/reliving-the-hits-how-hit-distribution-has-changed-in-mlb-history/
x = 0.066/0.21
hitDistro = RandVariable([1-x,0.67*x,0.2*x,0.03*x,0.1*x])

class Batter:
    def __init__(self):
        self.contact = False
        self.walk = False
        self.out = False
        self.strikes = 0
        self.balls = 0
        print("\t\tNew batter")
    def __str__(self):
        if self.out:
            return "Strike-out"
        if self.walk:
            return "Walk"
        if self.contact:
            return "Contact"
        s = str((self.balls, self.strikes))
        return s
    def copy(self):
        o = self.__class__()
        o.contact = self.contact
        o.out = self.out
        o.outs = self.outs
        o.walk = self.walk
        o.strikes = self.strikes
        o.balls = self.balls
        return o
    def pitchball(self):
        self.balls += 1
        if self.balls == 4:
            self.walk  = True
    def pitchstrike(self):
        self.strikes += 1
        if self.strikes == 3:
            self.out = True
    def pitchfoul(self):
        if self.strikes < 2:
            self.strikes += 1
    def pitchcontact(self):
        self.contact = True

    def neighbors(self):
        if self.out or self.contact or self.walk:
            return
        yield self.copy().pitchball()
        yield self.copy().pitchstrike()
        yield self.copy().pitchcontact()
        yield self.copy().pitchfoul()
        return

    def __next__(self):
        pitches = [ self.pitchball, self.pitchstrike, \
            self.pitchcontact, self.pitchfoul ]
        pitches = [ self.pitchball, self.pitchstrike, \
            self.pitchcontact, self.pitchfoul ]
        pitches[pitchDistro.rand()]()
        print("\t\t"+str(self))

    def atbat(self):
        while not (self.out or self.contact or self.walk):
            next(self)


class Field:
    def __init__(self):
        self.bases = [0, 0, 0, 0]
        self.runs = 0
        self.outs = 0
        self.verbose = True

    def __str__(self):
        s = "Runs:%d, Outs:%d, On base:"%(self.runs, self.outs)
        s = s + str(self.bases)
        return s

    def __next__(self):
        assert self.outs < 4
        batter = Batter()
        batter.atbat()
        if batter.out:
            self.outs += 1
        elif batter.walk:
            i = 1
            while True:
                if self.bases[i] == 0:
                    self.bases[i] = 1
                    break
                elif i == 3:
                    self.runs += 1
                    break
                i += 1
        elif batter.contact:
            h = hitDistro.rand()
            if h == 0:
                self.outs += 1
            else:
                self.bases[0] = 1
                while h > 0:
                    self.bases = [0] + self.bases
                    self.runs += self.bases.pop()
                    h -= 1
        else:
            assert False, "should never be reached"
        print('\t'+str(self))

    def halfinning(self):
        while self.outs < 3:
            next(self)

def main():
    runs = [0, 0]
    print("Start of game, no score")
    for inning in range(1, 10):
        for team in [0, 1]:
            print("\tTeam taking the field")
            x = Field()
            x.halfinning()
            runs[team] += x.runs
            s = "\nInning: %d, Batting Team: %d"%(inning, team)
            #s += "\nScore: %d to %d"%(runs[0], runs[1])
            s += "\n%d, %d Score"%(runs[0], runs[1])
    print(s)
    return

if __name__ == '__main__':
    main()

Summary

So, what do we see from these simulations? In each case, the simulations are “direct” implementations of our problem – the simulations require just little math to translate the original situations into a computer algorithm. We don’t actually roll dice but we do simulate dice rolls when needed.

The first and last examples make use of a library function called rand(). This is a random number generating “function” – every time we call it, it returns a different decimal number between 0 and 1. (it isn’t really a function in the mathematical sense because functions always give the same output for the same input) While each individual call is unpredictable, on average, the numbers returned by rand() are uniformly distributed across the [0,1] interval. Most other random things like dice rolls and shuffles can be simulated in terms of these random decimal numbers. We actually should call these numbers pseudo-random instead of random because the outputs of rand() are generated by a fancy algorithm, and random numbers, by definition, do not follow any algorithm. Most of the time, this distinction doesn’t matter too much – if it looks like a duck, and quacks like a duck, we might as well call it a duck – but there are a few situations, such as in cryptography applications, where the difference can be very important.

Another thing we may notice is that despite the large number of simulation runs we use, there is still uncertainty about the results. There is a fundamental limitation on all Monte Carlo simulation methods like the one above – they converge to an answer at a rate proportional to the square root of the number of trials. Thus, to get an answer that is twice as accurate, we have to quadruple the number of trials. The gamblers of the 1600’s did not realize this precisely, but they knew it was hard to tell exactly how fair a game was by trial and error, and hoped that some mathematical calculations would be a better way. Pascal, Fermat, Huygens, and Laplace obliged them.

The term “Montecarlo method” was coined by Stanislaw Ulam and John von Neumann during their work on the Manhattan project. Ulam and von Neumann were working with some of the first programmable computers, but the idea of calculation by simulation can be traced back farther. Edward Molina used human computers to do “throw-down” simulations of telephone traffic at Bell labs in the 1920’s. And special problems, like Buffon’s needle problem, have histories of experiment-based calculation tracing even further back in time (Hall, 1872, Wolf, 1850).

You might find some of the following interesting…

Exercises

  1. (Open ended) Improve our script for simulating Gombaud’s bet so that it performs 100 million runs. How accurate is the estimate for the chance of winning? Is there a relationship between the number of runs and the accuracy of our estimate of winning?

  2. Using montecarlo simulation, estimate the distribution for the maximum number of sequential heads in 100 flips of a fair coin. Perform atleast 100,000 simulation runs. Present your results as a plot the number of runs with fewer than \(n\) heads in a row as a function of \(n\).

  3. Montecarlo simulations can also be used to calculate things like the area of a unit circle \(\pi\).

    1. If you draw a square circumscribing a unit circle, and then pick a point \((x,y)\) uniformly randomly from that square, then according to the principle of indifference, the chance that point is in the unit circle itself is \(\pi/4\). Write a python program that samples 10 million points from this square, counts how many are in the unit circle, and uses this to estimate \(\pi\). The pythagorian theorem can be used to check if a point is inside the unit circle. How many digits of accuracy does you estimate achieve?

    2. For comparison, use the formula \[\operatorname{arctan}{\left (\frac{\sqrt{3}}{3} \right )} = \frac{\pi}{6}\] and the Taylor series of \(\operatorname{arctan}\) around \(x=0\) to estimate \(\pi\). How many terms do you need to use to obtain the same accuracy as your Montecarlo simulation?

    3. Which approach is a better way to calculate \(\pi\)?

  4. Imagine you start with an urn containing 2 stones, one black and one white. Reach into the urn, pick out a stone, then return the stone and a second stone of the same color as the one you picked out (so now there are 3 stones in the urn). Repeat this over and over until there are 1,000 stones in the urn.

    1. What do you expect to have happened to the stones in the urn? Will most of the stones in the urn be the same color? Or will do you think it’s more likely that the ratio of white to black stones will be 1 to 1? Or something different?

    2. What should the empirical PDF and CDF of the outcome look like under your prediction for part a?

    3. Simulate 10,000 runs of this process, and plot the empirical PDF and CDF.

    4. Discuss.

  5. Poisson’s apocraphyl problem of estimating the number of deer in the King’s forest (references to Paxton, Good, and Fisher) The games keeper can recognize and distinguish all the deer after he sees them once.

  6. Use our baseball model to answer the following questions

    1. Modify this code to draw a scatter-plot of the scores for 100 games.

    2. What is the average margin of victory in these games?

  7. In a previous homework problem, we attempted to model the number of cases of measles in a household using a binomial distribution, assuming each case was independent of each other cases, and got poor fits to the data. Greenwood and Yule proposed a better model in 1920. In this model, disease transmission is broken into discrete generations, and the number of infectious people in each generation is a binomial random sample from the currently susceptible people. People can only get the disease once, so people sick this week are recovered and resistant against re-infection next week.

    1. Suppose a family has 1 sick kid with measles and 2 susceptible kids. What are the 3 possible states next week and their probabilities?

    2. What are the family’s 4 possible states the second week, and their probabilities?

    3. What are the probabilities that when the outbreak ends, the family had a total of 1,2, or 3 cases of measles?

    4. How do these results compare with Wilson’s data on the observed frequencies of different outbreak sizes? which wilson data?

  8. According to Rorty (1903), if the chance that 12 phone calls occur in one minute is 1/100, then the chance atleast one of the 60 minutes in an hour contains 12 calls is 60/100. Why is this wrong? What’s the correct estimate?

( previous, home, next )