( previous, home, next )

Topics

• Limiting approximations
• Description of the Poisson process
• Telephone traffic modelling

• Probability and its engineering uses by Thornton C Fry, 1928

Suppose you want to know how long ago something happened. For example, a forensic analyst might want to know how long ago a dead person ate breakfast. Or an astrophysicist might want to know the age of the solar system. Or an archeologist might want to now how old the dead-sea scrolls are, or how long ago did we evolve a desire for art. How do we tell?

Well, we can build a mathematical model, fit it to some data, and get out an estimate. Our model will consist of some timing process under which a system’s state evolves from some starting point to an end point. The end point will be measured to provide the data. Some secondary source of information will be used to estimate the system’s starting point. Then, based on the timing process, we can estimate how much time between the starting point and the end point.

This basic process applies to dendrochronology (tree rings), chronostratigraphy (dating by rock layers), genetic clocks, magneto and perhaps most famously to radiometric dating. The radioactive decay was discovered by Henri Becquerel and subsequently studied in the research of Marie and Peire Curie, Ernest Rutherford, and Hans Geiger. In 1896, Becquerel discovered that uranium salts could expose photographic paper that was still wrapped in light-proof paper. Further research with Marie and Peire Curie lead to the isolation of radioactive elements thorium, radium, and polonium. The cause of this penetrating radiation was eventually determined by Rutherford and Geiger to be due to the decay of atoms through the fragmenting of their nuclei.

Radioactive decay processes are useful in dating objects because they appear to be very regular. We can measure the amount of isotope left at the end using a Geiger counter or related method. The one challenge is knowing the initial isotope levels when the object of interest was made. For short-term experiments involving $$P^{32}$$, initial levels can be measured directly. For dating problems involving carbon-14, we can not go back in time to measure initial isotope levels. But we believe that cosmic radiation processes have kept the atmosphere’s carbon isotope ratios relatively constant over time, and so any departures of these ratios after the creation of an object will be due to radioactive decay. There can, of course, be complications. For example, atmospheric nuclear weapons tests during the 1950’s and 1960’s dramatically increase the amount of carbon 14 in the atmosphere. While such events can complicate some dating processes, they can also be exploited to estimate rates of carbon turnover in parts of the human body.

## Continuous-time Markov chains

Before proceeding with our analysis, we need to introduce the idea of a continuous-time Markov chain. Imagine that we have a Markov chain defined in time steps of $$\Delta t$$, $\begin{gather} p(t+\Delta t) = A(\Delta t) p(t). \end{gather}$ The individual transition probabilities $$a _ {ji}(\Delta t)$$ of $$A$$ depend on the time step, but as always, must sum to unity: $1 = \sum _ j a _ {ji}(\Delta t).$ Now, suppose we start shrinking our time-step. How do we expect the transition-probabilities to change? Well, intuition suggests that as the time-step becomes smaller, it is less likely that the chain will change state, so the probability of staying in the same state will increase, with the probabilities of moving to a different state will decrease, so $\begin{gather} \lim_{\Delta t \rightarrow 0} a_{ii}(\Delta t) = 1, \\ \lim_{\Delta t \rightarrow 0} a_{ji}(\Delta t) = 0, \quad j \neq i. \end{gather}$ Let’s suppose that the off-diagonal entries become linear in the time-step-size as it shrinks, so $$a _ {ji}(\Delta t) \approx m _ {ji} \Delta t$$. Since the columns must sum to unity, this also implies $$a _ {ii}(\Delta t) = 1 - \Delta t \sum _ {j} m _ {ji}$$. Rewriting this in matrix form, we have for small $$\Delta t$$, $\begin{gather} p(t + \Delta t) = (I + M \Delta t) p(t) \end{gather}$ where the off-diagonal entries of $$M$$ are non-negative, the diagonal entries of $$M$$ are non-positive, and the columns of $$M$$ sum to zero. We call $$M$$ the transition-rate matrix of the chain. If we now re-arrange and take the limit as $$\Delta t \rightarrow 0$$, we recover a first-order linear differential equation system $\begin{gather} \frac{dp}{dt} = M p. \end{gather}$

### Exponential decay equations

How long atoms are around is measured by “half-life” and decay rate. Phosphorus-32 ($$P^{32}$$) has a halflife of about 14.29 days. Carbon-14 ($$C^{14}$$) has a halflife of about 5,730 years. Uranium-235 has a halflife of about 700 million years.

Suppose we start with $$n = 1$$ atom of $$P^{32}$$. What is the chance it is still here after 28.6 days?

1-atom model and dt dependence

\begin{align} \frac{dp_0}{dt} &= r p_1, \\ \frac{dp_1}{dt} &=-r p_1. \end{align}

\begin{align} p_0(t) &= 1 - e^{-rt} \\ p_1(t) &= e^{-rt} \end{align}

Since $$2 p _ 0 = p _ 0 e^{r t _ {1/2}}$$, then $$r = (\ln 2)/ t _ {1/2} \approx 0.0485$$

Show, suppose instead of starting with a single atom, we start with a large number of atoms $$N$$.

\begin{align} \frac{d p_0}{d t} &= r p_1 \\ \frac{d p_k}{d t} &= - r k p_k + r (k+1) p_{k+1}, \quad k = 1 \ldots \infty \end{align} This infinite system of linear differential equations is known as the master equation for the process. Generally, closed-form solutions of the the master equation can be hard to find. But if we start with exactly $$N$$ atoms at time $$t=0$$, then the exact solution of this system of equations is $p _ k(t) = {{}_{N}C_{k}} e^{-krt} (1-e^{-rt})^{N-k}$ with $$p _ k(t) = 0$$ for all $$k > N$$. This is a binomial distribution, where the expected value $\text{E}( k ) = N e^{-rt}$ while the variance $\text{Var}(k) = N e^{-rt} (1-e^{-rt}).$ We can check this solution by direct substitution back into the master equation.

To find the most likely time at which we might observe $$k$$ atoms to remain, we can treat $$p _ k(t)$$ as a likelihood function of $$t$$. The most likely time will be the one that maximizes $$t$$. By differentiation, we find the maximum likelihood estimate (MLE) $\begin{gather} t_{MLE} = \frac{1}{r} \ln\left( \frac{N}{k} \right) \end{gather}$ The precision of this estimate can be approximately expressed in terms of the Fisher information. The Fisher information is $$I(\theta) = - \partial^2 _ {\theta} \ln \mathscr{L}(\theta)$$ and 95 % confidence intervals under the Normal approximation are $\begin{gather} t _ {MLE} \pm \frac{1.96}{\sqrt{I(t _ {MLE})}} = \frac{1}{r} \left[ \ln\left( \frac{N}{k} \right) \pm c \sqrt{\frac{1}{k} - \frac{1}{N}} \right] \end{gather}$

## Telephone networks

On January 28th, 1878, George Coy opened the “District Telephone Company of New Haven”, in New Haven, Connecticut. Two years previously, Elisha Gray and Alexander Graham Bell had been in a race to patent a telephone capable of transmitting voice. Businesses from apothecaries to burglar alarm companies were experimenting with the concept of telephone, but nobody had yet found the right niche for this technology. Coy’s company was the first commercial service open to the public where people paid to subscribe. The company’s initial 21 customers grew to 50 within 2 months and included private residences, doctors and dentists, meat markets, tea companies, and boarding stables. This new network featured an “exchange” at which an “operator” could interconnect any of its 8 different phone lines to each other. If everything worked out just right, as many as 4 different calls could happen at a time, but the specifics of the exchange actually did not allow for more than 2 calls at a time. Other startups soon followed.

The invention of a practical telephone in the mid 1800’s, thanks to the efforts of Meucci, Gray, Bell, and others, was a revolutionary event – the first network appliance that everybody had to have. While the telegraph had preceded the telephone by 30 years, the problem of rapid growth had not previously arisen. The telegraph required special training for the sending and receiving of Morris code – few private customers had this training or felt the need to send their own messages. But with the telephone, there was no special training needed and you could now communicate directly to the person on the other end of the phone. Quickly, people saw the value of having a telephone in their own house, and the demand for phone service grew quickly. In the United states, from the creation of the first commercial telephone switchboards in 1878, there were around 50,000 subscribers by 1880, half a million by 1900, and two million by 1905. This rapid growth in demand in the US and around the world had never occurred before and created unprecedented demands on infrastructure engineering.

Young telephone companies quickly had big problems. Customers expected to be able to call from any phone to any other phone. The naive solution to the demand for telephone service was to build a telephone line from every house to every other house. But this was clearly impractical. To connect $$n$$ houses to each other, the telephone company would need to build $$(n^2 - n)/2$$ phone lines (after removing self-lines and allowing one line to work in either direction). So, 100 houses would need almost 5000 lines! Fortunately, there was a relatively simple alternative. Instead of connecting all phones directly to each other, the company could build “trunk” lines. Telephones in individual neighborhoods would connect to a local trunk, and trunks would connect to a central hub. Then, for example, if a customer wanted to call a merchant in a different neighborhood, their call would connect from their phone to the neighborhood trunk, travel down the trunk to the central hub, travel out from the hub along the merchant’s neighborhood trunk, and then connect from the trunk to the merchant’s phone.

This star-like arrangement of telephone lines greatly reduced the number of lines needed, but introduced it’s own problems. For one thing, it created the job of “telephone operator” which you might see in old black-and-white TV shows like “Lassie”. Operators had the job of making the physical connections between different lines until machines took over. A second problem was that trunk lines could only make one call at a time. If you wanted to make a call using the trunk line, but somebody else was already in the middle of a call, the line was busy, and you had to wait until that call was done and the line was open again. Of course, the company could alleviate this by building more trunk lines, but how many to build?

### The telephone problem

Malcolm C. Rorty, from Paterson, New Jersey, building on work started by G. T. Blood in 1898, was perhaps the first to attempt to model telephone traffic in a 1903 paper for AT&T. Rorty considered how the number of calls expected to be received in one hour and then applied the Normal approximation to the Binomial distribution to calculate the chance of seeing so many more or less calls than the average in a given time interval.

expand, molina’s contributions, Poisson approximation, others

We wish to characterize the process of events occurring over a fixed period of time. These could be the starting of new telephone calls (or the arrival of a hurricanes each year, or the falling of a new snowflake onto your hand). Within this period of time, we do not have reason to expect the events to occur at one particular time rather than another – events are equally likely to occur at all times. Then the rate at which we go from having no calls to having 1 call arrived should be the same as the rate of going from 1 call to 2 calls, or $$k-1$$ calls to $$k$$ calls. If the chance of a call arriving in an instant $$( t, t+dt)$$ is $$\lambda dt$$, and $$p _ k(t)$$ is the probability that $$k$$ calls have arrived by time $$t$$,

\begin{align} \frac{dp _ 0}{dt} &= - \lambda p _ 0, \\ \frac{dp _ 1}{dt} &= \lambda p _ 0 - \lambda p _ 1, \\ \frac{dp _ 2}{dt} &= \lambda p _ 1 - \lambda p _ 2, \\ & \cdots \\ \frac{dp _ k}{dt} &= \lambda p _ {k-1} - \lambda p _ k, \quad k = 1,2,...\infty \end{align}

These equations are summarized by the following digraph.

This master equation also has a very informative closed-form solution. Suppose we start with no phone calls having yet arrived. Then the solution turns out to be a Poisson distribution. The expected number of calls grows linearly with time; the probability that $$k$$ calls arrive in a time window of length $$t$$ is

$p _ k(t)= \frac{(\lambda t)^k}{k!} e^{-\lambda t}.$

You can confirm this solution by substituting it back into the differential equations. This process of arrivals is called a Poisson process. We can see that it agrees with out intuition, in that the expected number of phone calls to arrive by time $$t$$ grows linearly with time. Interestingly, the standard deviation in the number of calls arrived by time $$t$$ grows only in proportion to the square root of the elapsed time.

## Duration

The main complication in telephone calls, relative to the simple arrival process, is that telephone calls can not be treated as instantaneous events. Telephone calls have a duration, and while they are on-going, they block their telephone line from use by other callers. The important quantity is not the total number of calls that have arrived, but the number of calls that are currently ongoing relative to the number of lines available. Blocked calls could be held, and then started once the line becomes free. Or they could be disconnected and retried at some point in the near future, or simply abandoned. When estimating line demand, one will do better if we take into account frequency, duration, and the handling pattern for blocked calls.

The early American work on the these problems assumed all calls had equal duration. In 1908, the managing director of the Copenhagen Telephone Company had recognized that his company was wasting money, with some switchboard operators rarely getting calls while other switchboards seemed to be constantly jammed. The manager hired a mathematician named Agner Erlang to help answer this question. Erlang observed that duration of phone calls was not constant. The duration actually seemed to correspond rather closely to an exponential distribution.

For short calls, there were some deviations from the exponential possibly do to the time necessary to ascertain wrong numbers. But over-all, the observations were close matches to an exponential distribution. Our initial phone-call model can be amended to account for this systematic termination of calls through the merger with our radiometric decay model.

If there are $$N$$ lines available and blocked calls are immediately abandoned, the master equation is then \begin{align} \frac{dp _ 0}{dt} &= r p _ 1 - \lambda p _ 0, \\ & \cdots \\ \frac{dp _ k}{dt} &= \lambda p _ {k-1} - ( \lambda + r k ) p _ k + r (k+1) p _ {k+1}, \quad k = 1,2,...N-1 \\ & \cdots \\ \frac{dp _ N}{dt} &= \lambda p _ {N-1} - r N p _ N. \end{align} We could try to solve this system, but what we are particularly interested in is the steady-state Erlang pointed out this system possesses. Setting $$\dot{p _ k} = 0$$, we find that at “statistical steady-state” as Erlang called it, the probability $p _ k ^ *$ that there are $$k$$ lines in use is $p _ k ^ * = \frac{1}{k!} \left( \frac{\lambda}{r} \right)^k p _ 0 ^ { * }$ Using the normalization that our probabilities form a distribution and sum to one, $\sum _ k p _ k = 1,$ we can eliminate $$p _ 0$$ and show $p _ k ^ * = \frac{1}{k!} \left( \frac{\lambda}{r} \right)^k \left( \sum _ {j=0}^{N} \frac{(\lambda/r)^j}{j!} \right)^{-1}.$ As $$N \rightarrow \infty$$, this converges to the Poisson distribution $p _ k(t)= \left( \frac{\lambda}{r} \right)^k \frac{e^{-\lambda/r}}{k!}.$ But the really useful thing that Erlang and the engineers wanted to know was how many lines they needed to build. According to Erlang’s result, the equilibrium probability the phone system is operating at maximum capacity and new incoming calls are blocked is then $p _ N = \frac{1}{N!} \left( \frac{\lambda}{r} \right)^N \left( \sum _ {j=0}^{N} \frac{(\lambda/r)^j}{j!} \right)^{-1}.$ This is commonly called Erlang’s B formula. If we know we want calls blocked no more than 1 % of the time, and we’ve estimated the arrival and termination rates $$\lambda$$ and $$r$$ from observations, then we can solve Erlang’s B formula for $$N$$ to determine how many lines we really need. Before the binary age, solving this equation for $$N$$ would have been time-consuming, so phone companies employed human computers to perform the calculations and produce paper graphs that could be used as quick references in the field. Its success helped spur on further applied math research in the telephone industry. Today, this formula is still widely applied in call-center scheduling and que design.

# Exercises

1. In 1910, Ernest Rutherford and Hans Geiger published a paper on the observation of scintillation events caused by $$\alpha$$-particles striking a photographic plate. Harry Bateman showed Rutherford and Geiger’s data was well described by the Poisson distribution.

1. Find and colate the final data from the first table of their paper for the frequencies of each number of scintillations in 1/8th minute intervals.

2. Check Bateman’s fit of a Poisson distribution to the data using the maximum likelihood method. What is the most likely value of $$\lambda$$?

3. Estimate $$\lambda$$ using linear least squares to minimize the total square error between the observed and predicted frequencies over 2,608 observationn intervals.

2. In 1920, Agner Erlang, telephone engineer and pioneer of queing theory, presented a lecture including the following data for duration of calls at the main Copenhagen Telephone exchange as an example of how well practical data was conforming to his conceptual theories. Find the best-fitting exponential distribution to this data using linear least squares and an appropriate transformation of the empirical cumulative distribution function.

[ Data : hide , shown as table , shown as CSV shown as QR ]

#duration,fraction
# Duration of calls at the Copenhagen Main Exchange in 1916
# binned based on 10-second intervals.
# Data extracted from Table 1 of A. K. Erlang's 1920 lecture paper
# "The application of the theory of probabilities in telephone administration
# Extracted by T. Reluga, March, 2017
#
# Columns
# 1. duration of call measured in seconds
# 2. fraction of all calls with this duration or up to 10 seconds less
10,0.07168
20,0.12494
30,0.09410
40,0.08702
50,0.06292
60,0.05584
70,0.05163
80,0.03865
90,0.04168
100,0.03461
110,0.02955
120,0.02432
130,0.02432
140,0.01843
150,0.01674
160,0.02045
170,0.01472
180,0.01101
190,0.01371
200,0.00781
210,0.01270
220,0.00966
230,0.01371
240,0.00882
250,0.00646
260,0.00798
270,0.00528
280,0.00376
290,0.00376
300,0.00393

1. If telephone calls arrive at a rate of 1 per minute and last an average of 8 minutes, how many phone lines would you need from a neighborhood to make sure the system is maxed out no more than 1 % of the time?

2. Suppose a Poisson process is used to describe passing by a person attempting to cross a one-way street, where $$\lambda$$ per second is the expected number of cars per minute, and the person needs $$T$$ seconds to cross the street. Find an expression for how long the person should expect to wait before being able to cross.

3. When the number of subscribers is relatively small, this is not a good approximation because as more calls are placed, the number of subscribers who might place a new call decreases. In 1918, Tore Engset published an analysis of this problem.

1. Assuming there are $$S$$ subscribers and $$N$$ lines, that the hazard rate for each individal subscriber to place a call is $$a$$ and that the hazard rate for each individual call to terminate is $$r$$, and that blocked calls are lost, draw a digraph of the continuous- time Markov chain for the number of lines in use at a given time.

2. Construct the master equation for this Markov chain.

3. At steady-state, solve for $$p_1$$ in terms of $$p_0$$. Then solve for $$p_2$$ in terms of $$p_1$$, and so on. Find a pattern for the $$p_j$$’s term.

4. Under what conditions will Engset’s solution and Erlang’s solution approximately agree?

4. Every day, around 100,000 kilograms of meteoroids strike the earth, mostly in the form of stellar dust. That might seem like a lot, but not when compared to the $$6 \times 10^{24}$$ kilogram mass of the earth – it’s adding less than $$10^{-14}$$ percent each year. And most of these meteorites are so small, we won’t see them. There are also a few large meteors, technically called “bolides”, that enter our atmosphere and create impressive fireballs (another video compilation). Thanks to a US government program to watch out for intercontinental ballistic missiles, NASA’s Jet Propulsion Laboratory keeps a very nice database of large meteors. We’d like to model this data.

1. Use this script to retrieve a data set of dates and times of observations of meteor arrivals.

2. It’s always a good idea to plot a data set, if we can. Our eyes can sometimes help us pick out important features in the data quickly. Make a shot-plot for 1995, where we plot date as the independent variable and mark the date of each meteor arrival with a spike.

3. Make a rank-arrival plot of the data from 1995 to 2005, where the independent variable is the rank of the arrival time (the first meteor is rank 1, the second to arrive is 2, …) against the actual arrival time. If that curve is a perfectly straight line, that means the meteors are arriving at a constant rate with a regular period in between. If the curve is a little irregular but still mostly straight, that means the time between arrivals has some randomness, but the average arrival rate is constant. If this plot was more of steps or a zig-zag, that would mean meteors were arriving in clusters. If the plot is concave or convex, that would suggest a changing rate of arrival.

4. Based on the total number of meteor arrivals from 1995 to 2005, what was the average arrival rate (per day)?

5. Plot a histogram of the number of meteors arriving every 45 days over the same 10 years.

6. Fit a Poisson distribution to this histogram and estimate the Poisson distribution’s intensity parameter $$\lambda$$. Plot your your histogram and your fit Poisson distribution on top of each other so we can compare them.

7. Make a probability plot comparing your fit to your histogram. (If the fit is good, this should be close to the line $$y=x$$.

8. If the meteors are really arriving according to a Poisson process, the times between meteor arrivals should be exponentially distributed. Are they?

5. Find a data set of telephone traffic from the 1908 era. A telegraph data set from then or earlier would also be interesting.

( previous, home, next )