# Markov chains

( previous, home, next )

## Background

While probability distributions can be very useful models, they don’t help us describe or analyze how a system changes over time. For that, we need to create new tools. One of the most useful is the Markov chain. A Markov chain is a special kind of stochastic process than can describe systems changing randomly over time. A Markov chain consists of a set of “states”, and a rule for each state that gives the chances for which state will come next. The change from one state to another is called a “transition”. Each rule determining the transition from one state to the next is described by a probability distribution called the “transition probabilities”.

The childhood board game Chutes and ladders can be described as a Markov chain. Recall that the rules of the game are as follows. The game is played on a $$10 \times 10$$ board, with the squares numbered 1 to 100. A player has a piece to mark their current spot on the board. In each turn, you roll the dice, and depending your roll and your current spot, you count forward to see where you land next, including any chutes or ladders you land on at the end of your turn. Your ‘state’ in the game is your piece’s current spot. The allowed transitions are all the possible spots you might land next from your current spot. And the transition probabilities are the probabilities of each needed dice roll.

The key property of a Markov chain is that once you know the current state, extra information about the past history of system does not provide any additional information for predicting the future states – all useful information is captured in the current state. Any stochastic processes with this property is called “Markovian”. Some stochastic processes are not Markovian. For example, a marble bouncing between different numbers on a roulette wheel does not form a Markov chain because the chances of different outcomes depends not just on the marble’s current number, but also on if this is the first number the marble hits and so is moving fast, or if it’s a later number and getting close to settling.

After this first pass, the dependence only on the current state seems like a serious restriction. Markov chains seem like they might be useful only occasionally – lots of things that we might like to model have history dependence in some form. The roulette wheel marble’s speed, which depends on how many times it has already bounced, or in baseball, where the outcome of a pitch depends not just on how it is thrown but also on how many balls and strikes the batter has accumulated in the past. BUT, as we will see, this is not as big a hurdle as it first appears.

## Weather in the land of Oz

In Oz, the weather switches among 3 states every day… Sunny, Rainy, or Snowy.
The weather department observed the following daily history in Oz over 9 days

0 Sunny
1 Sunny
2 Sunny
3 Sunny
4 Rainy
5 Snowy
6 Snowy
7 Rainy
8 Snowy
9 Sunny
... 

The longer history in compressed using (s) for sunny, (w) for snowy, and (r) for rainy, over 1600 days is

sssrwwrwssssrwwwrwwrssswwrssssswwrswrwrssswswrwwrsrwrswsrswssswwssrwwrsssssrwrss
srswrswsswwssssssrwswrwswwwwwrswsssrwwwwwwwwsrwrwsrwrswswwwwrwwwwwrswrswsssssrww
wwwwwwwwrwwsrssrwwsrwrssssswrssssswwswwwrssssswsswwwwsssswwssswwrswwssssswwwrsrs

### Matrix analysis and Digraph representation

The first step in analyzing data we want to model as a Markov chain is to count the transitions to see what occurs and how often.

[Show code]
data = "sssrwwrwssssrwwwrwwrssswwrssssswwrswrwrssswswrwwrsrwrswsrswssswwssrwwrsssssrwrsssrwwrswssssrwwwwsrwwsswssrssrsssswrsrwwwrsrsrwrsrwsrsssssrwwssrwsswsswrwswswwwswwwrssssrssswssssswwwwwwwrssswwwwssssswrwwwwwwwswswrwwswrwwrsrwrswsswrsrwwwwwwwrswwwrswswwrssssswrwswssssrssssrsrwrsssssrssrwrwwrwwwwwwsrwssrwwwwrsssrsswsssrsswwwswswrssssswwrssrssrwwrswwwwwwwssssrswrsssrwwsssrwwrsrwsrwswwwsswwwwwwwrssswrwsrsrswrswsswwssssssrwswrwswwwwwrswsssrwwwwwwwwsrwrwsrwrswswwwwrwwwwwrswrswsssssrwwrwsrsssssrssrsrswwwrwsswwwrsssssswwrwsrwwwwrwwwrssrwrwwssrssswswrwwwsswrsrsssrwrswrssrsrswssrwwrswrwrwwswsrwssssrsssswwsssrswwrsswrwrssrswrwwwrwssrssrsrsswwwrwwwsrwwwsswswwwwsswsssrsrsrswwwwwwrssssrwrwsswsswsrwrssssrsrwwswrsrssrssssswswrsswrsrssrwwwwrswwrswrswwwrwrssrwwwsrsswwssrswwwwwwwwrwwwsswswsrwrssswrwwsrwwsrwswrsrwrswwrswsrsrsswwwwswwssrssssrwsrsswssswssssrwssrsswwssrsssswwwrswwsrwwsrsrsswwrwrssswwsrssrssrsrwwsrwwsrwwwsrsrsrwrswwrwwsswrsrssssrsrwwwssswwwwwsrwwwssrsswwwrssssrwwrwwrwrsswwwwwwwssrwwwwwrswrssswsrwwrsswwrwwwwwrwrwswwrsrwwrwrsrsrssrwsrsrsswwswwsssssssrwwwwssswwwrsrwsswsswrssrssrwrsssrssssrwsrwsrswrswsrwsrssssswrwswssrswssswrswwsswwrssrsssswwrwsssssrsssrswsrsrwwrwwwwrwsrssrwwrwrwsssrwswsssrwssrwwrwrwswwwrwrwwwwwwwssswwsrsswrsrsswsrssrwrwwwwsssrwwwrwwwrssssswrssswrssrwwsrsrswrssswsrswwsrwrwwwrsswwrssssrwwwwwrwwwwrwwwwwwrsrsssrwrwswswrwwwwrsrsrwrwrsswsrwwsssssswwwswwwwwwwrsswwwwrswwwrssrwswrwsrwwwsrwwwwsrssrwswswwsrwswwwwsswrsssswwwswrsswsswsssrwwsrwwwrwssrssssrsssssrsssrwrswrsrwwsswrwrwwwwswwrwwssswsrsswsssrwwwwwwwwwwrwwsrssrwwsrwrssssswrssssswwswwwrssssswsswwwwsssswwssswwrswwssssswwwrsrs"

states = [ 's', 'r', 'w' ]

pairs = zip(data[:-1], data[1:])

counts = {}
for i in states:
for j in states:
counts[(i,j)] = pairs.count( (i,j) )
print i,j, counts[(i,j)]

from pylab import *

C = array([[ counts[(i,j)] for j in states ] for i in states])
S = diag(ones(3).dot(C)**-1)
A = C.dot(S)
print C
print S
print A

Counts of pairs Sunny today Rainy today Snowy today
Sunny tomorrow
321
172 160
Rainy tomorrow
176
0
152
Snowy tomorrow
156
156 306

One of the things we see in inspecting this is that it never rains two days in a row, although it may snow or be sunny for many days in a row.

At this point, our counts are equally valid reading the weather forward or backward. But since we like to be able to think of time as flowing forward, its useful to renormalize the data by dividing each column by its total to get approximately the chance of each transition, given the current state. Based on the Normal approximation of the binomial distribution, we can not expect get an accuracy of more than $$1.5/\sqrt{600} \approx 1/16$$. If we round to the nearest sixteenth (making sure columns sum to 1), the transition probabilities come out as follows.

Prob( x -> y ) Sunny today Rainy today Snowy today
Sunny tomorrow 1/2 1/2 1/4
Rainy tomorrow 1/4
0
1/4
Snowy tomorrow 1/4 1/2 1/2

Notice that the columns now sum to 1. In every proper Markov chain, the columns of the transition matrix always sum to 1. We can also represent the transitions between states with a labelled directed graph.

We can simulate this chain with an 8-sided die, or computer simulations with a very simple python algorithm.

[Show code]
#!/usr/bin/env python2

from scipy import rand

s = 'S'
r = 'R'
w = 'W'

transitions = {s: (s,s,r,w), r:(s,s,w,w), w:(s,r,w,w)}
states = transitions.keys()

def rndStep(state):
return transitions[state][int(rand()*4)]

def rndPath(t, x, tmax):
# x is the initial state
# t is the initial time
# tmax is the time to stop at
while t < tmax:
yield t, x
t, x = t + 1, rndStep(x)

def main():
n = 80*20
path = [x for t, x in rndPath(0,s,n)]
print ''.join(path)

main()

This is not a very general method (though it is fast) of simulating from a Markov chain described with a transition matrix. Better methods like aliasing exist.

## Path integrals

If it is sunny today, what is the chance that it will snow in 3 days? (we don’t care about the weather for the next 2 days)

Well, there are several paths the weather might take between now and then. If we can list all the paths, calculate the probability of each path and then sum up all the paths, that would give us the probability of snow in 3 days.

Paths Probability
SSSW 1/16
SSRW 1/16
SSWW 1/16
SRSW 1/32
SRRW 0
SRWW 1/16
SWSW 1/64
SWRW 1/32
SWWW 1/16

So the probability of snow in 3 days is $$(4+4+4+2+0+4+1+2+4)/64 = 25/64$$, a little better than 1 in 3. This kind of calculation is a special case of what’s popularly called a “path integral”. Path integrals are a generalization of calculus introduced by famous physicist Richard Feynman to help with calculations in quantum mechanics.

Doing a path integral this way is rather tedious – there are many paths to check and it turns out there’s a faster way. We just have to raise the matrix $$A$$ to the appropriate power. If $$\vec{p}(0)$$ is the vector of probabilities of starting in each state at step 0, and $$\vec{p}(t)$$ is the probabilities of ending in each state in step $$t$$,

$\vec{p}(t) = A^t \vec{p}(0)$

By raising the matrix $$A$$ to the third power, we can calculate the probabilities of ending in each state, depending on the starting state.

$A^3 = \left[\begin{matrix}\frac{13}{32} & \frac{13}{32} & \frac{25}{64}\\\frac{13}{64} & \frac{3}{16} & \frac{13}{64}\\\frac{25}{64} & \frac{13}{32} & \frac{13}{32}\end{matrix}\right]$

It’s sunny today, so $$p(0)=[1,0,0]^T$$ and so

$p(3)=A^3 p(0)=\left[\begin{matrix}\frac{13}{32} & \frac{13}{32} & \frac{25}{64}\\\frac{13}{64} & \frac{3}{16} & \frac{13}{64}\\\frac{25}{64} & \frac{13}{32} & \frac{13}{32}\end{matrix}\right]\left[\begin{matrix}1\\0\\0\end{matrix}\right]=\left[\begin{matrix}\frac{13}{32}\\\frac{13}{64}\\\frac{25}{64}\end{matrix}\right].$

Therefore, three days from now it will be sunny with probability 13/32, rainy with probability 13/64, and snowy with probability 25/64.

Some Markov chains, when run long enough into the future, forget their initial state and approach a stationary distribution. These kinds of Markov chains are called ergodic. The simplest test for ergoticity is to see if the chain can every get trapped in a subset of state-space. If you can always move from each state to each another by some path, then the chain is ergodic.

For an ergodic Markov chain, the steady-state distribution $$p(\infty)$$ must solve the equation

$p(\infty) = A p(\infty).$

This also happens to be the eigenvector equation for eigenvalue $$\lambda = 1$$. Since the columns of $$A$$ sum to $$1$$, there must be an eigenvalue equal to $$1$$.

(has simple eigenvalues and steady-state) The characteristic polynomial $$z^{3} - z^{2} -{z}/{16} + {1}/{16} = 0$$ factors to $\left(z - 1\right) \left(4 z - 1\right) \left(4 z + 1\right) = 0$

Calculate equilibrium distribution using reduced row-echelon form. ($$I-A$$ will be singular! But we also want $$p_1 + p_2 + p _ 2 = 1$$.)

$p(\infty) = \frac{1}{5} \begin{bmatrix}2\\1\\2\end{bmatrix}$

So, after forgetting the initial condition, we expect it to be sunny $$2/5$$th of the time, snowy $$2/5$$th of the time, and rainy only $$1/5$$th of the time.

(There is actually a bit more work here that I’m glossing over. Specifically, how can I be sure that $$\lim _ {n\rightarrow\infty} A^n p(0)$$ doesn’t blow up? It’s because the eigenvalues of $$A$$ have to be less than 1, which we establish via the Perron-Frobenius, since the eigenvalue 1 is the “Perron root.”)

## The art of state-spaces

Modelling with Markov chains has two parts. (1) specifying a set of states, called a “state-space”, for the process of interest, and (2) estimating transition probabilities among the states. Determining the transition probabilities, though not always easy, is a statistical inference problem which we will leave to other studies. The art of Markov chain modelling comes in choosing a good state-space.

Whenever we use language to talk about nature, we’re forced to “tokenize” our observations – to assign one of a relatively small number of discrete states of being to represent a moment and place in reality. For example, in our modeling of the weather in Oz, our data described every day’s weather with one of three states : rain, snow, or sun. Nature is seldom this tidy. Oz’s meteorologist made no comment about days where it started as rainy and ended up sunny, or days when it was cloudy but not rainy or snowy. As a general rule, we’d like to use as few states as possible, while still capturing all the important features of the natural process, but the more precise an answer you want, the more state’s you’ll probably want to use.

Example 1: A one-window icecream parlor serves customers 1 at a time. To model this simple service process, we need a state-space that should atleast include the current length of the queue and the current order being filled. If we associate each order with an integer, then the state-space can be modelled by two natural numbers, both of which can be given upper bounds in practice.

Example 2: The shuffling of a deck of cards is an important process in gambling. In a standard deck, there are 52 cards, each of which can have 52 positions. So a naive representation of the state of a deck of cards might be to use 52 natural numbers, one for each position in the deck, with the number representing which of the 52 cards is in that position. Thus, we would allow for $$52^{52} \approx 1.7 \times 10^{89}$$ states. But not all of these states are realistic – no card can appear more than once in the deck, for example. There are only $$52! \approx 8\times 10^{67}$$ actual permutations of a deck of cards. It would be nice if we could find a way to represent just the realistic states as simply as the 52 integers we’ve already suggested (see Diaconis’s work for further discussion). In practice, it is quite common to use representations that included unrealistic states, and we just have to be careful to avoid these states when construction our transitions. Interestingly, you could represent the deck’s state the opposite way also – have one number for each card, and each number would represent the position of that card in the deck. Both styles work.

Markov chains make a major assumption. Markov chains predict of future events based only on the current state of the chain. Knowing more about past events in the chain never improves our predictions of the future once we know the current state. In the examples above, there are natural relationships to describe how knowing the current state allows us to predict the next state without needing to know more about the past history. But sometimes we need to include more in the “state” to nullify extra history information.

Example 3: When a skier is travelling down a ski slope, the state-space of the model requires knowing the location of skier on the slope. However, the positions alone don’t give full predictive power. If we new the past location as well as the current location, we could use their difference to estimate the speed of the skier and make better predictions where the skier will go next. But, if we include the current position and velocity in the state, then knowing the past won’t improve the predictive power. A state representing position and velocity in three dimensions needs $$3 + 3 = 6$$ dimensions. We’d also probably like to know the forces currently applied to the skier, but those are typically functions of the current location that can accounted for in transition rates. All information about past forces has already been integrated into obtaining the current location and velocity. A better model could also account for body orientation and limb position.

Example 4: To represent the state of a college basketball game, we need to track the positions and velocities of the players and the ball, and then perhaps also rules variables like which half we are in, the time in the half, the number of fouls on each player and the number of time outs each team still has. If player motion is tracked in 2 dimensions, while ball motion is tracked in 3 dimensions, then the state requires $$2 \times 5 \times (2 + 2) + (3 + 3) = 46$$ dimensions plus rules variables.

Markov chain modelling can have a flavor similar lateral-thinking brain-teasers. The smaller the state-space you can use, the easier calculations will be. But, too small a state-space may not be able to represent all the important features of the process. So the choice of Markov chain state-space embodies the aphorism (attributed to both Albert Einstein and Herbert Spencer) “Everything should be as simple as it can be, but not simpler”. The two puzzles that follow exhibit the opportunities and pitfalls of simple state-spaces.

### Myst’s Elevator problem

The first example will suggest some of the advantages of working with a simple state-space. In the 1993 video game, “Myst 3: Exile”, one of the puzzles involved setting the lock combination below to 3-1-3 by pushing 2 different buttons.

1 , 1 , 1

If we push the buttons at random, this becomes a Markov process with $$3^3 = 27$$ states. However, we can analyze the puzzle more efficiently by working in a smaller state-space. For each combination (x,y,z), let the state $$s = (x-y+z)\% 3$$, where the percent sign represents the remainder operation. This state-space has only 3 states, $$s \in \{ 0, 1, 2\}$$. Even more informative, we can convince ourselves with some experimentation or analysis that no matter what order we push the buttons, the state never actual changes (try clicking the buttons above to experiment):

( 1 - 1 + 1 )%3 = 1

Since the initial state is 1 and the desired final state we want to reach is (3-1+3)%3 = 2, the puzzle is unsolvable, no matter how we try to push our buttons.

This kind of formula we’ve used in our trick is sometimes called an “invariant” or a “conserved quantity”. Invariants can be powerful tools, and besides being used in analysis of other games like the 15 puzzle, have applications in knot theory, mechanics, chemistry, and crystalography. In this case, using the invariant simplifies the state-space to a point where the analysis is obvious.

### The river-crossing puzzle

The second puzzle is a variant of the river crossing problems from more than a 1,000 years ago (Alcuin of York, 700’s AD), and illustrates the dangers of over-simplification.

In colonial times, a woman was heading to market in Williamsport with a quail, a sack of magic beans, and a cat. At the Susquehanna river, the there was a dory which she could use to cross to the other bank. The dory was small and could only carry herself and one of her possessions at a time. However, if she leaves the quail and cat alone together, the cat would kill the quail, and if she leaves the quail alone with the sack, it would eat the beans. Can she ferry all her goods to market safely?

The problem is one of finding a good sequence for the woman to load and unload the dory for each crossing of the river. The full nature of the puzzle involves tracking the locations of 5 objects (woman, dory, cat, quail, and sack), so is at-least 5-dimensional. However, we can greatly simplify things by just tracking which side of the river things are on. Each can be in only 1 of the 2 locations after each trip – 2 possibilities for the cat, 2 for the quail, and 2 for the sack, so a total of $$2^3 = 8$$ possible states. We know we start with all of them located on the south side of the river, and finish when we get all of them to the north side of the river.

(Graph of our simplified state-space for river-crossing problem. Nodes are marked with who has crossed the river: (c)at, (q)uail, and (s)ack. Red nodes are ruled out from allowed paths by the puzzle constraints.)

We’ve discovered that only 4 of the possible 8 states are legal under the conditions of the problem, and that there is no sequence of moves allowed through these 4 states that start with all the goods on the south bank and end up with all the goods on the north bank. It appears that it was impossible for the woman to get to market with all her goods.

But we have fooled ourselves by over-simplifying our state-space. Let’s take a step backward and make one more observation. At the beginning of the puzzle, the cat, quail, and sack were all on the same side of the river, but it was okay because the woman was there to keep them apart from each other – the woman’s location matters. However, our initial guess at the state-space did not include the woman’s location – perhaps her location will make a difference.

If we include the woman’s location in addition to the cat, the quail, and the sack of beans, we are considering $$2^4 = 16$$ possible states. In this enlarged state-space, the illegal states are a little different: the cat and quail can not be left alone together (meaning they CAN be together as long as the woman is there also); and the quail and sack can not be left alone together (similarly). Using our physical intuition for the puzzle, we can now fill in which transitions are allowed between which states.

(Graph of expanded state-space for river-crossing problem. Nodes are marked with who has crossed the river: (w)oman, (c)at, (q)uail, and (s)ack. Red nodes are ruled out.)

In this enlarged state-space, there is a sequence of allowed state changes that gets all the goods across the river safely. There are actually two solutions. One of them is this.

1. Ferry the quail across. (reaching state “wq”)
2. Row back. (reaching state “q”)
3. Ferry the cat across. (reaching state “wcq”)
4. Ferry the quail back. (reaching state “c”)
5. Ferry the sack across. (reaching state “wcs”)
6. Row back. (reaching state “cs”)
7. Ferry the quail across. (reaching state “wqcs”)

Our initial error was to over-simplify the state-space. Once we had done that, there was no solution, no matter how hard we looked. When we reconsider our state-space and use a larger but still managable version, our search then discovers a solution. (An algorithmist would point out that the final solution can be discovered with a depth-first search once given an adequate state-space). This is not the only state-space representation of the problem where the solution could can be calculated – even larger state-spaces could work as well. We could, for example, track the dory’s location, or include states where the dory and it’s occupants are in the middle of the river in addition to being on either bank. The downside of the larger state spaces is that they take more time to search.

# Exercises

1. The following examples are to give you some practice with Markov chain calculations. For each transition matrix, draw a directed graph, with nodes labeled for each state and edges labeled with their transition probabilities. Then, try to find the equilibrium distribution $$\tilde{p}(\infty)$$ such that $$\tilde{p}(\infty) = A \tilde{p}(\infty)$$ and discuss the meaning of your result.

1. $A := \begin{bmatrix}\frac{1}{2} & 0 & \frac{1}{3}\\\frac{1}{2} & \frac{1}{2} & 0\\0 & \frac{1}{2} & \frac{2}{3}\end{bmatrix}.$

2. $A := \begin{bmatrix}\frac{1}{2} & 0 & 0 & \frac{2}{3}\\\frac{1}{2} & \frac{1}{2} & 0 & 0\\0 & \frac{1}{2} & \frac{1}{2} & 0\\0 & 0 & \frac{1}{2} & \frac{1}{3}\end{bmatrix}$

3. $A:=\begin{bmatrix}1 & \frac{1}{4} & 0 & 0\\0 & \frac{1}{2} & \frac{1}{4} & 0\\0 & \frac{1}{4} & \frac{1}{2} & 0\\0 & 0 & \frac{1}{4} & 1\end{bmatrix}$

2. (from Breneman) Alan Quartermain has fallen off the side of a mountain while adventuring in South America. He has stopped his fall by grabbing a tree root, but just barely. If his footing slips and he takes one more step backwards, he will fall to his death. He must take 3 steps up the very crumbly face of the cliff from his current spot to reach the top and safety. Each time he tries to take a step, the odds are 2 in 3 that he moves up the cliff towards safety, and 1 in 3 that he moves down the cliff towards his death. What are Alan’s chances of survival from his dire predicament?

3. The breakdown of machines and structures are often modelled using Markov chains. One example of this is the deterioration of bridges. Imagine a new rope bridge in the Andes mountains held up by 3 jut ropes. Every time a person crosses the bridge, there is a chance that one of the ropes will break. Let $$p _ n$$ be the probability that one of the ropes breaks when there are $$n$$ ropes, and assume that two ropes never break at once. For our purposes, $$p _ n = \dfrac{1}{200 n}$$.

1. Draw graph of this Markov chain. Label all nodes with their corresponding state, and all edges with their transition probability. Discuss any uncertainties in the model structure and parameterization.

2. Construct a transition matrix for changes in the bridge’s state every time a person crosses the bridge.

3. If the Markov chain begins when the bridge is first built, what is the initial condition?

4. What is the probability that the bridge will have collapsed by the time 1000 people have crossed it? (Hit: Use python to perform this calculation. The function linalg.matrix_power will be useful. Be sure to explain your answer.)

4. (Jonas, 2014) In a manufacturing setting, Markov chains can be useful for describing throughput for certain processes. One example of this is a repair process at an electronics manufacturer. For this process, that states of the chain are called “stages”. A device enters through “receive” stage and then is moved to the “debug” stage. Depending on the test results from “debug”, it is moves to “scrap” or “repair” stages. From “repair”, it moves to “test”. From the “test” stage, the device will move to either “ship” or back to “debug”. About 40% of the time, the debugging stage fails and the device is scrapped. About 80% of the time, the testing phase succeeds and the device is shipped. The average processing times for each transient stage are 10 minutes in receiving, 1 hour in debugging, 30 minutes in repair, and 15 minutes in testing. (The “scrap” and “ship” stages are absorbing stages, and formally have infinite processing times.)

1. What is the probability that a device in the debug stage eventually returns to the debug stage?

2. What is the expected number of times a device will enter the debug stage? (Hint: the number of returns is geometrically distributed)

3. The total load of a device is the expected number of hours needed to handle each device before it is scrapped or shipped. Calculate the total load under the above model description.

1. For each of the dynamic processes below, determine which aspects of the process you would want to incorporate into a Markov chain model’s state space. For each, give a feature which can be observed and could be included as part of the state-space, but which you feel can be ignored when answering the question at hand. Explain.

1. What’s the typical score distribution in a round of shuffleboard.

2. How likely is a skater to fall during a short-track iceskating race?

3. What’s the chance a student living in a dorm this winter gets the flu?

( previous, home, next )