Consider the following patristocrat cipher.
lwikhqqhmixwduwhltdbqlquhbuigohpcdllipvqavduzgombi
xyhbbnxibhypgnxibxixibihgcnqlriylpnvrgioilqldggxik
ibqdqlipdychzdymnvbxholwbnvmwdlhlgiymlwxiiylibiphx
dpibqkhuihyplwignxbnntwidmwliyiprvlhqxiunymbhlvghl
ipnvbqigeiqnylwdquwhyminvblnbuwxhqijldymvdqwiprohu
vbbiylnthdbhypxixibigitldyvllibphbzyiqqlwimvdpiqrb
dymxdlwlwicchlibdhgqtnbbiyixdymlwigdmwlrvlxiwhpyny
invbnygobiqnvbuixhqlnbilvbyhqxiuhciximbnkipbnvyplw
ixdpiyipqkhuilntdyplwiiylbhyuihyphtlibhldcithyudip
lwhlxiwhpqvuuiipiplwdqkbneipwnxieiblnrihqiunypkhqq
hmixwduwiedpiylgohquiypipdllibcdyhlipgdzilwitnbcib
lwnvmwqncilwdymhkkbnhuwdymlnhbhoxiunvgpynlliggxwiy
uiqwipheibopnvrltvglxdgdmwldylwiqkhuiropimbiiqnvbi
oiqmbixqncixwhlhuuvqlnciplnlwdqpdcyiqqhypxikibuide
iplwhllwibixhqynpdbiulkhqqhmigihpdymvqtvblwibrvllw
hldlxhqknqqdrgilnugdcrnyiqdpintlwiuheibylnhgnxhbuw
hllnkxwduwkbncdqiphcnbiihqokhlwtbncxwiyuixiynxpdqu
neibiplwhllwdqgdmwlkbnuiipipxdlwunyqdpibhrgipdttdu
vgloxiqubhcrgipvkhypuhcilnhynlwibkhqqhmixdlwqldggc
nbintdggvcdyhldnyhyplwdqgipln
We know immediately that this is not English – For example, there is no word that contains “hqqh” or “whlt”. Decrypting this message is hard if we aren’t smart about it. Every letter has been switched with some other letter, meaning that there are \(26! \gg 10^{26}\) different possibilities to sort through – far too many even for computers. But there are good ideas we can use to reduce the solutionspace we need to explore – Edgar Poe lays many them out nicely in his entertaining story The GoldBug. For example, different letters appear with different frequencies in the English language – vowels appear in every word, while consonants like j and z are used infrequently, the letter q is always followed by the letter u, and so on.
One of the challenges in writting a computer program to test solutions is scoring how close the resulting text is to meaningful English – while we ourselves are pretty good at telling when something is written in English and when it isn’t, it is not immediately clear how to transform that knowledge into an algorithm. A Markov chain model can fix this – while the letters in Engllish do NOT obey a Markov chain, they can be described as such.
If \(f(x)\) is a bijection that maps each letter to its true value, and letters appear according to a Markov chain, the likelihood of \(f\) is \[\ln \mathcal{L}(f) = \sum _ {i=1} ^ {n1} \ln A(f(s_i),f(s_{i+1}))\] where \(A(i,j)\) is the probability that letter \(i\) will follow letter \(j\). We can estimate \(A(i,j)\) by examining a large corpsus of works or even just a single book.
With some (or much) work, you may discover the following.
The passage, which at first scarcely admitted us, quickly grew narrower and lower; we were almost bent double; yet still we persisted in making our way through it. At length we entered a wider space, and the low roof heightened; but, as we congratulated ourselves on this change, our torch was extinguished by a current of air, and we were left in utter darkness. The guides bring with them materials for renewing the light, but we had none–our only resource was to return as we came. We groped round the widened space to find the entrance, and after a time fancied that we had succeeded. This proved however to be a second passage, which evidently ascended. It terminated like the former; though something approaching to a ray, we could not tell whence, shed a very doubtful twilight in the space. By degrees, our eyes grew somewhat accustomed to this dimness, and we perceived that there was no direct passage leading us further; but that it was possible to climb one side of the cavern to a low arch at top, which promised a more easy path, from whence we now discovered that this light proceeded. With considerable difficulty we scrambled up, and came to another passage with still more of illumination, and this led to another ascent like the former.
Of course, that still leaves the mystery of where it is from and what it means ;)
While probability distributions can be very useful models, they don’t help us describe or analyze how a system changes over time. For that, we need to create new tools. One of the most useful is the Markov chain. A Markov chain is a special kind of stochastic process than can describe systems changing randomly over time. A Markov chain consists of a set of “states”, and a rule for each state that gives the chances for which state will come next. The change from one state to another is called a “transition”. Each rule determining the transition from one state to the next is described by a probability distribution called the “transition probabilities”.
The childhood board game Chutes and ladders can be described as a Markov chain. Recall that the rules of the game are as follows. The game is played on a 10x10 board, with the squares numbered 1 to 100. A player has a piece to mark their current spot on the board. In each turn, you roll the dice, and depending your roll and your current spot, you count forward to see where you land next, including any chutes or ladders you land on at the end of your turn. Your ‘state’ in the game is your piece’s current spot. The allowed transitions are all the possible spots you might land next from your current spot. And the transition probabilities are the probabilities of each needed dice roll.
The key property of a Markov chain is that once you know the current state, extra information about the past history of system does not provide any additional information for predicting the future states – all useful information is captured in the current state. Any stochastic processes with this property is called “Markovian”. Some stochastic processes are not Markovian. For example, a marble bouncing between different numbers on a roulette wheel does not form a Markov chain because the chances of different outcomes depends not just on the marble’s current number, but also on if this is the first number the marble hits and so is moving fast, or if it’s a later number and getting close to settling.
After this first pass, the dependence only on the current state seems like a serious restriction. Markov chains seem like they might be useful only occasionally – lots of things that we might like to model have history dependence in some form. The roulette wheel marble’s speed, which depends on how many times it has already bounced, or in baseball, where the outcome of a pitch depends not just on how it is thrown but also on how many balls and strikes the batter has accumulated in the past. BUT, as we will see, this is not as big a hurdle as it first appears.
In Oz, the weather switches among 3 states every day… Sunny, Rainy, or Snowy.
The weather department observed the following daily history in Oz over 9 days
0 Sunny
1 Sunny
2 Sunny
3 Sunny
4 Rainy
5 Snowy
6 Snowy
7 Rainy
8 Snowy
9 Sunny
...
The longer history in compressed using (s) for sunny, (w) for snowy, and (r) for rainy, over 1600 days is
sssrwwrwssssrwwwrwwrssswwrssssswwrswrwrssswswrwwrsrwrswsrswssswwssrwwrsssssrwrss
srwwrswssssrwwwwsrwwsswssrssrsssswrsrwwwrsrsrwrsrwsrsssssrwwssrwsswsswrwswswwwsw
wwrssssrssswssssswwwwwwwrssswwwwssssswrwwwwwwwswswrwwswrwwrsrwrswsswrsrwwwwwwwrs
wwwrswswwrssssswrwswssssrssssrsrwrsssssrssrwrwwrwwwwwwsrwssrwwwwrsssrsswsssrssww
wswswrssssswwrssrssrwwrswwwwwwwssssrswrsssrwwsssrwwrsrwsrwswwwsswwwwwwwrssswrwsr
srswrswsswwssssssrwswrwswwwwwrswsssrwwwwwwwwsrwrwsrwrswswwwwrwwwwwrswrswsssssrww
rwsrsssssrssrsrswwwrwsswwwrsssssswwrwsrwwwwrwwwrssrwrwwssrssswswrwwwsswrsrsssrwr
swrssrsrswssrwwrswrwrwwswsrwssssrsssswwsssrswwrsswrwrssrswrwwwrwssrssrsrsswwwrww
wsrwwwsswswwwwsswsssrsrsrswwwwwwrssssrwrwsswsswsrwrssssrsrwwswrsrssrssssswswrssw
rsrssrwwwwrswwrswrswwwrwrssrwwwsrsswwssrswwwwwwwwrwwwsswswsrwrssswrwwsrwwsrwswrs
rwrswwrswsrsrsswwwwswwssrssssrwsrsswssswssssrwssrsswwssrsssswwwrswwsrwwsrsrsswwr
wrssswwsrssrssrsrwwsrwwsrwwwsrsrsrwrswwrwwsswrsrssssrsrwwwssswwwwwsrwwwssrsswwwr
ssssrwwrwwrwrsswwwwwwwssrwwwwwrswrssswsrwwrsswwrwwwwwrwrwswwrsrwwrwrsrsrssrwsrsr
sswwswwsssssssrwwwwssswwwrsrwsswsswrssrssrwrsssrssssrwsrwsrswrswsrwsrssssswrwsws
srswssswrswwsswwrssrsssswwrwsssssrsssrswsrsrwwrwwwwrwsrssrwwrwrwsssrwswsssrwssrw
wrwrwswwwrwrwwwwwwwssswwsrsswrsrsswsrssrwrwwwwsssrwwwrwwwrssssswrssswrssrwwsrsrs
wrssswsrswwsrwrwwwrsswwrssssrwwwwwrwwwwrwwwwwwrsrsssrwrwswswrwwwwrsrsrwrwrsswsrw
wsssssswwwswwwwwwwrsswwwwrswwwrssrwswrwsrwwwsrwwwwsrssrwswswwsrwswwwwsswrsssswww
swrsswsswsssrwwsrwwwrwssrssssrsssssrsssrwrswrsrwwsswrwrwwwwswwrwwssswsrsswsssrww
wwwwwwwwrwwsrssrwwsrwrssssswrssssswwswwwrssssswsswwwwsssswwssswwrswwssssswwwrsrs
The first step in analyzing data we want to model as a Markov chain is to count the transitions to see what occurs and how often.
data = "sssrwwrwssssrwwwrwwrssswwrssssswwrswrwrssswswrwwrsrwrswsrswssswwssrwwrsssssrwrsssrwwrswssssrwwwwsrwwsswssrssrsssswrsrwwwrsrsrwrsrwsrsssssrwwssrwsswsswrwswswwwswwwrssssrssswssssswwwwwwwrssswwwwssssswrwwwwwwwswswrwwswrwwrsrwrswsswrsrwwwwwwwrswwwrswswwrssssswrwswssssrssssrsrwrsssssrssrwrwwrwwwwwwsrwssrwwwwrsssrsswsssrsswwwswswrssssswwrssrssrwwrswwwwwwwssssrswrsssrwwsssrwwrsrwsrwswwwsswwwwwwwrssswrwsrsrswrswsswwssssssrwswrwswwwwwrswsssrwwwwwwwwsrwrwsrwrswswwwwrwwwwwrswrswsssssrwwrwsrsssssrssrsrswwwrwsswwwrsssssswwrwsrwwwwrwwwrssrwrwwssrssswswrwwwsswrsrsssrwrswrssrsrswssrwwrswrwrwwswsrwssssrsssswwsssrswwrsswrwrssrswrwwwrwssrssrsrsswwwrwwwsrwwwsswswwwwsswsssrsrsrswwwwwwrssssrwrwsswsswsrwrssssrsrwwswrsrssrssssswswrsswrsrssrwwwwrswwrswrswwwrwrssrwwwsrsswwssrswwwwwwwwrwwwsswswsrwrssswrwwsrwwsrwswrsrwrswwrswsrsrsswwwwswwssrssssrwsrsswssswssssrwssrsswwssrsssswwwrswwsrwwsrsrsswwrwrssswwsrssrssrsrwwsrwwsrwwwsrsrsrwrswwrwwsswrsrssssrsrwwwssswwwwwsrwwwssrsswwwrssssrwwrwwrwrsswwwwwwwssrwwwwwrswrssswsrwwrsswwrwwwwwrwrwswwrsrwwrwrsrsrssrwsrsrsswwswwsssssssrwwwwssswwwrsrwsswsswrssrssrwrsssrssssrwsrwsrswrswsrwsrssssswrwswssrswssswrswwsswwrssrsssswwrwsssssrsssrswsrsrwwrwwwwrwsrssrwwrwrwsssrwswsssrwssrwwrwrwswwwrwrwwwwwwwssswwsrsswrsrsswsrssrwrwwwwsssrwwwrwwwrssssswrssswrssrwwsrsrswrssswsrswwsrwrwwwrsswwrssssrwwwwwrwwwwrwwwwwwrsrsssrwrwswswrwwwwrsrsrwrwrsswsrwwsssssswwwswwwwwwwrsswwwwrswwwrssrwswrwsrwwwsrwwwwsrssrwswswwsrwswwwwsswrsssswwwswrsswsswsssrwwsrwwwrwssrssssrsssssrsssrwrswrsrwwsswrwrwwwwswwrwwssswsrsswsssrwwwwwwwwwwrwwsrssrwwsrwrssssswrssssswwswwwrssssswsswwwwsssswwssswwrswwssssswwwrsrs"
states = [ 's', 'r', 'w' ]
pairs = zip(data[:1], data[1:])
counts = {}
for i in states:
for j in states:
counts[(i,j)] = pairs.count( (i,j) )
print i,j, counts[(i,j)]
from pylab import *
C = array([[ counts[(i,j)] for j in states ] for i in states])
S = diag(ones(3).dot(C)**1)
A = C.dot(S)
print C
print S
print A
Counts of pairs  Sunny today  Rainy today  Snowy today 

Sunny tomorrow 

172  160 
Rainy tomorrow 


152 
Snowy tomorrow 

156  306 
One of the things we see in inspecting this is that it never rains two days in a row, although it may snow or be sunny for many days in a row.
At this point, our counts are equally valid reading the weather forward or backward. But since we like to be able to think of time as flowing forward, its useful to renormalize the data by dividing each column by its total to get approximately the chance of each transition, given the current state. Based on the Normal approximation of the binomial distribution, we can not expect get an accuracy of more than \(1.5/\sqrt{600} \approx 1/16\). If we round to the nearest sixteenth (making sure columns sum to 1), the transition probababilities come out as follows.
Prob( x > y )  Sunny today  Rainy today  Snowy today 

Sunny tomorrow  1/2  1/2  1/4 
Rainy tomorrow  1/4 

1/4 
Snowy tomorrow  1/4  1/2  1/2 
Notice that the columns now sum to 1. In every proper Markov chain, the columns of the transition matrix always sum to 1. We can also represent the transitions between states with a labelled directed graph.
We can simulate this chain with an 8sided die, or computer simulations with a very simple python algorithm.
#!/usr/bin/env python2
from scipy import rand
s = 'S'
r = 'R'
w = 'W'
transitions = {s: (s,s,r,w), r:(s,s,w,w), w:(s,r,w,w)}
states = transitions.keys()
def rndStep(state):
return transitions[state][int(rand()*4)]
def rndPath(t, x, tmax):
# x is the initial state
# t is the initial time
# tmax is the time to stop at
while t < tmax:
yield t, x
t, x = t + 1, rndStep(x)
def main():
n = 80*20
path = [x for t, x in rndPath(0,s,n)]
print ''.join(path)
main()
This is not a very general method (though it is fast) of simulating from a Markov chain described with a transition matrix. Better methods like aliasing exist.
If it is sunny today, what is the chance that it will snow in 3 days? (we don’t care about the weather for the next 2 days)
Well, there are several paths the weather might take between now and then. If we can list all the paths, calculate the probability of each path and then sum up all the paths, that would give us the probability of snow in 3 days.
Paths  Probability 

SSSW  1/16 
SSRW  1/16 
SSWW  1/16 
SRSW  1/32 
SRRW  0 
SRWW  1/16 
SWSW  1/64 
SWRW  1/32 
SWWW  1/16 
So the probability of snow in 3 days is \((4+4+4+2+0+4+1+2+4)/64 = 25/64\), a little better than 1 in 3. This kind of calculation is a special case of what’s popularly called a “path integral”. Path integrals are a generalization of calculus introduced by famous physicist Richard Feynman to help with calculations in quantum mechanics.
Doing a path integral this way is rather tedious – there are many paths to check and it turns out there’s a faster way. We just have to raise the matrix \(A\) to the appropriate power. If \(\vec{p}(0)\) is the vector of probabilities of starting in each state at step 0, and \(\vec{p}(t)\) is the probabilities of ending in each state in step \(t\),
\[\vec{p}(t) = A^t \vec{p}(0)\]
By raising the matrix \(A\) to the third power, we can calculate the probabilities of ending in each state, depending on the starting state.
\[A^3 = \left[\begin{matrix}\frac{13}{32} & \frac{13}{32} & \frac{25}{64}\\\frac{13}{64} & \frac{3}{16} & \frac{13}{64}\\\frac{25}{64} & \frac{13}{32} & \frac{13}{32}\end{matrix}\right]\]
It’s sunny today, so \(p(0)=[1,0,0]^T\) and so
\[p(3)=A^3 p(0)=\left[\begin{matrix}\frac{13}{32} & \frac{13}{32} & \frac{25}{64}\\\frac{13}{64} & \frac{3}{16} & \frac{13}{64}\\\frac{25}{64} & \frac{13}{32} & \frac{13}{32}\end{matrix}\right]\left[\begin{matrix}1\\0\\0\end{matrix}\right]=\left[\begin{matrix}\frac{13}{32}\\\frac{13}{64}\\\frac{25}{64}\end{matrix}\right].\]
Therefore, three days from now it will be sunny with probability 13/32, rainy with probability 13/64, and snowy with probability 25/64.
Some Markov chains, when run long enough into the future, forget their initial state and approach a stationary distribution. These kinds of Markov chains are called ergodic. The simplest test for ergoticity is to see if the chain can every get trapped in a subset of statespace. If you can always move from each state to each another by some path, then the chain is ergodic.
For an ergodic Markov chain, the steadystate distribution \(p(\infty)\) must solve the equation
\[p(\infty) = A p(\infty).\]
This also happens to be the eigenvector equation for eigenvalue \(\lambda = 1\). Since the columns of \(A\) sum to \(1\), there must be an eigenvalue equal to \(1\).
(has simple eigenvalues and steadystate) The characteristic polynomial \(z^{3}  z^{2} {z}/{16} + {1}/{16} = 0\) factors to \[\left(z  1\right) \left(4 z  1\right) \left(4 z + 1\right) = 0\]
Calculate equilibrium distribution using reduced rowechelon form. (\(IA\) will be singular! But we also want \(p_1 + p_2 + p _ 2 = 1\).)
\[p(\infty) = \frac{1}{5} \begin{bmatrix}2\\1\\2\end{bmatrix}\]
So, after forgetting the initial condition, we expect it to be sunny \(2/5\)th of the time, snowy \(2/5\)th of the time, and rainy only \(1/5\)th of the time.
(There is actually a bit more work here that I’m glossing over. Specifically, how can I be sure that \(\lim _ {n\rightarrow\infty} A^n p(0)\) doesn’t blow up? It’s because the eigenvalues of \(A\) have to be less than 1, which we establish via the the PerronFrobenius, since the eigenvalue 1 is the “Perron root.”)
Modelling with Markov chains has two parts. (1) specifying a set of states, called a “statespace”, for the process of interest, and (2) estimating transition probabilities among the states. Determining the transition probabilities, though not always easy, is a statistical inference problem which we will leave to other studies. The art of Markov chain modelling comes in choosing a good statespace.
Whenever we use language to talk about nature, we’re forced to “tokenize” our observations – to assign one of a relatively small number of discrete states of being to represent a moment and place in reality. For example, in our modeling of the weather in Oz, our data described every day’s weather with one of three states : rain, snow, or sun. Nature is seldom this tidy. Oz’s meteorologist made no comment about days where it started as rainy and ended up sunny, or days when it was cloudy but not rainy or snowy. As a general rule, we’d like to use as few states as possible, while still capturing all the important features of the natural process, but the more precise an answer you want, the more state’s you’ll probably want to use.
Example 1: A onewindow icecream parlor serves customers 1 at a time. To model this simple service process, we need a statespace that should atleast include the current length of the queue and the current order being filled. If we associate each order with an integer, then the statespace can be modelled by two natural numbers, both of which can be given upper bounds in practice.
Example 2: The shuffling of a deck of cards is an important process in gambling. In a standard deck, there are 52 cards, each of which can have 52 positions. So a naive representation of the state of a deck of cards might be to use 52 natural numbers, one for each position in the deck, with the number representing which of the 52 cards is in that position. Thus, we would allow for \(52^{52} \approx 1.7 \times 10^{89}\) states. But not all of these states are realistic – no card can appear more than once in the deck, for example. There are only \(52! \approx 8\times 10^{67}\) actual permutations of a deck of cards. It would be nice if we could find a way to represent just the realistic states as simply as the 52 integers we’ve already suggested (see Diaconis’s work for further discussion). In practice, it is quite common to use representations that included unrealistic states, and we just have to be careful to avoid these states when construction our transitions. Interestingly, you could represent the deck’s state the opposite way also – have one number for each card, and each number would represent the position of that card in the deck. Both styles work.
Markov chains make a major assumption. Markov chains predict of future events based only on the current state of the chain. Knowing more about past events in the chain never improves our predictions of the future once we know the current state. In the examples above, there are natural relationships to describe how knowing the current state allows us to predict the next state without needing to know more about the past history. But sometimes we need to include more in the “state” to nullify extra history information.
Example 3: When a skier is travelling down a ski slope, the statespace of the model requires knowing the location of skier on the slope. However, the positions alone don’t give full predictive power. If we new the past location as well as the current location, we could use their difference to estimate the speed of the skier and make better predictions where the skier will go next. But, if we include the current position and velocity in the state, then knowing the past won’t improve the predictive power. A state representing position and velocity in three dimensions needs \(3 + 3 = 6\) dimensions. We’d also probably like to know the forces currently applied to the skier, but those are typically functions of the current location that can accounted for in transition rates. All information about past forces has already been integrated into obtaining the current location and velocity. A better model could also account for body orientation and limb position.
Example 4: To represent the state of a college basketball game, we need to track the positions and velocities of the players and the ball, and then perhaps also rules variables like which half we are in, the time in the half, the number of fouls on each player and the number of time outs each team still has. If player motion is tracked in 2 dimensions, while ball motion is tracked in 3 dimensions, then the state requires \(2 \times 5 \times (2 + 2) + (3 + 3) = 46\) dimensions plus rules variables.
Markov chain modelling can have a flavor similar lateralthinking brainteasers. The smaller the statespace you can use, the easier calculations will be. But, too small a statespace may not be able to represent all the important features of the process. So the choice of Markov chain statespace embodies the aphorism (attributed to both Albert Einstein and Herbert Spencer) “Everything should be as simple as it can be, but not simpler”. The two puzzles that follow exhibit the opportunities and pitfalls of simple statespaces.
The first example will suggest some of the advantages of working with a simple statespace. In the 1993 video game, “Myst 3: Exile”, one of the puzzles involved setting the lock combination below to 313 by pushing 2 different buttons.
If we push the buttons at random, this becomes a Markov process with \(3^3 = 27\) states. However, we can analyze the puzzle more efficiently by working in a smaller statespace. For each combination (x,y,z), let the state \(s = (xy+z)\% 3\), where the percent sign represents the remainder operation. This statespace has only 3 states, \(s \in \{ 0, 1, 2\}\). Even more informative, we can convince ourselves with some experimentation or analysis that no matter what order we push the buttons, the state never actual changes (try clicking the buttons above to experiment):
Since the initial state is 1 and the desired final state we want to reach is (31+3)%3 = 2, the puzzle is unsolvable, no matter how we try to push our buttons.
This kind of formula we’ve used in our trick is sometimes called an “invariant” or a “conserved quantity”. Invariants can be powerful tools, and besides being used in analysis of other games like the 15 puzzle, have applications in knot theory, mechanics, chemistry, and crystalography.
The second puzzle is a variant of the river crossing problems from more than a 1,000 years ago (Alcuin of York, 700’s AD), and illustrates the dangers of oversimplification.
In colonial times, a woman was heading to market in Williamsport with a quail, a sack of magic beans, and a cat. At the Susquehanna river, the regular ferry is out, but there was a dory which she could use to cross to the north bank. The dory was small and could only carry herself and one of her possessions at a time. However, if she leaves the quail and cat alone together, the cat would kill the quail, and if she leaves the quail alone with the sack, it would eat the beans. Can she ferry all her goods to market safely?
The problem is one of finding a good sequence for the woman to load and unload the dory for each crossing of the river. The full nature of the puzzle involves tracking the locations of 5 objects (woman, dory, cat, quail, and sack), so is atleast 5dimensional. However, we can greatly simplify things by observing that it is probably sufficient to just track which side of the river things are on. Each can be in only 1 of the 2 locations after each trip – 2 possibilities for the cat, 2 for the quail, and 2 for the sack, so a total of \(2^3 = 8\) possible states. We know we start with all of them located on the south side of the river, and finish when we get all of them to the north side of the river.
(Graph of naive statespace for rivercrossing problem. Nodes are marked with who has crossed the river: (c)at, (q)uail, and (s)ack. Red nodes are ruled out from allowed paths by the puzzle constraints.)
We’ve discovered that only 4 of the possible 8 states are legal under the conditions of the problem, and that there is no sequence of moves allowed through these 4 states that start with all the goods on the south bank and end up with all the goods on the north bank. It appears that it was impossible for the woman to get to market with all her goods.
But we have fooled ourselves by oversimplifying our statespace. Let’s take a step backward and make one simple but easily overlooked observation. At the beginning of the puzzle, the cat, quail, and sack were all on the same side of the river, but it was okay because the woman was there to keep them apart from each other – the woman’s location matters. However, our initial guess at the statespace did not include the woman’s location – perhaps her location will make a difference.
If we include the woman’s location in addition to the cat, the quail, and the sack of beans, we are considering \(2^4 = 16\) possible states. In this enlarged statespace, the illegal states are a little different: the cat and quail can not be left alone together (meaning they CAN be together as long as the woman is there also); and the quail and sack can not be left alone together (similarly). Using our physical intuition for the puzzle, we can now fill in which transitions are allowed between which states.
(Graph of expanded statespace for rivercrossing problem. Nodes are marked with who has crossed the river: (w)oman, (c)at, (q)uail, and (s)ack. Red nodes are ruled out.)
In this enlarged statespace, there is a sequence of allowed state changes that gets all the goods across the river safely. There are actually two solutions. One of them is this.
Our initial error was to oversimplify the statespace. Once we had done that, there was no solution, no matter how hard we looked. When we reconsider our statespace and use a larger but still managable version, our search then discovers a solution. (An algorithmist would point out that the final solution can be discovered with a depthfirst search once given an adequate statespace). This is not the only statespace representation of the problem where the solution could can be calculated – even larger statespaces could work as well. We could, for example, track the dory’s location, or include states where the dory and it’s occupants are in the middle of the river in addition to being on either bank. The downside of the larger state spaces is that they take more time to search.
The following examples are to give you some practice with Markov chain calculations. For each transition matrix, draw a directed graph, with nodes labeled for each state and edges labeled with their transition probabilities. Then, try to find the equilibrium distribution \(\tilde{p}(\infty)\) such that \(\tilde{p}(\infty) = A \tilde{p}(\infty)\) and discuss the meaning of your result.
\[A := \begin{bmatrix}\frac{1}{2} & 0 & \frac{1}{3}\\\frac{1}{2} & \frac{1}{2} & 0\\0 & \frac{1}{2} & \frac{2}{3}\end{bmatrix}.\]
\[A := \begin{bmatrix}\frac{1}{2} & 0 & 0 & \frac{2}{3}\\\frac{1}{2} & \frac{1}{2} & 0 & 0\\0 & \frac{1}{2} & \frac{1}{2} & 0\\0 & 0 & \frac{1}{2} & \frac{1}{3}\end{bmatrix}\]
\[A:=\begin{bmatrix}1 & \frac{1}{4} & 0 & 0\\0 & \frac{1}{2} & \frac{1}{4} & 0\\0 & \frac{1}{4} & \frac{1}{2} & 0\\0 & 0 & \frac{1}{4} & 1\end{bmatrix}\]
(from Breneman) Alan Quartermain has fallen off the side of a mountain while adventuring in South America. He has stopped his fall by grabbing a tree root, but just barely. If his footing slips and he takes one more step backwards, he will fall to his death. He must take 3 steps up the very crumbly face of the cliff from his current spot to reach the top and safety. Each time he tries to take a step, the odds are 2 in 3 that he moves up the cliff towards safety, and 1 in 3 that he moves down the cliff towards his death. What are Alan’s chances of survival from his dire predicament?
The breakdown of machines and structures are often modelled using Markov chains. One example of this is the deterioration of bridges. Imagine a new rope bridge in the Andes mountains held up by 3 jut ropes. Every time a person crosses the bridge, there is a chance that one of the ropes will break. Let \(p _ n\) be the probability that one of the ropes breaks when there are \(n\) ropes, and assume that two ropes never break at once. For our purposes, \(p _ n = \dfrac{1}{200 n}\).
Draw graph of this Markov chain. Label all nodes with their corresponding state, and all edges with their transition probability. Discuss any uncertainties in the model structure and parameterization.
Construct a transition matrix for changes in the bridge’s state every time a person crosses the bridge.
If the Markov chain begins when the bridge is first built, what is the initial condition?
What is the probability that the bridge will have collapsed by the time 1000 people have crossed it? (Hit: Use python to perform this calculation. The function linalg.matrix_power will be useful. Be sure to explain your answer.)
(Jonas, 2014) In a manufacturing setting, Markov chains can be useful for describing throughput for certain processes. One example of this is a repair process at an electronics manufacturer. For this process, that states of the chain are called “stages”. A device enters through “receive” stage and then is moved to the “debug” stage. Depending on the test results from “debug”, it is moves to “scrap” or “repair” stages. From “repair”, it moves to “test”. From the “test” stage, the device will move to either “ship” or back to “debug”. About 40% of the time, the debugging stage fails and the device is scrapped. About 80% of the time, the testing phase succeeds and the device is shipped. The average processing times for each transient stage are 10 minutes in receiving, 1 hour in debugging, 30 minutes in repair, and 15 minutes in testing. (The “scrap” and “ship” stages are absorbing stages, and formally have infinite processing times.)
What is the probability that a device in the debug stage eventually returns to the debug stage?
What is the expected number of times a device will enter the debug stage? (Hint: the number of returns is geometrically distributed)
The total load of a device is the expected number of hours needed to handle each device before it is scrapped or shipped. Calculate the total load under the above model description.
For each of the dynamic processes below, determine which aspects of the process you would want to incorporate into a Markov chain model’s state space. For each, give a feature which can be observed and could be included as part of the statespace, but which you feel can be ignored when answering the question at hand. Explain.
What’s the typical score distribution in a round of shuffleboard.
How likely is a skater to fall during a shorttrack iceskating race?
What’s the chance a student living in a dorm this winter gets the flu?