# Lecture 25 - Preferential attachment

## Rank-order statistics

Sometimes we are given a set of numbers. Is that set random, or is there some pattern in the numbers that we can make sense of?

Examples of 1-d data sets:

• Frequency of word use in the Bible or some other corpus
• Number of heads in 50 coin flips
• Sizes of forest fires
• Time between earthquakes
• Length of baseball games
• Number of links to a website

### Ways to see patterns

Rank-order statistics: sort the numbers from largest to smallest, and plot each value as a function of its rank.

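    from matplotlib.pyplot import subplot, plot, loglog

    # values: the 1-d data set, as a list of numbers
    values = sorted(values, reverse=True)  # largest first
    n = len(values)
    rank = range(1, n + 1)
    subplot(2, 1, 1)
    plot(rank, values, 'bo')    # linear axes
    subplot(2, 1, 2)
    loglog(rank, values, 'go')  # log-log axes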

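Empirical CDF: given a sample of $$n$$ values, sort them in ascending order and plot $$\text{rank}/(n+1)$$ against the value. (With ranks counted from the largest value, as in the rank-order plot, the same construction gives the complementary CDF.) Clustering of values shows up as an inflection point in this curve.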
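
As a minimal sketch of the empirical CDF construction (the function name `empirical_cdf` is illustrative and not from the lecture), the values are sorted in ascending order here so that the plotted fraction increases with the value:

```python
def empirical_cdf(values):
    """Return (sorted values, plotting positions) for a 1-d sample.

    Assumption: ranks are counted from the smallest value, so the
    positions rank/(n+1) increase from left to right.
    """
    xs = sorted(values)                          # ascending order
    n = len(xs)
    ps = [i / (n + 1) for i in range(1, n + 1)]  # rank / (n + 1)
    return xs, ps


xs, ps = empirical_cdf([3.0, 1.0, 2.0, 2.0])
# xs and ps can be passed directly to plot(xs, ps, 'bo')
```

Dividing by $$n+1$$ rather than $$n$$ keeps every plotted fraction strictly between 0 and 1.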

Data sets with scales:

• All the numbers are less than some value
• After a certain point, numbers decrease exponentially
• Numbers are clustered around some value

Data on actor networks, citation networks, and internet links

### Symmetric preferential attachment

Consider an actor network. At every time step, add a new node corresponding to a new actress/actor, with $$m$$ symmetric relationships to other movie stars. We can assume that no node gets attached more than once in each time step, and that only 1 new actor is added each step.

Expected numbers: $$N(k,t) =$$ number of nodes of degree $$k$$ at time $$t$$

So, suppose a new movie star picks each connection at random from all the existing connections among movie stars. After $$t$$ steps there are about $$mt$$ connections, and hence $$2mt$$ connection ends in total. A star of degree $$k$$ holds $$k$$ of those ends, so the chance that a given pick lands on that star is $$k/(2mt)$$. Since the new star makes $$m$$ picks, and there are $$N(k,t)$$ stars of degree $$k$$, on average $$m \frac{k}{2mt} N(k,t)$$ stars of degree $$k$$ will become stars of degree $$k+1$$.

Now, if we apply this to actors of all degrees, we expect $$N(k,t)$$ to obey the following difference equation.

$\begin{gather} N(k,t+1) = N(k,t) + \frac{ m (k-1)}{2 m t} N(k-1, t) - \frac{ m k}{2 m t} N(k, t) + 1_{k = m} \end{gather}$

The indicator $$1_{k=m}$$ accounts for the new node, which enters with degree $$m$$. If $$k > m$$, then we can simplify this to the difference equation

$N(k, t+1) - N(k, t) = \frac{(k-1) N(k-1, t)}{2t} - \frac{k N(k,t)}{2 t}.$

If $$k = m$$, then no node has degree less than $$m$$, so $$N(m-1,t) = 0$$ and we have the slightly simpler equation

$N(m,t+1) - N(m,t) = 1 - \frac{mN(m,t)}{2t}.$
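
Before guessing a solution, it can help to iterate the two difference equations numerically. The sketch below is only an illustration: the function name `iterate_counts`, the cutoff `kmax`, and the starting condition (a single node of degree $$m$$ at $$t = 1$$) are assumptions, not part of the lecture. Printing $$N(k,t)/t$$ at two different times suggests that the ratio settles to a fixed profile in $$k$$, which motivates looking for solutions that are linear in $$t$$.

```python
def iterate_counts(m, steps, kmax):
    """Iterate the expected-count difference equations for degrees m..kmax.

    Assumed start (hypothetical, not from the notes): at t = 1 there is
    one node, counted as having degree m.  Returns the list N, where
    N[k] is the expected number of nodes of degree k at time t = steps.
    """
    N = [0.0] * (kmax + 2)
    N[m] = 1.0
    for t in range(1, steps):
        new = N[:]  # update every degree from the counts at time t
        for k in range(m, kmax + 1):
            gain = (k - 1) * N[k - 1] / (2 * t)  # nodes moving from k-1 to k
            loss = k * N[k] / (2 * t)            # nodes moving from k to k+1
            new[k] = N[k] + gain - loss + (1.0 if k == m else 0.0)
        N = new
    return N


m = 2
for steps in (100, 1000):
    N = iterate_counts(m, steps, kmax=6)
    print(steps, [round(N[k] / steps, 4) for k in range(m, 7)])
```

Degrees above `kmax` are simply not tracked; since nodes only ever move to higher degree, the cutoff does not affect the counts below it.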

As an ansatz, try a solution that is linear in $$t$$:

$N(k,t) = \frac{Ct}{k(k+1)(k+2)}$

This works when $$C = 2 m (m+1)$$, and so we discover the solution

$N(k,t) = \frac{2 m t \left(m + 1\right)}{k \left(k + 1\right) \left(k + 2\right)}$

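The ansatz and the value of $$C$$ can be checked with SymPy:

    from sympy import Sum, oo, solve
    from sympy.abc import C, k, m, t

    # ansatz for the solution
    N = lambda k, t: C * t / k / (k + 1) / (k + 2)

    # If the above form is a solution,
    # the two equations should simplify to zero.

    # general case, k > m
    eq = N(k,t) + m*(k-1)/(2*m*t)*N(k-1,t) - m*k/(2*m*t)*N(k,t) - N(k,t+1)
    assert 0 == eq.factor()

    # boundary case k = m: solve for C and substitute it back
    eq = N(m,t+1) - N(m,t) - 1 + N(m,t)*m/2/t
    N(k,t).subs(C, solve(eq,C).pop())

    # total number of nodes, summed over all degrees k >= m
    Sum(2*m*t*(m+1)/k/(k+1)/(k+2),(k,m,oo))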
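
The last sum can also be evaluated by hand, as a consistency check on the solution. The summand splits into a telescoping difference, so the total number of nodes comes out as $$t$$, which matches adding one node per time step:

$\begin{gather} \frac{1}{k(k+1)(k+2)} = \frac{1}{2}\left[\frac{1}{k(k+1)} - \frac{1}{(k+1)(k+2)}\right], \\ \sum_{k=m}^{\infty} \frac{2 m t (m+1)}{k(k+1)(k+2)} = m t (m+1) \cdot \frac{1}{m(m+1)} = t. \end{gather}$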

This is a scale-free relationship, in the sense that the curve is approximately self-similar over a range of observations: for large $$k$$, $$N(k,t) \sim k^{-3}$$.

## Problems:

• This is a deterministic model of expected counts, while the growth process itself is stochastic
• Internet links are directed, not mutual
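
The first point can be explored by simulating the growth rule directly and comparing the observed degree counts with the formula for $$N(k,t)$$. The following is a rough sketch under stated assumptions, not the lecture's own code: the function name `simulate`, the fixed random seed, and the starting graph (a complete graph on $$m+1$$ nodes) are all choices made here for illustration.

```python
import random
from collections import Counter


def simulate(m, steps, seed=0):
    """Grow a network by symmetric preferential attachment.

    Assumed starting graph (hypothetical): a complete graph on m + 1
    nodes, so that every node begins with degree m.
    Returns the list of node degrees after `steps` additions.
    """
    rng = random.Random(seed)
    degree = [m] * (m + 1)
    # Each node appears in `ends` once per connection end, so a uniform
    # draw from `ends` picks a node with probability proportional to degree.
    ends = [i for i in range(m + 1) for _ in range(m)]
    for _ in range(steps):
        targets = set()
        while len(targets) < m:  # no node is attached twice in one step
            targets.add(rng.choice(ends))
        new = len(degree)
        degree.append(m)
        for v in sorted(targets):
            degree[v] += 1
            ends.append(v)
            ends.append(new)
    return degree


m, steps = 2, 5000
degree = simulate(m, steps)
counts = Counter(degree)
for k in range(m, m + 4):
    predicted = 2 * m * (m + 1) / (k * (k + 1) * (k + 2))
    print(k, counts[k] / len(degree), predicted)
```

Each printed row shows a degree, the simulated fraction of nodes with that degree, and the fraction predicted by the deterministic solution. Changing `seed` gives a sense of how much individual runs fluctuate around the expected counts.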