Lecture 25 - Preferential attachment

References for further reading.

Rank-order statistics

Sometimes we are given a set of numbers. Is that set random, or is there some pattern in the numbers that we can make sense of?

Examples of 1-d data sets:

Ways to see patterns

Rank order statistics - Sort numbers from largest to smallest, and plot the value as a function of rank.

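from pylab import plot, loglog

values = sorted(values, reverse=True)  # largest value gets rank 1
n = len(values)
rank = range(1, n+1)
plot(rank, values, 'bo')    # linear axes
loglog(rank, values, 'go')  # log-log axes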

Empirical CDF: given a sample of n values sorted from smallest to largest (so the smallest value has rank 1),

plot value, rank/(n+1)

Clustering of values shows up as an inflection point, where the curve is steepest.
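A minimal sketch of both views in plain Python, assuming nothing beyond the two recipes above. The function names `rank_order` and `empirical_cdf` and the sample numbers are made up for illustration; the returned lists can be handed to `plot` or `loglog`.

```python
def rank_order(values):
    """Pair each value with its rank, largest value first (rank 1)."""
    ordered = sorted(values, reverse=True)
    return list(range(1, len(ordered) + 1)), ordered


def empirical_cdf(values):
    """Return the sorted sample and the plotting positions rank/(n+1)."""
    ordered = sorted(values)  # smallest value first
    n = len(ordered)
    positions = [r / (n + 1) for r in range(1, n + 1)]
    return ordered, positions


sample = [3.0, 1.0, 4.0, 1.5, 9.0]  # made-up numbers, for illustration only
ranks, by_rank = rank_order(sample)
xs, ps = empirical_cdf(sample)
```

Dividing by n+1 rather than n keeps every plotting position strictly between 0 and 1.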

Data-sets with scales ---

Data on actors networks, citation networks, internet links

Symmetric preferential attachment

Consider an actor network. Every time step, add a new node corresponding to a new actress/actor, with \(m\) symmetric relationships to other movie stars. We can assume that no node gets attached more than once in each time step, and only 1 new actor is added each step.

Expected numbers: \(N(k,t) =\) number of nodes of degree \(k\) at time \(t\)

So, suppose a new movie star picks a connection at random from all the existing connections among movie stars. After \(t\) steps there are \(mt\) relationships, and counting each one once for each of its two stars gives \(2mt\) connections in total. A star of degree \(k\) holds \(k\) of them, so the chance that this star is picked by the new star is \(k/(2mt)\). Since the new star picks \(m\) connections, and there are \(N(k,t)\) stars of degree \(k\), on average \(m \, (k/(2mt)) \, N(k,t)\) stars of degree \(k\) will become stars of degree \(k+1\).
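The picking rule can be sketched as a small simulation. This is an illustration under stated assumptions, not the notes' own code: the function name `grow_network` is hypothetical, the network is assumed to start from \(m+1\) fully connected stars, and repeated picks within a step are simply redrawn, which shifts the probabilities slightly away from \(k/(2mt)\).

```python
import random


def grow_network(m, steps, seed=0):
    """Grow a network by symmetric preferential attachment.

    Assumption (not from the notes): start from m + 1 fully connected stars,
    so the first newcomer has m distinct stars to choose from.
    """
    rng = random.Random(seed)
    degree = [m] * (m + 1)
    # Every connection contributes both of its ends to this list, so a star
    # of degree k appears k times and is drawn with probability k / len(ends).
    ends = [i for i in range(m + 1) for _ in range(m)]
    for _ in range(steps):
        chosen = set()
        while len(chosen) < m:  # no star is attached twice in one step
            chosen.add(rng.choice(ends))
        new = len(degree)  # index of the newcomer
        degree.append(m)
        for old in chosen:
            degree[old] += 1
            ends.extend((old, new))
    return degree


degree = grow_network(m=3, steps=200)
```

Keeping a flat list of connection ends makes the degree-proportional draw a single uniform choice, at the cost of memory proportional to the total degree.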

Now, if we apply this to actors of all degrees, we expect \(N(k,t)\) to obey the following difference equation. \[\begin{gather} N(k,t+1) = N(k,t) + \frac{ m (k-1)}{2 m t} N(k-1, t) - \frac{ m k}{2 m t} N(k, t) + 1_{k = m} \end{gather}\] If \(k > m\), the indicator term vanishes and the factors of \(m\) cancel, so we can simplify this to the difference equation

\[N(k, t+1) - N(k, t) = \frac{(k-1) N(k-1, t)}{2t} - \frac{k N(k,t)}{2 t}.\]

If \(k = m\), no star has degree \(m-1\), so the only gain is the newly added star, and we have the slightly simpler equation

\[N(m,t+1) - N(m,t) = 1 - \frac{mN(m,t)}{2t}.\]
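Before guessing a solution, it can help to iterate the two difference equations numerically. The sketch below assumes the starting condition \(N(k,1) = 0\) for all \(k\), which is not specified in the notes, and the function name `iterate_expected_counts` is hypothetical. For small \(k\), the ratio \(N(k,t)/t\) appears to settle to a constant, which suggests looking for a solution proportional to \(t\).

```python
def iterate_expected_counts(m, t_max, k_max):
    """Iterate the difference equations for N(k, t), k = m..k_max.

    Assumption (not from the notes): start from N(k, 1) = 0 for all k.
    Returns a dict mapping k to N(k, t_max).
    """
    N = {k: 0.0 for k in range(m - 1, k_max + 1)}  # N[m-1] stays zero
    for t in range(1, t_max):
        new = {m - 1: 0.0}
        for k in range(m, k_max + 1):
            gain = (k - 1) * N[k - 1] / (2 * t)  # stars promoted from k-1
            loss = k * N[k] / (2 * t)            # stars promoted to k+1
            source = 1.0 if k == m else 0.0      # the newly added star
            new[k] = N[k] + gain - loss + source
        N = new
    return N


counts = iterate_expected_counts(m=2, t_max=1000, k_max=6)
ratios = {k: counts[k] / 1000 for k in counts if k >= 2}
```

Truncating at `k_max` does not disturb the lower degrees, because each \(N(k,t)\) depends only on degrees \(k\) and \(k-1\).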

An ansatz,...

\[N(k,t) = \frac{Ct}{k(k+1)(k+2)}\]

This works when \(C = 2 m (m+1)\), and so, we discover the solution

\[N(k,t) = \frac{2 m t \left(m + 1\right)}{k \left(k + 1\right) \left(k + 2\right)}\]
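As a hand check of the constant (a sketch of the algebra, using only the two difference equations and the ansatz above), substitute the ansatz into each equation and cancel the common factors of \(t\):

\[\begin{gather} \frac{C}{k(k+1)(k+2)} = \frac{C}{2k(k+1)} - \frac{C}{2(k+1)(k+2)}, \\ \frac{C}{m(m+1)(m+2)} = 1 - \frac{C}{2(m+1)(m+2)} \quad\Longrightarrow\quad C = 2m(m+1). \end{gather}\]

The first line holds for every \(C\), because \(\frac{1}{k} - \frac{1}{k+2} = \frac{2}{k(k+2)}\). The second line fixes \(C\): multiplying through by \(2m(m+1)(m+2)\) gives \(2C = 2m(m+1)(m+2) - mC\).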

from sympy import symbols, solve

C, k, m, t = symbols('C k m t')

# ansatz for solution
N = lambda k,t : C * t/k/(k+1)/(k+2)

# If the above form is a solution, the equation for k > m
# should simplify to zero for every value of C.
eq = N(k,t) + m*(k-1)/(2*m*t)*N(k-1,t) - m*k/(2*m*t)*N(k,t) - N(k,t+1)
assert 0 == eq.factor()

# The equation for k = m then fixes the constant C.
eq = N(m,t+1) - N(m,t) - 1 + N(m,t)*m/2/t
print(N(k,t).subs(C, solve(eq,C).pop()))


This is a scale-free relationship, in the sense that the curve is approximately self-similar over a range of observations: for large \(k\), \(N(k,t) \approx 2m(m+1)\,t\,k^{-3}\), so \(N(k,t) \sim k^{-3}\).
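The power-law tail can be checked numerically from the closed form alone. The sketch below measures the log-log slope of \(N(k,t)\) between degrees \(k\) and \(2k\); the helper names `expected_count` and `local_slope` and the choice of doubling \(k\) are illustrative assumptions. The slope should approach \(-3\) as \(k\) grows.

```python
import math


def expected_count(k, t, m):
    """Closed-form N(k, t) from the notes."""
    return 2 * m * (m + 1) * t / (k * (k + 1) * (k + 2))


def local_slope(k, t=1.0, m=2):
    """Log-log slope of N between degrees k and 2k."""
    ratio = expected_count(2 * k, t, m) / expected_count(k, t, m)
    return math.log(ratio) / math.log(2)


slopes = {k: local_slope(k) for k in (2, 10, 100, 1000)}
```

The slope does not depend on \(t\) or \(m\), since both cancel in the ratio; only the shape in \(k\) matters.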