- Preferential attachment
- Zipf's law
- Universality
- Application of Zipf's law to newsgroups

Sometimes we are given a set of numbers. Is that set random, or is there some pattern in the numbers that we can make sense of?

Examples of 1-d data sets:

- Frequency of word use in the Bible or some other corpus
- Number of heads in 50 coin flips
- Sizes of forest fires
- Time between earthquakes
- Length of baseball games
- Number of links to a website

Rank order statistics - Sort numbers from largest to smallest, and plot the value as a function of rank.

```
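from matplotlib.pyplot import loglog, plot, show, subplot

# `values` is assumed to be a list of numbers loaded elsewhere
n = len(values)
# Largest first; list.sort() returns None, so it cannot be chained with .reverse()
values = sorted(values, reverse=True)
rank = range(1, n + 1)
subplot(2, 1, 1)
plot(rank, values, 'bo')    # linear axes
subplot(2, 1, 2)
loglog(rank, values, 'go')  # log-log axes
show()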
```
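The log-log panel is useful because a power-law rank relationship becomes a straight line there. As a short worked step, assume the value \(v\) at rank \(r\) follows \(v = c\,r^{-a}\), where \(c\) and \(a\) are illustrative constants that do not appear elsewhere in these notes. Taking logarithms gives

\[\log v = \log c - a \log r,\]

so the slope of the line on log-log axes estimates \(-a\). Zipf's law corresponds to \(a \approx 1\).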

Empirical CDF: given a sample of \(n\) values, sort them from smallest to largest and plot each value against its rank divided by \(n+1\),

`plot value, rank/(n+1)`

Clustering of values shows up as an inflection point.
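A minimal sketch of the empirical CDF construction, using only the standard library. The function name `empirical_cdf` and the coin-flip sample are illustrative assumptions, not part of the original notes; the returned pairs can be passed to any plotting routine.

```python
import random


def empirical_cdf(values):
    """Return (value, rank / (n + 1)) pairs with values sorted ascending."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, rank / (n + 1)) for rank, v in enumerate(ordered, start=1)]


# Illustrative data: number of heads in 50 fair coin flips, repeated 1000 times.
rng = random.Random(0)
heads = [sum(rng.random() < 0.5 for _ in range(50)) for _ in range(1000)]
points = empirical_cdf(heads)
```

Dividing by \(n+1\) rather than \(n\) keeps every plotted probability strictly between 0 and 1.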

Data sets with scales:

- All the numbers are less than some value
- After a certain point, numbers decrease exponentially
- Numbers are clustered around some value

Data on actor networks, citation networks, and internet links

Consider an actor network. At every time step, add a new node corresponding to a new actress/actor, with \(m\) symmetric relationships to other movie stars. We can assume that no existing node is attached to more than once in each time step, and only 1 new actor is added each step.

Expected numbers: \(N(k,t) =\) number of nodes of degree \(k\) at time \(t\)

So, suppose a new movie star picks a connection at random from all the other connections among movie stars. After \(t\) steps there are about \(mt\) relationships, and hence \(2mt\) connection endpoints in total, and a degree-\(k\) star holds \(k\) of them, so the chance that star is picked by the new star will be \(k/(2mt)\). Since the new star will pick \(m\) connections, and there are \(N(k,t)\) stars with degree \(k\), on average \(m \, (k/(2mt)) \, N(k,t)\) stars of degree \(k\) will become stars of degree \(k+1\).

Now, if we apply this to actors of every degree, we expect \(N(k,t)\) to obey the following difference equation, where the indicator \(1_{k = m}\) accounts for the new node arriving with degree \(m\). \[\begin{gather} N(k,t+1) = N(k,t) + \frac{ m (k-1)}{2 m t} N(k-1, t) - \frac{ m k}{2 m t} N(k, t) + 1_{k = m} \end{gather}\] If \(k > m\), the indicator vanishes and the factors of \(m\) cancel, so we can simplify this to the difference equation

\[N(k, t+1) - N(k, t) = \frac{(k-1) N(k-1, t)}{2t} - \frac{k N(k,t)}{2 t}.\]

If \(k = m\), then no node has degree \(m-1\), so the gain term vanishes and we have the slightly simpler equation

\[N(m,t+1) - N(m,t) = 1 - \frac{mN(m,t)}{2t}.\]

As an ansatz, try a solution that grows linearly in \(t\):

\[N(k,t) = \frac{Ct}{k(k+1)(k+2)}\]

The ansatz satisfies the \(k > m\) equation for any \(C\), and the \(k = m\) equation then requires \(C = 2 m (m+1)\), so we discover the solution

\[N(k,t) = \frac{2 m t \left(m + 1\right)}{k \left(k + 1\right) \left(k + 2\right)}\]

```
from sympy import Sum, oo, solve
from sympy.abc import C, k, m, t

# Ansatz for the solution
N = lambda k, t: C * t / (k * (k + 1) * (k + 2))

# If the ansatz is a solution, the k > m equation should simplify to zero.
eq = N(k, t) + m*(k-1)/(2*m*t)*N(k-1, t) - m*k/(2*m*t)*N(k, t) - N(k, t+1)
assert 0 == eq.factor()

# The k = m equation fixes the constant C.
eq = N(m, t+1) - N(m, t) - 1 + N(m, t)*m/2/t
N(k, t).subs(C, solve(eq, C).pop())

# Total node count over all degrees k >= m (one node is added per step).
Sum(2*m*t*(m+1)/k/(k+1)/(k+2), (k, m, oo))
```
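As a numerical complement to the symbolic check, the difference equations can also be iterated directly and compared with the closed form. This is a sketch under stated assumptions: the function names are illustrative, the iteration starts from an empty network at \(t = 1\), and degrees above `k_max` are simply dropped (which does not affect lower degrees, since nodes only move from degree \(k-1\) to \(k\)).

```python
def iterate_expected_counts(m, t_final, k_max):
    """Iterate the difference equations for N(k, t) up to time t_final."""
    # Assumed initial condition: no nodes at t = 1.
    counts = [0.0] * (k_max + 1)  # counts[k] approximates N(k, t)
    for t in range(1, t_final):
        new_counts = counts[:]
        for k in range(m, k_max + 1):
            # Degree-m nodes are created at rate 1; higher degrees gain from k - 1.
            gain = 1.0 if k == m else (k - 1) * counts[k - 1] / (2 * t)
            loss = k * counts[k] / (2 * t)
            new_counts[k] = counts[k] + gain - loss
        counts = new_counts
    return counts


def closed_form(m, k, t):
    """Closed-form solution N(k, t) = 2 m (m + 1) t / (k (k + 1) (k + 2))."""
    return 2 * m * (m + 1) * t / (k * (k + 1) * (k + 2))


m, t_final = 2, 2000
counts = iterate_expected_counts(m, t_final, k_max=10)
# Ratio of iterated to closed-form counts for the first few degrees.
ratios = {k: counts[k] / closed_form(m, k, t_final) for k in range(m, 7)}
```

The ratios approach 1 as `t_final` grows, because the effect of the initial condition decays with time.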

This is a *scale free* relationship, in the sense that the curve is approximately self-similar over a range of degrees: for large \(k\), \(N(k,t) \sim k^{-3}\).
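To make the scale-free claim concrete, the closed form can be expanded for large \(k\). This is a short worked step added for illustration, and \(\lambda\) is an arbitrary rescaling factor introduced here.

\[N(k,t) = \frac{2 m (m+1) t}{k^3 \left(1 + 1/k\right)\left(1 + 2/k\right)} \approx 2 m (m+1)\, t\, k^{-3}, \qquad \frac{N(\lambda k, t)}{N(k,t)} \to \lambda^{-3} \quad (k \to \infty).\]

The limiting ratio depends only on \(\lambda\), not on \(k\), so rescaling the degree axis leaves the shape of the curve unchanged.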

- This is a deterministic model for the expected counts, whereas the actual growth process is stochastic
- Internet links are directed, not mutual
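To see how the deterministic prediction compares with a stochastic run, here is a minimal simulation sketch. It assumes a seed network that is a complete graph on \(m+1\) nodes (the notes do not specify a starting configuration), and the function name `simulate_attachment` is illustrative. Picking a uniformly random entry from the list of connection endpoints selects a node with probability proportional to its degree.

```python
import random
from collections import Counter


def simulate_attachment(m, n_steps, seed=0):
    """Grow a network by preferential attachment; return a Counter of degrees."""
    rng = random.Random(seed)
    # Assumed seed network: complete graph on m + 1 nodes, each of degree m.
    degree = [m] * (m + 1)
    endpoints = [u for u in range(m + 1) for _ in range(m)]
    for _ in range(n_steps):
        # Choose m distinct existing nodes, each with probability ~ degree.
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        new_node = len(degree)
        degree.append(m)
        for v in targets:
            degree[v] += 1
            endpoints.extend((v, new_node))
    return Counter(degree)


m = 2
degree_counts = simulate_attachment(m, 20000)
total = sum(degree_counts.values())
# Observed fraction of nodes with degree k, next to the predicted N(k, t) / t.
comparison = {
    k: (degree_counts[k] / total, 2 * m * (m + 1) / (k * (k + 1) * (k + 2)))
    for k in range(m, m + 4)
}
```

Individual runs fluctuate around the prediction, and the fluctuations are relatively larger at high degree, where few nodes contribute.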