Linear least squares is a way to solve systems of linear equations that cannot otherwise be solved. The broad idea is to look for the thing that comes closest to solving all of the equations, rather than solving all of the equations directly. The trick that has led to the method's wide adoption is that it minimizes the sum of the squared errors between the observations and predictions, rather than the errors themselves. Let's begin by briefly reviewing some of the basics of linear systems.
A linear system is a set of equations that we can arrange to look like \[\begin{align*} a_{1,1} x_1 + a_{1,2} x_2 + \cdots + a_{1,n} x_n &= b_1, \\ a_{2,1} x_1 + a_{2,2} x_2 + \cdots + a_{2,n} x_n &= b_2, \\ \vdots & \quad \vdots \\ a_{m,1} x_1 + a_{m,2} x_2 + \cdots + a_{m,n} x_n &= b_m, \end{align*}\] where the \(a\)'s and \(b\)'s are all known numbers, and the \(x\)'s are the unknowns we would like to determine. This linear system can be written more compactly as a vector-matrix equation \[\mathbf{A x} = \mathbf{b}\quad\text{or}\quad \mathbf{A x} - \mathbf{b} = \mathbf{0},\] where \(\mathbf{b} = [ b _ i ]\) is an \(m\)-dimensional vector, \(\mathbf{A} = [a _ {ij}]\) is an \(m \times n\) matrix with \(m\) rows and \(n\) columns, and \(\mathbf{x} = [ x _ i]\) is an \(n\)-dimensional vector. When we are attempting to determine the set of unknowns \(\mathbf{x}\), each additional equation removes one dimension from the solution set. If the number of equations is equal to the number of unknowns (\(m = n\)), then the solution set is usually a zero-dimensional point, and we can determine the value of all the unknowns exactly (the exceptions being cases where one or more of the equations repeat information already present in the preceding equations). If there are fewer equations than unknowns (\(m < n\)), the linear system is underdetermined, and we can only say that the solutions lie in a certain \((n-m)\)-dimensional plane. If there are more equations than unknowns (\(m > n\)), then the linear system is usually overdetermined, and there are no solutions (unless enough of the equations are redundant).
Ideally, when we measure things, all our measurements are very accurate, and we can make just enough measurements so that we get a linear system having exactly as many independent equations as we have unknowns. In practice, however, our observations have errors, and we always want to check our work, in case an observation went badly wrong. A good way to check our work for errors is to make more observations. But adding more observations means adding more equations to the system. So, we often have overdetermined systems. And since there's always a little error in these observations, it is unlikely that a single set of unknown values will satisfy all the equations at once.
Although there is no solution to an overdetermined system, we can still look for the vector that comes "closest" to solving all of our equations at once, \(x _ {\rm closest}\). There are many possible ways to measure "closest". We could, for instance, calculate how much each equation is off by, and choose the value of \(x\) that minimizes the largest of these errors. Or we could sum the absolute values of all the errors and minimize this sum. The most popular approach is to pick values for our unknowns that minimize the sum of the squared errors for each equation: the unknowns that return the "least squares". As it turns out, the vector \(x _ {\rm closest}\) with the least sum-of-squares error solves a simplified linear system \[A^T A x _ {\rm closest} = A^T b.\] This equation is called the normal equation of the linear system.
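To make this concrete, here is a small numerical sketch (in Python, using the numpy and scipy libraries): we solve the normal equation for a random overdetermined system and check that perturbing the answer only increases the sum of squared errors.

```python
import numpy as np
from scipy.linalg import solve

rng = np.random.default_rng(0)

# A small overdetermined system: 6 equations, 2 unknowns
A = rng.normal(size=(6, 2))
b = rng.normal(size=6)

# Solve the normal equation A^T A x = A^T b
x_closest = solve(A.T @ A, A.T @ b)

def sq_error(x):
    # Sum of squared errors between predictions A x and observations b
    return np.sum((A @ x - b) ** 2)

# Any perturbation of x_closest should only increase the squared error
for _ in range(100):
    x_other = x_closest + rng.normal(scale=0.1, size=2)
    assert sq_error(x_other) >= sq_error(x_closest)
```

No perturbation beats \(x_{\rm closest}\), illustrating that the normal equation picks out the least-squares minimizer.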
Why the normal equation? The normal equation can be derived using standard minimization techniques from calculus: differentiate an error function and find where that derivative vanishes. Because the error function involves only quadratics, its derivative will be a linear function. It is good to have a reference on matrix calculus handy.
First, observe that if you want to sum the squares of the components of any vector \(v\), you can do this by taking the dot product of the vector with itself, or multiplying the vector by its own transpose, \[\sum_i v_i^2 = v \cdot v = v^T v.\] Now, we define the square error \[\begin{align*} E(x) &:= \sum _ i \left[ (Ax)_i - b_i \right]^2 \\ &= (Ax - b)^T (Ax - b) \\ &= x^T A^T A x - x^T A^T b - b^T A x + b^T b. \end{align*}\] Since each of these terms is a scalar, \(x^T A^T b = (x^T A^T b)^T = b^T A x\), and \[\begin{align*} E(x) = x^T A^T A x - 2 b^T A x + b^T b. \end{align*}\] To find the solution \(x _ {\rm closest}\) that gives the least square error, differentiate \(E(x)\) with respect to \(x\) and set the derivative to zero: \[\frac{dE}{dx} = 2 x^T A^T A - 2 b^T A = 0.\] If we now rearrange and transpose, we obtain the normal equation stated above. So, that's where the normal equation comes from.
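We can sanity-check the differentiation step numerically. Because \(E(x)\) is quadratic, a centered finite difference reproduces the claimed gradient \(2 A^T A x - 2 A^T b\) essentially exactly. A sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)
x = rng.normal(size=3)

def E(x):
    # The sum-of-squared-errors function from the derivation
    return np.sum((A @ x - b) ** 2)

# Gradient predicted by the derivation: dE/dx = 2 A^T A x - 2 A^T b
grad = 2 * A.T @ A @ x - 2 * A.T @ b

# Centered finite-difference approximation, one coordinate at a time
h = 1e-6
fd = np.array([(E(x + h * e) - E(x - h * e)) / (2 * h) for e in np.eye(3)])

assert np.allclose(grad, fd, atol=1e-4)
```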
One of the reasons the math works out so nicely is that the sumofsquarederrors is always a convex function of the unknowns, and for convex functions on a bounded convex domain, there is always exactly one local minimum that is also a global minimum. When minimizing the raw error, there can be many different local minima. The normal equation also has a close connection to probability theory. If observation errors only occur in the constant vector \(b\) and these errors are normally distributed, then according to principles of probability, the solution of the normal equations gives the most likely values for the unknown vector \(x\).
Pro tip: For any of you familiar with the techniques of numerical linear algebra, it is often recommended to use a QR factorization or the singular value decomposition (SVD) to find linear least-squares solutions, rather than solving the normal equations by elimination with partial pivoting; these alternative algorithms are more robust in cases where the normal equations are sensitive to small measurement errors. Computationally, this means: do not invert \(A^T A\) to recover \(x = (A^T A)^{-1} A^T b\). Use a built-in linear system solver like scipy.linalg.solve instead, or even better, scipy.linalg.lstsq. The version of the linear least squares algorithm presented here is the simplest, most naive version. Many variations and enhancements have been developed over the last century (see Björck, 1996).
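On a small, well-conditioned problem the two routes agree; the sketch below shows both (for nearly rank-deficient \(A\), scipy.linalg.lstsq is the safer choice since it factors \(A\) directly instead of forming \(A^T A\)):

```python
import numpy as np
from scipy.linalg import solve, lstsq

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 3))
b = rng.normal(size=8)

# Route 1: solve the normal equations with a general linear solver
x_normal = solve(A.T @ A, A.T @ b)

# Route 2: the dedicated least-squares routine (SVD-based)
x_lstsq, residues, rank, sv = lstsq(A, b)

assert np.allclose(x_normal, x_lstsq)
```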
In the 1864 book A manual of spherical and practical astronomy, William Chauvenet illustrates the method with a problem from coastal surveying (vol. 2, p. 551). A Coastguard station at Pine Mount surveyed the angles between 4 neighboring stations, Joscelyne, Deepwater, Deakyne, and Burden, as follows, measured to a precision of thousandths of an arcsecond (or roughly a millionth of a degree). The conversion of (degrees, minutes, seconds) to decimal form can be quickly accomplished with the python function lambda d, m, s: ((s/60. + m)/60. + d).
Stations                 Angle between       Decimal form
Joscelyne to Deepwater   65° 11' 52.500"     65.197917
Deepwater to Deakyne     66° 24' 15.553"     66.404320
Deakyne to Burden        87° 2' 24.703"      87.040195
Burden to Joscelyne      141° 21' 21.757"    141.356044
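As a quick check, applying the conversion function to the observed angles reproduces the decimal forms in the table:

```python
# Convert (degrees, minutes, seconds) to decimal degrees
dms = lambda d, m, s: ((s / 60. + m) / 60. + d)

print(round(dms(65, 11, 52.500), 6))   # 65.197917
print(round(dms(141, 21, 21.757), 6))  # 141.356044
```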
But we also have one extra piece of information that hasn't been accounted for  the sum of all these angles must be 360 degrees. If we add up all four observed angles, we get 359.998476, close to 360 degrees, but off by more than a thousandth of a degree. That's definitely not correct to the millionth of a degree precision we expect, given the precision of the measurements reported. So, the task is to use this extra information about the sum of the angles to improve our accuracy, if we can.
If we let \(t\), \(u\), \(v\), and \(w\) be the actual angles, we have 5 equations. \[\begin{align*} t &= 65.197917 \\ u &= 66.404320 \\ v &= 87.040195 \\ w &= 141.356044 \\ t + u + v + w &= 360 \end{align*}\] In matrix form, \[ A x = b\] where \[A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}, \quad x = \begin{bmatrix} t \\ u \\ v \\ w \end{bmatrix}, \quad b = \begin{bmatrix} 65.197917 \\ 66.404320 \\ 87.040195 \\ 141.356044 \\ 360 \end{bmatrix}. \]
The calculations from here on can be done by hand but are best performed by computer.
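In Python, the computation might look like the following sketch (assuming numpy and scipy):

```python
import numpy as np
from scipy.linalg import solve

# Observed angles (decimal degrees) and the 360-degree closure condition
b = np.array([65.197917, 66.404320, 87.040195, 141.356044, 360.0])

# Each of the first four equations pins one angle; the fifth sums all four
A = np.vstack([np.eye(4), np.ones(4)])

# Solve the normal equations A^T A x = A^T b
x_closest = solve(A.T @ A, A.T @ b)
print(x_closest)  # least-squares estimates of t, u, v, w
```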
We find \[A^T A = \begin{bmatrix}2 & 1 & 1 & 1\\1 & 2 & 1 & 1\\1 & 1 & 2 & 1\\1 & 1 & 1 & 2\end{bmatrix} \quad \text{and} \quad A^T b = \begin{bmatrix}425.197917\\426.40432\\447.040195\\501.356044\end{bmatrix}.\]
\[x _ {\text{closest}} = (A^T A)^{-1} (A^T b) = \frac{ 1 }{ 5 } \begin{bmatrix}4&-1&-1&-1\\-1&4&-1&-1\\-1&-1&4&-1\\-1&-1&-1&4\end{bmatrix} \begin{bmatrix}425.197917\\426.40432\\447.040195\\501.356044\end{bmatrix} = \begin{bmatrix}65.1982218\\66.4046248\\87.0404998\\141.3563488\end{bmatrix}. \]
We find the best correction is to add 1.097 arcseconds to each angle estimate. This makes intuitive sense, since without more information, we expect all four of the measurements to have the same kinds of errors, and our auxiliary condition that the angles must sum to 360 degrees also treats all four symmetrically.
Stations                 Observed angle      Corrected angle     Correction
Joscelyne to Deepwater   65° 11' 52.500"     65° 11' 53.598"     1.097"
Deepwater to Deakyne     66° 24' 15.553"     66° 24' 16.649"     1.097"
Deakyne to Burden        87° 2' 24.703"      87° 2' 25.799"      1.097"
Burden to Joscelyne      141° 21' 21.757"    141° 21' 22.856"    1.097"
The method of least squares was developed almost simultaneously in France by Adrien-Marie Legendre (1805), in (what is now) Germany by Carl Gauss (1809), and in the United States by Robert Adrain (1808). (Adrain's story is rarely told and somewhat controversial.) The very first public application of the method of least squares, by Legendre in 1805, was to determine the figure of the earth and the orbit of a comet. Laplace had previously attempted to solve the problem, and we will work with the data he collated. Since observations of natural variation in pendulum periods in 1672 started the debate, a number of surveys and expeditions made careful measurements of the distance travelled to traverse 1 degree of latitude, including Maupertuis's expedition to Lapland (1736), the French Geodesic Mission to Peru (1739), and the Mason-Dixon observations in Pennsylvania (1768).
# latitude L, Ds/DL, arc length
#
# Volume 2, book 3, section 41, page 445 of Bowditch's translation
# https://archive.org/stream/mcaniquecles02laplrich#page/444/mode/2up
#
# Formula: ds/dL = m sin(L)**2 + c
# and we wish to determine m and c from the observations below
#
# latitude L measured in grads (400 grads = 360 degrees = 2 pi rads)
#
# the length of one degree (ds/dL) is measured in double toises per grad
# (1 toise is 1.949 meters, so multiply by 3.898 to convert to meters)
#
# The last column is the actual arclength measured to estimate ds/dL,
# and not used in our calculations
#
0.00000, 25538.85, 3.4633
37.0093, 25666.65, 1.3572
43.5556, 25599.60, 1.6435
47.7963, 25640.55, 2.4034
51.3327, 25658.28,10.7487
53.0926, 25683.30, 3.2734
73.7037, 25832.25, 1.0644
Working with the approximate equation \[m \sin^2(\lambda) + c \approx \frac{ds}{d\lambda},\] where \(m\) and \(c\) are unknowns, Laplace was able to use existing observations of latitude \(\lambda\) and curvature \(ds/d\lambda\) to obtain 7 linear equations. But being overdetermined, these equations didn't have a simultaneous solution, and Laplace didn't have a good method for solving these 7 equations for the 2 unknown variables \(c\) and \(m\), though he tried several methods.
Let's consider the general problem of finding a line through several data points. Suppose we have an equation of the form \(m u + c = v\), and we observe a bunch of data points \((u _ i, v _ i)\) that we think should occur on this line, except for errors in \(v _ i\). \[\begin{align*}
m u _ 1 + c &= v _ 1 \\
m u _ 2 + c &= v _ 2 \\
\vdots \\
m u _ i + c &= v _ i \\
\vdots
\end{align*}\] The parameters \(m\) and \(c\) are our unknowns that we'd like to solve for, given our data points, but we have far more than 2 equations for our 2 unknowns, so we'll use linear least squares. If we rearrange our equations into matrix form, \[\begin{gather*}
\begin{bmatrix} u _ 1 & 1 \\ u _ 2 & 1 \\ \vdots & \vdots \\ u _ i & 1 \\ \vdots & \vdots \end{bmatrix}
\begin{bmatrix} m \\ c \end{bmatrix}
=
\begin{bmatrix} v _ 1 \\ v _ 2 \\ \vdots \\ v _ i \\ \vdots \end{bmatrix}
\end{gather*}\] So, then \[\begin{gather*}
A = \begin{bmatrix} u _ 1 & 1 \\ u _ 2 & 1 \\ \vdots & \vdots \\ u _ i & 1 \\ \vdots & \vdots \end{bmatrix}, \quad
x = \begin{bmatrix} m \\ c \end{bmatrix}, \quad
b = \begin{bmatrix} v _ 1 \\ v _ 2 \\ \vdots \\ v _ i \\ \vdots \end{bmatrix}.
\end{gather*}\] We can now calculate \(A^T A\) and \(A^T b\) and then solve the normal equation \(A^T A x = A^T b\) for \(x\), i.e. for the parameters \(m\) and \(c\), using the python code
x = scipy.linalg.solve(A.T.dot(A), A.T.dot(b))
to determine the best fit.
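A sketch of the full computation for Laplace's data (assuming numpy and scipy, with the data transcribed from the table above) might look like:

```python
import numpy as np
from scipy.linalg import solve

# Laplace's data: latitude (grads) and ds/dL (double toises per grad)
lat_grads = np.array([0.00000, 37.0093, 43.5556, 47.7963,
                      51.3327, 53.0926, 73.7037])
ds_dl = np.array([25538.85, 25666.65, 25599.60, 25640.55,
                  25658.28, 25683.30, 25832.25])

# Convert grads to radians (400 grads = 2 pi radians)
lam = lat_grads * np.pi / 200.0

# Columns of A are u_i = sin^2(lambda_i) and 1; unknowns are m and c
u = np.sin(lam) ** 2
A = np.column_stack([u, np.ones_like(u)])
b = ds_dl

# Solve the normal equations for the regression parameters
m_fit, c_fit = solve(A.T @ A, A.T @ b)
print(m_fit, c_fit)  # roughly 319.6 and 25519
```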
When we evaluate the normal equations for Laplace's data, taking \(u_i = \sin^2 \lambda_i\) and \(v_i = \left( ds/d\lambda\right) _ i\), we find \[A^T A = \begin{bmatrix} 1.74303 & 3.07473 \\ 3.07473 & 7 \end{bmatrix}, \quad A^T b = \begin{bmatrix} 79022.77 \\ 179619.48 \end{bmatrix},\] which is equivalent to the 2 by 2 linear system \[\begin{align} 1.74303 m + 3.07473 c &= 79022.77,\\ 3.07473 m + 7 c &= 179619.48. \end{align}\] If we now solve this linear system, we find the best least-squares fit is \(c \approx 25519\), \(m \approx 319.60\), and the regression line through the data is given as \[319.60 \sin^2 \lambda + 25519 = \frac{ds}{d\lambda} .\] Laplace used his own approach to minimizing the error, leading to the line \(308.202 \sin^2 \lambda + 25525 = ds/d\lambda\).
When we plot the data and the regression line together, as in the figure above, we see the least-squares regression line does a good job of bisecting the data, but the data itself has enough variation to be scattered away from the line. Though calculated differently, there is essentially no difference between Laplace's result and ours when allowing for the uncertainty of the data. The data's trend is clear: inverse curvature increases with latitude, so curvature decreases with latitude, implying that the earth bulges at the equator and is flatter at the poles.
Using our regression-line fit obtained above, and solving for our original variables, the elliptical distortion is \(\epsilon = 2m /(2m+3 c)\approx 0.008280\). Hooke and Newton were correct: the earth bulges at the equator. We also derive the estimate, after converting back to standard units, of \(r \approx 6.38\) Mm, within 0.3% of the modern value. These calculations do not settle the problem of determining the figure of the earth. There were significant errors in these data (as you might have guessed from their plot), and the method of local approximation to the curvature does not make full use of the data available. In 1826, with the help of a much-improved data set and a little creativity, George Airy was able to obtain values that we can consider accurate by modern standards (see Exercises). We now know the earth's radius is about 6.36 megameters.
The linear least squares method for approximating solutions to overdetermined linear systems has several advantages over other approaches like minimizing the total error or the maximum error. First, the square error forms a convex function, and thus there is always a unique local minimum that is also the global minimum. This ensures reproducibility and removes ambiguity. Second, the least-squares solution can be calculated directly, with greater accuracy and efficiency than other optimization methods that require iterative evaluation for convergence. Third, the method can be theoretically justified in cases where the errors in the observations are normally distributed.
But don't mistake these advantages for "proof" that least squares methods and linear regression are universally the "right" approach to estimation problems. Least squares methods are a good general-purpose approach and often give answers very similar to alternatives. But for problems where the value of greater accuracy justifies the time and effort, or where there are strong nonlinearities, we can get better results by modelling the actual error mechanisms involved and using alternative optimization algorithms. And least squares will always give you a solution, even if the question is nonsense! You should never accept its results without some sanity-checking. We'll see more examples as things go along.
There is a second, parallel track in the story of the determination of the earth's figure that we should at least mention. While much work on measurement and fitting was going on, there was also substantial theoretical work under way, in an effort to explain the figure of the earth using the laws of nature. Newton's work was the first step on this path, but it was unsatisfactory to a number of attentive readers. Subsequent scholarly studies showed there was not a single equilibrium figure for a spinning blob of fluid with gravity, but a menagerie of equilibrium figures. The story unfolded over the next 200 years of study, as new equilibria were discovered, and then shown to deform into some other new figure as the rate of spin of the blob increased. These analyses were some of the first examples of what we now call "bifurcation theory". Many possibly familiar names, including Clairaut, Maclaurin, Simpson, Jacobi, Poincaré, and Liapounov, appear in the story, and the curious may find it rewarding to seek out.
(Gauss) Find the least-squares solution for \(x\), \(y\), and \(z\) of the following overdetermined linear system. \[\begin{align*} x - y + 2 z &= 3, \\ 3x + 2y - 5 z &= 5, \\ 4x + y + 4 z &= 21, \\ -2x + 6y + 6 z &= 28. \end{align*}\]
The calculations Chauvenet actually performs in his book are a little more complicated than described above. Based on other information, some measurements were expected to be more accurate than others. Specifically, the measurement of the angle between Burden and Joscelyne is believed to be less accurate than the other three angles. It would be nice to be able to take this inaccuracy into account when calculating our least-squares solution. And there is indeed a way: the method of weighted linear least squares. We can weight the errors produced by each equation so that equations measured to greater accuracy are given larger weights and equations with less accuracy are given smaller weights.
Find the matrix form for the equations of the method of weighted linear least squares by determining the vector \(x_{\rm closest}\) that minimizes the total weighted square error \[E(x) = \sum_{i} W_{ii} ( (Ax)_i  b_i)^2,\] where \(W\) is a nonnegative diagonal matrix.
Chauvenet weights the first 3 measurements as 3 times more accurate than the 4th, and weights the condition that the angles must sum to 360 degrees as (say) 1,000 times more accurate. Use the method of weighted linear least squares obtained above to find the angles between stations to the nearest thousandth of an arcsecond.
Replicate the calculations in the published version of Chauvenet's survey problem using the weighted normal equations. Explain any discrepancies with Chauvenet's result.
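For checking your work on the weighted exercises, here is a sketch that assumes the weighted normal equations take the form \(A^T W A x = A^T W b\) (the form the derivation exercise above leads to). The weights below are purely illustrative, not Chauvenet's:

```python
import numpy as np
from scipy.linalg import solve

# Survey setup from the Pine Mount example
A = np.vstack([np.eye(4), np.ones(4)])
b = np.array([65.197917, 66.404320, 87.040195, 141.356044, 360.0])

# Illustrative weights: trust the 360-degree closure condition far
# more than any single angle measurement
W = np.diag([1.0, 1.0, 1.0, 1.0, 1000.0])

# Weighted normal equations: A^T W A x = A^T W b
x_weighted = solve(A.T @ W @ A, A.T @ W @ b)
print(x_weighted.sum())  # very close to 360
```

Because the closure equation carries nearly all the weight, the fitted angles sum almost exactly to 360 degrees, while the individual corrections absorb the misfit.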
In a 2009 article, Head and coauthors use a fossil vertebra to deduce that about 60 million years ago, there existed a neotropical snake that grew to 13 meters in length and weighed more than 1,000 kilograms (42 feet and more than 2,000 pounds). They called this snake Titanoboa cerrejonensis. This discovery and others like it inspired the PBS Secrets of the Dead episode Graveyard of the giant beasts in 2016, also available on YouTube. Let's see if we agree with the hype by looking through this scientific paper and checking their numbers.
Fit a line through the data given below.
What were the two regression lines the team obtained for the relationship between vertebra width and total body length, using the extreme values of 60% and 65% for vertebral position?
Create a plot of width vs total body length using known 60% and 65% position data. Add to this plot the regression lines from parts (a) and (b). Make sure your axes scales are large enough to show the predicted body length of Titanoboa, given that the discovered vertebra was 12 cm wide.
Is your line consistent with the lines found by the team?
Do you think Head et al's conclusion about body length are supported by the evidence, are contradicted by the evidence, or that the evidence is inconclusive? Defend your opinion.
# VW 65% (mm), VW 60% (mm),TBL(mm),SVL(mm)
# These are the data column headings ..
#
# Extant snake data used to estimate Titanoboa's TBL and SVL
# VW = (V)ertebral (W)idth at a fixed percentage down the spine
# TBL = (t)otal (b)ody (l)ength
# SVL = (s)nout (v)ent (l)ength
#
# @article{bib:Head2009,
# title={Giant boid snake from the Palaeocene neotropics reveals hotter past equatorial temperatures},
# volume={457},
# ISSN={14764687},
# url={http://dx.doi.org/10.1038/nature07671},
# DOI={10.1038/nature07671},
# number={7230},
# journal={Nature},
# publisher={Springer Nature},
# author={Head, Jason J. and Bloch, Jonathan I. and Hastings, Alexander K. and Bourque, Jason R. and Cadena, Edwin A. and Herrera, Fabiany A. and Polly, P. David and Jaramillo, Carlos A.},
# year={2009},
# month={Feb},
# pages={715717},
# }
#
# Data extracted from Supplement, Table 2
#
6.530,6.380,1535,1423
19.04,19.66,2040,1900
21.42,23.50,2480,2390
12.64,12.92,1606,1355
27.08,27.69,3220,2970
28.01,28.20,3434,3129
10.96,11.22,867,803
11.80,12.20,1450,1216
10.54,10.79,1734,1360
17.97,17.61,2330,1960
11.27,11.54,1380,1195
10.22,10.65,1210,1070
11.58,11.66,1700,1450
10.85,11.20,1770,1385
15.08,15.09,2250,1950
10.66,10.80,1720,1490
17.26,17.10,2470,2110
23.99,24.32,3320,2910
17.24,17.67,2510,2190
23.12,24.33,2690,2310
15.69,16.15,1760,1610
(Hard) In 1807, Nathaniel Bowditch observed a comet in the New England sky and made a series of astronomical observations. He then attempted to calculate the orbit of the comet using the methods described by Laplace. In the process, he developed a system of 56 equations for 5 unknowns.
# eq #, c_d, c_t, c_p, c_n, c_i, constant
# data from pages 7, 8, and 9 of Bowditch, 1807
# http://www.jstor.org/stable/25057874
# Special thanks to ZY and HP for transcription
#
1, 315, 2, 129, 239, 234, 38
2, 317, 1, 138, 242, 248, 88
3, 319, 4, 145, 244, 262, 98
4, 323, 8, 151, 246, 279, 84
5, 330, 2, 198, 249, 412, 21
6, 329, 14, 207, 246, 448, 75
7, 327, 17, 208, 242, 467, 16
8, 324, 26, 212, 240, 485, 164
9, 321, 31, 213, 235, 505, 116
10, 318, 40, 215, 231, 526, 71
11, 314, 48, 215, 225, 547, 86
12, 308, 57, 214, 218, 567, 278
13, 304, 67, 213, 211, 589, 158
14, 199, 202, 157, 108, 789, 413
15, 140, 262, 119, 59, 851, 385
16, 122, 280, 99, 37, 876, 120
17, 95, 305, 81, 16, 895, 304
18, 14, 378, 12, 55, 955, 1081
19, 47, 429, 40, 105, 987, 775
20, 187, 533, 165, 217,1033, 263
21, 267, 588, 240, 280,1045, 164
22, 306, 613, 282, 314,1048, 757
23, 509, 726, 483, 464,1020, 204
24, 542, 742, 520, 489,1007, 238
25, 582, 761, 562, 517, 992, 614
26, 690, 804, 676, 587, 929, 47
27, 724, 813, 713, 609, 905, 1628
28, 944, 789,1006, 716, 496, 69
29, 226, 957, 336, 214, 43, 180
30, 237, 950, 337, 212, 41, 235
31, 247, 945, 339, 211, 39, 301
32, 258, 933, 341, 209, 38, 143
33, 333, 865, 361, 208, 28, 262
34, 349, 849, 367, 210, 27, 197
35, 359, 839, 372, 211, 27, 386
36, 366, 831, 374, 212, 27, 180
37, 375, 824, 378, 213, 28, 98
38, 383, 814, 382, 215, 29, 127
39, 390, 804, 386, 217, 28, 92
40, 397, 794, 391, 220, 31, 74
41, 403, 784, 395, 223, 32, 108
42, 463, 702, 435, 241, 74, 244
43, 476, 671, 447, 245, 97, 294
44, 478, 659, 451, 246, 106, 197
45, 483, 651, 454, 246, 115, 413
46, 491, 617, 464, 246, 146, 803
47, 493, 593, 468, 242, 170, 200
48, 488, 541, 468, 229, 223, 177
49, 484, 516, 466, 220, 252, 287
50, 477, 500, 464, 215, 267, 424
51, 447, 428, 440, 176, 342, 565
52, 436, 412, 434, 167, 355, 101
53, 427, 397, 426, 157, 370, 143
54, 395, 351, 398, 123, 412, 589
55, 384, 338, 387, 111, 425,1082
56, 206, 167, 219, 56, 539, 229
Laplace proposed that the equations for the data fitting should be weighted in proportion to the total length of arc measured, reasoning that the larger the arc, the more accurate the estimate of the length of a single degree. If Laplace's data are fit using weighted linear least squares (see exercise in previous lecture), what estimates do we get for \(m\) and \(c\)?
(From Sand, 1995) The following data set illustrates Bergmann's rule that body size within a species should increase with latitude. Using the linear least-squares method, find the regression line through the data that is the best fit.
# This data set illustrates Bergmann's rule of
# animal size increasing with latitude, using Swedish moose
# https://en.wikipedia.org/wiki/Bergmann's_rule
#
# @article{bib:Sand1995,
# title={Geographical and latitudinal variation in growth
# patterns and adult body size of Swedish moose (Alces alces)},
# volume={102},
# ISSN={14321939},
# url={http://dx.doi.org/10.1007/BF00341355},
# DOI={10.1007/bf00341355},
# number={4},
# journal={Oecologia},
# publisher={Springer Nature},
# author={Sand, Håkan and Cederlund, Göran and Danell, Kjell},
# year={1995},
# pages={433442},
# }
#
# Data from tables 1 and 2
#
# Columns ...
# Lattitude (degrees N), mean male adult mass in herd (kg)
58.0,214.5
57.7,216.3
58.0,217.8
57.9,214.0
59.8,215.1
61.6,241.9
62.0,247.8
62.7,235.2
64.0,259.2
65.5,250.0
66.0,246.4
66.0,262.2
#patient id, age (days), birthweight (oz), systolic blood pressure (mm Hg)
#Data from Rosner, Fundamentals of Biostatistics, page 468
01, 3, 135, 89
02, 4, 120, 90
03, 3, 100, 83
04, 2, 105, 77
05, 4, 130, 92
06, 5, 125, 98
07, 2, 125, 82
08, 3, 105, 85
09, 5, 120, 96
10, 4, 090, 95
11, 2, 120, 80
12, 3, 095, 79
13, 3, 120, 86
14, 4, 150, 97
15, 3, 160, 92
16, 3, 125, 88
The method of linear least squares can be applied to fitting certain polynomial curves to data. The parabola \(y = p_0 + p_1 x + p_2 x^2\) can be fit to data by letting the columns of the matrix \(A\) be the monomial powers \(x^0\), \(x^1\), and \(x^2\), with the coefficients \(p_i\) being our unknowns. Use the method of linear least squares to find the parabola that best fits the points \((-2,8)\), \((0,5)\), \((2, 3)\), \((4,4)\), and \((6,9)\).
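A quick computational sketch of this setup (note that the first point is taken here as \((-2, 8)\), on the assumption that a minus sign was lost in the rendered exercise):

```python
import numpy as np
from scipy.linalg import lstsq

# Points to fit; the first x-value is assumed to be -2 (a minus sign
# appears to have been dropped in the rendered exercise text)
x = np.array([-2.0, 0.0, 2.0, 4.0, 6.0])
y = np.array([8.0, 5.0, 3.0, 4.0, 9.0])

# Columns of A are the monomial powers x^0, x^1, x^2
A = np.column_stack([x**0, x**1, x**2])

# Least-squares fit returns the coefficients p = [p0, p1, p2]
p, residues, rank, sv = lstsq(A, y)
print(p)
```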
Fit a circle to the following data set by applying the method of linear least squares to estimate the parameters \(a\), \(b\), and \(c\) in the circle equation \[x^2 + y^2 = a + b x + c y.\]
4.699 15.127
5.003 13.942
5.191 15.625
5.309 15.477
5.367 15.923
5.399 14.886
5.461 13.778
5.563 13.871
5.858 16.825
6.305 12.915
6.435 17.143
6.750 17.045
6.789 16.919
6.882 13.101
6.977 16.659
7.154 13.088
7.228 12.828
7.855 12.871
8.072 16.715
8.468 14.796
8.527 13.972
8.690 14.195
8.835 15.526
8.853 13.606
8.898 15.991
8.946 15.823
9.021 15.789
9.038 16.106
9.051 16.004
9.094 15.117
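One way to set this up (a sketch using scipy, and only a handful of the points above for brevity): the circle equation is linear in \(a\), \(b\), and \(c\) because \(x^2 + y^2\) is a known number for each data point.

```python
import numpy as np
from scipy.linalg import lstsq

# A few of the data points from the exercise (for brevity)
pts = np.array([
    [4.699, 15.127],
    [6.305, 12.915],
    [6.977, 16.659],
    [8.468, 14.796],
    [8.898, 15.991],
    [9.094, 15.117],
])
x, y = pts[:, 0], pts[:, 1]

# x^2 + y^2 = a + b x + c y is linear in the unknowns a, b, c
A = np.column_stack([np.ones_like(x), x, y])
rhs = x**2 + y**2

(a, b, c), residues, rank, sv = lstsq(A, rhs)

# Recover the circle's center and radius from the fitted coefficients
center = (b / 2, c / 2)
radius = np.sqrt(a + b**2 / 4 + c**2 / 4)
print(center, radius)
```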
George Airy (1826), like Legendre before him, recognized that the local approximation wasn't a great match to the long transects contemporary surveyors were making in their efforts to improve accuracy. Instead of working directly with point estimates of curvature, Airy compared estimates of arc length between different latitudes. Airy's data (given below) is of the form \((\phi, \psi, d)\), where \(\phi\) is the starting latitude, \(\psi\) is the ending latitude, and \(d\) is the distance measured between the two latitudes. Let's try a version of his method and see what we get.
# lower latitude, upper latitude, measured distance
#
# These are the improved data used by Bowditch and Airy
# in their estimates of the figure of the earth.
# https://archive.org/stream/mcaniquecles02laplrich#page/452/mode/2up
#
# latitudes are in radians
# the measured distance between latitudes is in meters
#
7.3313525e04, 5.36780860e02, 3.44747088e+05
1.42430454e01, 3.15146346e01, 1.09477454e+06
6.74841785e01, 8.90744298e01, 1.37446573e+06
8.83453039e01, 9.33023396e01, 3.15927029e+05
1.14362830e+00, 1.17193793e+00, 1.80813456e+05
(Medium) One of the sources of error in Airy's approximation is the use of only the first two terms in the series approximation of \(ds/d\lambda\).