Suspect So

Friday, December 1, 2006

Regression toward the mean

In statistics, '''regression toward the mean''' is the principle that, when the first of two related measurements is extreme, the second is expected to be closer to the mean than the first. It is a statistical phenomenon that makes outcomes more likely to fall toward the center of a statistical distribution.

Examples
Consider, for example, students who take a midterm and a final exam. Students who got an extremely high score on the midterm will probably get a good score on the final exam as well, but we expect their score to be closer to the average (i.e., fewer standard deviations above the average) than their midterm score was. The reason: it is likely that some luck was involved in getting the exceptional midterm score, and this luck cannot be counted on for the final. It is also true that among those who get exceptionally high final exam scores, the average midterm score will be fewer standard deviations above average than the final exam score, since some of those students got high scores on the final due to luck that they didn't have on the midterm. Similarly, unusually low scores regress toward the mean.
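The exam example can be checked with a small simulation. This is only a sketch under stated assumptions: each score is modeled as a persistent skill plus independent luck, both drawn from a standard normal distribution, and the function names, sample size and 5% cutoff are hypothetical choices made for illustration. Because midterm and final have the same spread in this model, raw scores can be compared directly.

```python
import random
import statistics


def simulate_exams(n_students=20000, seed=0):
    """Return (midterm, final) lists; each score is shared skill plus fresh luck.

    Assumption for illustration: skill and luck are independent N(0, 1) draws.
    """
    rng = random.Random(seed)
    midterm, final = [], []
    for _ in range(n_students):
        skill = rng.gauss(0, 1)                   # persists across both exams
        midterm.append(skill + rng.gauss(0, 1))   # luck on the midterm
        final.append(skill + rng.gauss(0, 1))     # independent luck on the final
    return midterm, final


def top_group_means(first, second, fraction=0.05):
    """Mean of `first` and of `second` among the top `fraction` scorers on `first`."""
    order = sorted(range(len(first)), key=lambda i: first[i], reverse=True)
    top = order[: max(1, int(len(first) * fraction))]
    return (statistics.fmean(first[i] for i in top),
            statistics.fmean(second[i] for i in top))


midterm, final = simulate_exams()

# Select on the midterm, then look at the same students' finals.
mid_top, final_of_mid_top = top_group_means(midterm, final)
# Select on the final, then look back at the same students' midterms.
final_top, mid_of_final_top = top_group_means(final, midterm)

print(f"top midterm scorers: midterm {mid_top:.2f}, final {final_of_mid_top:.2f}")
print(f"top final scorers:   final {final_top:.2f}, midterm {mid_of_final_top:.2f}")
```

In both directions the selected group stays above average on the other exam, but by less than on the exam used to select it.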

It is a commonplace observation that matings of two championship athletes, or of two geniuses, usually result in a child who is above average but less talented than either parent.

History

The first regression line drawn on biological data was a plot of seed weights presented by Francis Galton at a Royal Institution lecture in 1877. Galton had seven sets of sweet pea seeds labelled K to Q, and in each packet the seeds were of the same weight. He chose sweet peas on the advice of his cousin Charles Darwin and the botanist Joseph Hooker, as sweet peas tend not to self-fertilise and the seed weight varies little with humidity. He distributed these packets to a group of friends throughout Great Britain, who planted them. At the end of the growing season the plants were uprooted and returned to Galton. The seeds were distributed because when Galton had tried this experiment himself in Kew Gardens in 1874, the crop had failed.

He found that the weights of the offspring seeds were normally distributed, like their parents, and that if he plotted the mean diameter of the offspring seeds against the mean diameter of their parents he could draw a straight line through the points - the first regression line. He also found on this plot that the mean size of the offspring seeds tended toward the overall mean size. He initially referred to the slope of this line as the "coefficient of reversion". Once he discovered that this effect was not a heritable property but the result of his manipulations of the data, he changed the name to the "coefficient of regression". This result was important because it appeared to conflict with the thinking of the time on evolution and natural selection. He went on to do extensive work in quantitative genetics, and in 1888 he coined the term "co-relation" and used the now familiar symbol "r" for this value.

In additional work he investigated geniuses in various fields and noted that their children, while typically gifted, were almost invariably closer to the average than their exceptional parents. He later described the same effect more numerically by comparing fathers' heights to their sons' heights. Again, the heights of sons of both unusually tall fathers and unusually short fathers were typically closer to the mean height than their fathers' heights.

Ubiquity

It is important to realize that regression toward the mean is a ubiquitous statistical phenomenon and has nothing to do with biological inheritance. It is also unrelated to the progression of time: the ''fathers'' of exceptionally tall people also tend to be closer to the mean than their sons. The overall variability of height among fathers and sons is the same.

Mathematical derivation

Given two variables ''X'' and ''Y'' with mean 0, common variance 1, and correlation coefficient ''r'', the expected value of ''Y'' given that the value of ''X'' was measured to be ''x'' is equal to ''rx'', which is closer to the mean 0 than ''x'' whenever the correlation is imperfect, i.e. |''r''| < 1. If the variances of the two variables ''X'' and ''Y'' are different, and one measures the variables in "normalized units" of standard deviations, then the principle of regression toward the mean also holds true.
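For the general case with unequal means and variances, the same statement can be written out explicitly. The notation below is introduced here for illustration and is not from the text above: the means are written as \mu_X and \mu_Y and the standard deviations as \sigma_X and \sigma_Y. The formula assumes that the conditional mean of ''Y'' is linear in ''x'', as it is for jointly normal variables.

```latex
\[
  \mathrm{E}[\,Y \mid X = x\,] = \mu_Y + r\,\frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)
  \quad\Longrightarrow\quad
  \frac{\mathrm{E}[\,Y \mid X = x\,] - \mu_Y}{\sigma_Y} = r\,\frac{x - \mu_X}{\sigma_X}.
\]
```

In standard-deviation units the expected value of ''Y'' is therefore ''r'' times the standardized value of ''x''. Setting both means to 0 and both standard deviations to 1 recovers the value ''rx'' stated above.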

This example illustrates a general fact: regression toward the mean is more pronounced the less the two variables are correlated, i.e. the smaller |''r''| is.
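The dependence on ''r'' can be seen numerically. The following is a minimal sketch, assuming standard normal variables built so that their correlation is ''r''; the function name, the selection window from 1.5 to 2.5 and the sample size are hypothetical choices for illustration.

```python
import math
import random
import statistics


def conditional_means(r, x_low=1.5, x_high=2.5, n=100000, seed=1):
    """Return (mean of X, mean of Y) over simulated pairs with x_low <= X < x_high.

    X is N(0, 1) and Y = r*X + sqrt(1 - r^2)*noise, so Y is also N(0, 1)
    and the correlation between X and Y is r.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        if x_low <= x < x_high:   # keep only pairs with a high first measurement
            xs.append(x)
            ys.append(y)
    return statistics.fmean(xs), statistics.fmean(ys)


for r in (0.9, 0.5, 0.1):
    mean_x, mean_y = conditional_means(r)
    print(f"r = {r}: mean X = {mean_x:.2f}, mean Y = {mean_y:.2f}, "
          f"r * mean X = {r * mean_x:.2f}")
```

For each ''r'' the average of ''Y'' in the selected group should land near ''r'' times the average of ''X'', so a weaker correlation pulls the second measurement further back toward 0.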

Regression fallacies

Misunderstandings of the principle (known as "'''regression fallacies'''") have repeatedly led to mistaken claims in the scientific literature.

An extreme example is Horace Secrist's 1933 book ''The Triumph of Mediocrity in Business'', in which the statistics professor collected mountains of data to prove that the profit rates of competitive businesses tend towards the average over time. In fact, there is no such effect; the variability of profit rates is almost constant over time. Secrist had only described the common regression toward the mean. One exasperated reviewer likened the book to "proving the multiplication table by arranging elephants in rows and columns, and then doing the same for numerous other kinds of animals".
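The distinction between regression toward the mean and a genuine shrinking of variability can be illustrated with a small simulation. This is a sketch under assumptions that are not from Secrist's data: profit rates in two years are modeled as standard normal variables with a hypothetical correlation of 0.6, and all names and sizes are illustrative.

```python
import math
import random
import statistics


def simulate_profit_rates(n_firms=20000, r=0.6, seed=3):
    """Return standardized profit rates for two years with correlation r.

    Both years are N(0, 1) by construction, so the overall spread does not shrink.
    """
    rng = random.Random(seed)
    year1 = [rng.gauss(0, 1) for _ in range(n_firms)]
    year2 = [r * p + math.sqrt(1 - r * r) * rng.gauss(0, 1) for p in year1]
    return year1, year2


year1, year2 = simulate_profit_rates()

# Firms in the top tenth by first-year profit rate.
cutoff = sorted(year1, reverse=True)[len(year1) // 10]
top = [i for i in range(len(year1)) if year1[i] > cutoff]

sd1 = statistics.pstdev(year1)
sd2 = statistics.pstdev(year2)
top_year1 = statistics.fmean(year1[i] for i in top)
top_year2 = statistics.fmean(year2[i] for i in top)

print(f"overall spread: year 1 {sd1:.2f}, year 2 {sd2:.2f}")
print(f"top-tenth firms: year 1 mean {top_year1:.2f}, year 2 mean {top_year2:.2f}")
```

The most profitable firms of the first year move toward the average in the second, yet the spread across all firms is essentially unchanged, because other firms move out to take their place in the tails.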

A different regression fallacy occurs in the following example. We want to test whether a certain stress-reducing drug improves the reading skills of poor readers. Pupils are given a reading test. The lowest-scoring 10% are then given the drug and tested again, with a different test that also measures reading skill. We find that the average reading score of our group has improved significantly. This, however, does not show anything about the effectiveness of the drug: even without the drug, the principle of regression toward the mean would have predicted the same outcome.
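A simulation in which the drug has no effect at all shows the apparent improvement. This is only a sketch: the score scale (mean 100), the equal split between true skill and test-day noise, and the function name are assumptions made for illustration.

```python
import random
import statistics


def retest_lowest_decile(n_pupils=10000, seed=2):
    """Return (first-test mean, second-test mean) for the lowest 10% on the first test.

    Both tests measure the same underlying skill with independent noise;
    nothing in the model changes between the tests (no drug effect).
    """
    rng = random.Random(seed)
    skill = [rng.gauss(100, 10) for _ in range(n_pupils)]
    test1 = [s + rng.gauss(0, 10) for s in skill]
    test2 = [s + rng.gauss(0, 10) for s in skill]
    cutoff = sorted(test1)[n_pupils // 10]          # 10th-percentile score
    group = [i for i in range(n_pupils) if test1[i] < cutoff]
    return (statistics.fmean(test1[i] for i in group),
            statistics.fmean(test2[i] for i in group))


before, after = retest_lowest_decile()
print(f"lowest 10% on first test: mean {before:.1f}, on retest: mean {after:.1f}")
```

The selected group scores noticeably higher on the retest while remaining below the overall average, even though the model contains no treatment.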

The calculation and interpretation of "improvement scores" on standardized educational tests in Massachusetts probably provide another example of the regression fallacy. In 1999, schools were given improvement goals. For each school, the Department of Education tabulated the difference in the average score achieved by students in 1999 and in 2000. It was quickly noted that most of the worst-performing schools had met their goals, which the Department of Education took as confirmation of the soundness of its policies. However, it was also noted that many of the supposedly best schools in the Commonwealth, such as Brookline High School (with 18 National Merit Scholarship finalists), were declared to have failed. As in many cases involving statistics and public policy, the issue is debated, but "improvement scores" were not announced in subsequent years and the findings appear to be a case of regression to the mean.

References

* J.M. Bland and D.G. Altman. "Statistics Notes: Regression towards the mean", ''British Medical Journal'' 308:1499, 1994. ''(Article, including a diagram of Galton's original data, online at: [http://bmj.bmjjournals.com/cgi/content/full/308/6942/1499])''

* Francis Galton. "Regression Towards Mediocrity in Hereditary Stature," ''Journal of the Anthropological Institute'', 15:246-263 (1886). ''(Facsimile at: [http://www.mugu.com/galton/essays/1880-1889/galton-1886-jaigi-regression-stature.pdf])''

* Stephen M. Stigler. ''Statistics on the Table'', Harvard University Press, 1999. ''(See Chapter 9.)''

External links

* Amanda Wachsmuth, Leland Wilkinson, Gerard E. Dallal. http://www.spss.com/research/wilkinson/Publications/galton.pdf ''(A modern look at Galton's analysis.)''

* Massachusetts standardized test score regression: see http://groups.google.com/groups?q=g:thl3845480903d&dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&selm=93ikdr%24i20%241%40nnrp1.deja.com

Tag: Statistics