— CH. 1 · ORIGINS AND ETYMOLOGY —

Statistics

  • The word statistics derives from the Latin status, meaning a situation or condition in society, which in late Latin came to mean the state. The Italian scholar Girolamo Ghilini introduced the term statistic in 1589 for a collection of facts about a state, and the political scientist Gottfried Achenwall coined the German word statistik in 1749 for a summary of how things stand. The earliest European writing containing statistics dates to 1663, when John Graunt published Natural and Political Observations upon the Bills of Mortality, an analysis of death records aimed at understanding population trends. Early applications centered on the needs of states to base policy on demographic and economic data. In the mid-19th century, when statistical offices were forming across Europe and America, British statisticians were often called statists.

  • When full census data cannot be collected, statisticians design specific experiments and survey samples instead. Representative sampling ensures that inferences can safely extend from the sample to the entire population. An experimental study takes measurements of a system, manipulates it, and then measures again to see whether the manipulation changed anything. The Hawthorne study, for example, examined worker productivity at the Western Electric Company plant by modifying illumination levels. The researchers found that productivity improved under the experimental conditions, but the study lacked a control group and blinding. The Hawthorne effect describes an outcome that changes because of observation itself rather than because of the manipulated variable. An observational study, in contrast, gathers data without experimental manipulation in order to investigate correlations between predictors and a response variable. One example is a case-control study that invites people with and without lung cancer to participate and then collects their exposure histories.

  • A descriptive statistic is a summary statistic that quantitatively describes features of a collection of information, while descriptive statistics, as a field, is the practice of using and analyzing such summaries. Descriptive statistics focuses on two properties of a distribution: central tendency, which characterizes its typical value, and dispersion, which measures how far its members depart from that center. The standard deviation, for example, measures the extent to which individual observations differ from a central value such as the mean. Inferential statistics instead uses data analysis to deduce properties of an underlying probability distribution, inferring population properties by testing hypotheses and deriving estimates. It assumes the observed data are sampled from a larger population, whereas descriptive methods concern only the properties of the observed data. Confidence intervals let statisticians express how closely a sample estimate is likely to match the true population value. For a 95% confidence interval, if the sampling and analysis were repeated under the same conditions, about 95% of the resulting intervals would contain the true value.

  • Working from a null hypothesis, two broad categories of error are recognized. A Type I error occurs when a true null hypothesis is rejected, giving a false positive. A Type II error occurs when the null hypothesis is not rejected even though a real difference exists between the populations, giving a false negative. The standard illustration for a novice is a criminal trial, in which the null hypothesis asserts innocence and the alternative hypothesis, raised on suspicion, asserts guilt. A jury that acquits does not necessarily accept the null hypothesis; it fails to reject it because the evidence is insufficient. Ronald Fisher introduced the term null hypothesis in his 1935 book The Design of Experiments, where he discussed the lady tasting tea experiment. He stated that this hypothesis is never proved or established but is possibly disproved in the course of experimentation. The p-value is the probability of observing a result at least as extreme as the test statistic, assuming the null hypothesis is true.

  • Formal discussions of inference date back to the mathematicians and cryptographers of the Islamic Golden Age, between the 8th and 13th centuries. Al-Khalil wrote the Book of Cryptographic Messages, which contains one of the first uses of permutations and combinations. Jacob Bernoulli's posthumous work Ars Conjectandi (1713) brought games of chance and the realm of probable opinion together under mathematical analysis. Adrien-Marie Legendre described the method of least squares in 1805, though Carl Friedrich Gauss had used it earlier, in 1795. The Belgian scientist Adolphe Quetelet introduced the notion of the average man (l'homme moyen) in the 1830s. Francis Galton and Karl Pearson transformed statistics into a rigorous mathematical discipline at the turn of the 20th century. Galton introduced standard deviation, correlation, and regression analysis, and applied these methods to human characteristics such as height and weight. Pearson developed the product-moment correlation coefficient and founded Biometrika, the first journal of mathematical statistics.

  • Rapid increases in computing power since the second half of the 20th century have substantially changed the practice of statistical science. Early models were almost always linear, but powerful computers spurred interest in nonlinear models such as neural networks. New model types, such as generalized linear models and multilevel models, emerged alongside the computational algorithms needed to fit them. Greater computing power also made resampling methods, including permutation tests and the bootstrap, increasingly popular. Techniques such as Gibbs sampling made Bayesian models more feasible through numerical approximation. Many general and special purpose software packages are now available, including Mathematica, SAS, SPSS, and R. Machine learning models capture patterns in data using computational algorithms that function as statistical and probabilistic models. Statistics remains an area of active research, particularly on how to analyze big data effectively.
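
Several of the ideas in the sections above (central tendency and dispersion, confidence intervals, p-values, and the resampling methods that computing power made practical) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the measurements are made up for illustration, the function names bootstrap_ci and permutation_p_value are hypothetical, and the percentile bootstrap and the two-sided difference-in-means test are just one common variant of each technique, not methods prescribed by the text.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    alpha = 1.0 - level
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffling the pooled data simulates "no difference".
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up measurements, for illustration only.
control = [4.1, 5.0, 4.8, 5.3, 4.6, 5.1, 4.9, 4.4]
treated = [5.6, 6.1, 5.4, 6.3, 5.9, 5.2, 6.0, 5.7]

print("mean (central tendency):", statistics.mean(control))
print("sample standard deviation (dispersion):", statistics.stdev(control))
print("95% bootstrap CI for the mean:", bootstrap_ci(control))
print("permutation p-value:", permutation_p_value(control, treated))
```

A small p-value here means that a difference in means at least as large as the observed one rarely arises when the group labels are shuffled at random. Rejecting the null hypothesis on that basis still carries the risk of a Type I error, and failing to reject it carries the risk of a Type II error.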

Common questions

When did Gottfried Achenwall coin the word statistik?

Gottfried Achenwall coined the German word statistik in 1749 as a summary of how things stand. The term evolved from the Latin status meaning situation or condition in society to mean state in late Latin.

What is the Hawthorne effect and when was it discovered?

The Hawthorne effect describes outcomes changing due to observation itself rather than the manipulated variable. This phenomenon emerged during the Hawthorne study which examined worker productivity at the Western Electric Company plant by modifying illumination levels.

Who introduced standard deviation and correlation analysis?

Francis Galton introduced standard deviation, correlation, and regression analysis, and applied these methods to human characteristics such as height and weight. Karl Pearson developed the product-moment correlation coefficient and founded Biometrika, the first journal of mathematical statistics.

When did Ronald Fisher coin the term null hypothesis?

Ronald Fisher introduced the term null hypothesis in 1935, in his book The Design of Experiments, where he discussed the lady tasting tea experiment. He stated that this hypothesis is never proved or established but is possibly disproved in the course of experimentation.

Which mathematician described the method of least squares in 1805?

Adrien-Marie Legendre described the method of least squares in 1805, though Carl Friedrich Gauss had used it earlier, in 1795.