— Ch. 1 · Defining Systematic Inaccuracy —
Bias (statistics).
In statistics, bias is a systematic tendency in which the methods used to gather data and estimate a sample statistic present an inaccurate, skewed, or distorted depiction of reality. Bias can arise at several stages of data collection and analysis. The source of the data can introduce distortion before anything is calculated. The method used to collect the data can skew results further. The estimator chosen by researchers determines how raw observations become final estimates, and the analysis methods applied at the end may amplify existing distortions. Data analysts can take measures at each stage to reduce the impact of bias on their work. Understanding the source of bias helps in assessing whether observed results are close to actuality. Issues of statistical bias have been argued to be closely linked to issues of statistical validity.
Mathematical Estimator Properties
Statistical bias is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated. Let $\hat{\theta}$ be a statistic used to estimate a parameter $\theta$, and let $\operatorname{E}(\hat{\theta})$ denote the expected value of $\hat{\theta}$. Then the difference $\operatorname{E}(\hat{\theta}) - \theta$ is called the bias of $\hat{\theta}$ with respect to $\theta$. If that difference equals zero, $\hat{\theta}$ is said to be an unbiased estimator of $\theta$; otherwise it is said to be a biased estimator of $\theta$. The bias of a statistic is always relative to the parameter it is used to estimate, although the parameter is often omitted when it is clear from context what is being estimated. Although an unbiased estimator is theoretically preferable to a biased one, in practice biased estimators with small bias are frequently used. A biased estimator may be more useful for several reasons, for example because it has a lower mean squared error.
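To make the definition concrete, the following is a minimal simulation sketch, not a canonical implementation. It assumes normally distributed data and compares two estimators of the population variance: one that divides the sum of squared deviations by n (biased) and one that divides by n - 1 (unbiased). The function name, parameters, and dictionary keys are illustrative choices, not part of the text above. The bias is approximated as the average estimate over many simulated samples minus the true variance, and the mean squared error as the average squared distance from the true variance.

```python
import random


def simulate_variance_estimators(n=5, trials=20000, sigma=1.0, seed=0):
    """Approximate the bias and MSE of two variance estimators by simulation.

    Hypothetical helper for illustration: draws `trials` samples of size `n`
    from a normal distribution with mean 0 and standard deviation `sigma`.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    true_var = sigma ** 2      # the parameter being estimated
    estimates = {"divide_by_n": [], "divide_by_n_minus_1": []}

    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)
        estimates["divide_by_n"].append(sum_sq / n)
        estimates["divide_by_n_minus_1"].append(sum_sq / (n - 1))

    summary = {}
    for name, values in estimates.items():
        expected = sum(values) / trials  # Monte Carlo stand-in for E(estimator)
        summary[name] = {
            "bias": expected - true_var,
            "mse": sum((v - true_var) ** 2 for v in values) / trials,
        }
    return summary


if __name__ == "__main__":
    for name, stats in simulate_variance_estimators().items():
        print(f"{name}: bias={stats['bias']:+.3f}, mse={stats['mse']:.3f}")
```

Under these assumptions (normal data, n = 5, true variance 1), theory gives a bias of -0.2 and a mean squared error of 0.36 for the divide-by-n estimator, against a bias of 0 and a mean squared error of 0.5 for the divide-by-(n - 1) estimator. The simulated values should land close to these figures, which illustrates how a biased estimator can still have the lower mean squared error.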