Thursday, July 23, 2015

The Meaning of the Mean

The Average as a Surrogate for the Total

In statistics and probability, the arithmetic mean or the average value is often considered to be especially important.  There are some good reasons for this – sometimes.  Other times, not so much. For starters, there really is no such thing as an average person.  So, knowing the average value for a population may not give you very much information about yourself or any other specific individual.  But, for the purpose of providing a quantitative description of a population, the average often works rather well.  The reason is simple; the average is proportional to the total:

Average = Population Total / number of persons

Therefore, as long as the utility function is also proportional to the quantitative value, the average serves as a utilitarian measure of value.  Even though that proposition is dubitable to the point of being obviously wrong under some circumstances (e.g. for risk assessments where the risk is driven by extreme values), the "average person" often serves as a useful stand-in for "everyone".

Then, there is the "Expected Value".  Mathematical probability was originally devised to calculate the frequency of occurrence of specific results from games of chance played many times.  This, in turn, allowed the rate of return over a long (theoretically infinite) period of time to be estimated.  Once again, the value of interest corresponds to the arithmetic mean.  For example, a gambler seeking to profit from a series of bets can evaluate the bet as follows:
Expected Value = Total Net Return / number of bets
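
To make the link between the expected value and the long-run average concrete, here is a minimal sketch in Python.  The bet is a hypothetical one chosen only for illustration (an even-money wager on red at an American roulette wheel, 18 winning pockets out of 38), and the function names are my own.  It computes the probability-weighted mean directly and then compares it with Total Net Return / number of bets from a simulated series.

```python
import random


def expected_value(outcomes):
    """Probability-weighted mean of (probability, net_return) pairs."""
    return sum(p * x for p, x in outcomes)


def average_net_return(outcomes, n_bets, seed=0):
    """Simulate n_bets independent bets and return total net return / n_bets."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    probs = [p for p, _ in outcomes]
    returns = [x for _, x in outcomes]
    total = sum(rng.choices(returns, weights=probs, k=n_bets))
    return total / n_bets


# Hypothetical bet: win 1 unit with probability 18/38, lose 1 unit otherwise.
BET = [(18 / 38, 1.0), (20 / 38, -1.0)]

ev = expected_value(BET)                 # exactly -2/38, about -0.0526 per unit bet
avg = average_net_return(BET, 100_000)   # long-run average from the simulation

print(f"expected value per bet:  {ev:.4f}")
print(f"simulated average return: {avg:.4f}")
```

With a large number of bets the simulated average settles near the expected value, which is the sense in which the arithmetic mean is the value of interest to the gambler (and to the house).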

The use of mathematical probability in finance and insurance often follows the same underlying logic: given that there are sure to be some bad loans and bad insurance risks, the key to having a profitable business is for the average return to be positive.  At least, that is what investors expect.

Measurement Error

Using the Standard Error of the Mean to characterize the uncertainty associated with scientific measurements has a long history.  Writing in 1755, Thomas Simpson adapted the Bernoulli theorem (aka the law of large numbers) to make the following observation (quote from Stigler, 1986):
Upon the whole …. It appears, that the taking of the Mean of a number of observations, greatly diminishes the chances for all the smaller errors, and cuts off almost all possibility of any great ones: which last consideration, alone, seems sufficient to recommend the use of the method, not only to astronomers, but to all others concerned in making experiments of any kind (to which the above reasoning is equally applicable).  And the more observations or experiments that are made, the less will the conclusion be liable to err, provided they admit of being repeated under similar circumstances.
However, Simpson’s claim was met with immediate criticism from Thomas Bayes, who noted (also in Stigler, 1986):
As I see no mistakes in Mr. Simpson’s calculations, I will venture to say that there is one in the Hypothesis on which he proceeds.  And I think it is manifestly this, when we observe with imperfect instruments or organs; he supposes that the chances for the same error in excess or defect are exactly the same, and upon this hypothesis only has he shown the incredible advantage, which he would prove arises from taking the mean of a great many observations.
In other words, the standard error of the mean accurately characterizes the uncertainty of a measurement only when, as Simpson assumed, the true value corresponds to the arithmetic mean.  If it doesn’t, then even though the theorem is true, the result is irrelevant.  For example, if the errors are lognormally distributed, then the true value will correspond to the geometric mean rather than the arithmetic mean.  If the relationship of the underlying distribution of the measurements to the true value is unknown, then so is the relationship of the true value to the arithmetic mean.  Calling the mean value the expected value doesn’t help at all.
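
The lognormal case can be sketched with a short simulation.  Everything specific here is an assumption made for illustration: the true value of 10, the log-scale spread of 1, the sample size, and the premise that the multiplicative errors are centered on the true value (so that the true value is the median of the readings).  The sketch compares the arithmetic mean, the geometric mean, and the standard error of the mean.

```python
import math
import random
import statistics

TRUE_VALUE = 10.0  # hypothetical quantity being measured
SIGMA = 1.0        # assumed spread of the errors on the log scale
N = 50_000         # number of simulated readings

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Multiplicative error: each reading is the true value times exp(Normal(0, SIGMA)).
readings = [TRUE_VALUE * rng.lognormvariate(0.0, SIGMA) for _ in range(N)]

arithmetic_mean = statistics.fmean(readings)
geometric_mean = statistics.geometric_mean(readings)

# Standard error of the mean, computed the usual way.
standard_error = statistics.stdev(readings) / math.sqrt(N)

# How many standard errors separate the arithmetic mean from the true value?
errors_in_sem = (arithmetic_mean - TRUE_VALUE) / standard_error

# In theory the arithmetic mean converges to TRUE_VALUE * exp(SIGMA**2 / 2),
# about 16.5 here, while the geometric mean converges to TRUE_VALUE.
print(f"arithmetic mean: {arithmetic_mean:.2f}")
print(f"geometric mean:  {geometric_mean:.2f}")
print(f"standard error:  {standard_error:.3f}")
print(f"true value is {errors_in_sem:.0f} standard errors below the arithmetic mean")
```

Under these assumptions, more readings shrink the standard error but only tighten the estimate around the wrong number: the calculation is correct, and the hypothesis behind it is not.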

Averaging the Truth

In the realm of the probability of chance, the mean value is almost certainly given far more credence than it deserves.  But still, under most circumstances the arithmetic mean is close enough to the actual value of interest to be considered approximately true.  On the other hand, with the probability of causes, or any other notion of probability arising from competing theoretical propositions, there is no basis for using a mean value at all.  For example, consider the probability that the earth is round as opposed to flat.  As a decision problem, under no circumstances would it make any sense to average the flat earth theory with the round one.  Yet, that is essentially what Bayesian Model Averaging does.

The admirable trait of Bayesian Model Averaging (BMA) is that it acknowledges that different plausible models may yield estimates that may be quite different (Hoeting et al, 1999).   Like a probability tree treatment of model uncertainty (e.g. Evans et al, 1994; Carrington et al, 2013), BMA requires identification of a set of alternative plausible models and establishing a model probability that will surely require some degree of subjective judgement.  But, with BMA the subjective probability is just the prior probability rather than the finished product.  Bayesian updating and averaging is the next step.

The differences between BMA and an unvarnished probability tree are all attributable to different notions of probability.  Like Bayesian schemes in general, BMA is intended to give the probability of causes a mathematical treatment that resembles that used for the probability of chance; and the fixation on the arithmetic mean comes with that package.  A probability tree approach that embellishes a weight-of-the-evidence evaluation is apt to use something like the Bradford-Hill criteria (Hill, 1965) to establish model probabilities, none of which assign any importance to the arithmetic mean.  Given that an assumption about the mean is what led Thomas Bayes to criticize Simpson, it seems that the real Bayes would never have approved of BMA.

Along with a range or outer bounds, the mean is perhaps a useful central estimate even when uncertainty arises from competing plausible propositions.   But, since it corresponds to a common legal standard of proof (“preponderance of the evidence”), the median is better for many purposes.  But there may be room for both.  The real problem with BMA is that it proffers the arithmetic mean as the value of interest.  It isn’t really; the value at stake is the truth.  If current science is unable to divulge it, then we really don’t know what to expect.
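
As a rough illustration of the difference between these central estimates, the sketch below takes a set of competing models, each with a probability and an estimate, and reports the probability-weighted mean, the probability-weighted median, and the range.  The three models and their numbers are entirely hypothetical, the function names are my own, and the weighted mean stands in only for the point estimate that model averaging produces; no Bayesian updating of the model probabilities is attempted.

```python
def weighted_mean(models):
    """Probability-weighted average of the model estimates."""
    return sum(p * x for p, x in models)


def weighted_median(models):
    """Smallest estimate at which the cumulative model probability reaches 0.5."""
    cumulative = 0.0
    for p, x in sorted(models, key=lambda m: m[1]):
        cumulative += p
        if cumulative >= 0.5:
            return x
    raise ValueError("model probabilities must sum to at least 0.5")


# Hypothetical (probability, estimate) pairs for three competing models.
MODELS = [(0.45, 1.0), (0.35, 10.0), (0.20, 100.0)]

mean_estimate = weighted_mean(MODELS)      # 0.45*1 + 0.35*10 + 0.20*100
median_estimate = weighted_median(MODELS)  # the estimate of the middle model
low = min(x for _, x in MODELS)
high = max(x for _, x in MODELS)

print(f"weighted mean:   {mean_estimate:.2f}")
print(f"weighted median: {median_estimate:.2f}")
print(f"range:           {low:.2f} to {high:.2f}")
```

In this made-up case the weighted mean is a number that none of the three models proposes, while the weighted median is the estimate of a model that, together with the lower-valued alternative, accounts for more than half of the probability.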

References

Carrington CD, Murray C, and Tao S (2013).  A Quantitative Assessment of Inorganic Arsenic in Apple Juice.
Evans JS, Graham JD, Gray GM, and Sielken RL Jr (1994).  A distributional approach to characterizing low-dose cancer risk.  Risk Anal 14:25-34.
Hill AB (1965).  The Environment and Disease: Association or Causation?  Proc Royal Soc Med 58:295-300.
Hoeting JA, Madigan D, Raftery AE, and Volinsky CT (1999).  Bayesian Model Averaging: A Tutorial.  Statistical Science 14:382-417.
Stigler SM (1986).  Probabilities and the Measurement of Uncertainty.  In: The History of Statistics: The Measurement of Uncertainty before 1900.  Belknap Press, Cambridge MA, pp. 62-98.

Official Post Soundtrack


Supertramp (1975).  The Meaning.  In: Crisis, What Crisis?, Track 9.

Post Notes

Thesis Post #47.  Best read in conjunction with "A Dictionary of Probability" and "Quantifiers".
