Tuesday, March 10, 2026

An Ode to Regression Analysis

Hello Again

It's been a while.  But the bug to write yet another essay has bitten me and I don't know what to do with it besides putting it here.  It more or less started when a wiki article on the History of Probability was called to my attention.  I was gratified to see that it opened by acknowledging the duality of probability, which I figure is a matter of psychological fact.  But then, as usual in my experience, the rest of the article proceeded to focus on stochastic probability, aka frequency of occurrence.
 
Most annoying to me is that even though Ian Hacking’s The Emergence of Probability was referenced three times, the chapter (8) that discussed Pascal’s Wager wasn’t mentioned at all.  That’s where Hacking discussed the ins and outs of using probability trees (aka probability logic) to represent the evidential status of competing theories.

I get it; frequentist probability is much more amenable to a mathematical treatment than the evidential ancillary could ever hope to be; and that’s what the article is really about.  Hume (1739) and Hill (1965) resorted to subjective rules of evidence with no mathematics whatsoever rather than a deductive process where the premises automatically dictate the conclusions.  That doesn’t fit into a history of mathematical probability.

But I've been over all that before on this blog.  What's got me going again is that I've come up with another angle.  Quantifying stochastic probability has been done, but quantifying evidence is another thing altogether.

An Important Caution

Devising a mathematical methodology for assigning a numerical probability to competing theories is rather obviously not always possible.  For example, Pascal’s wager was on “God Is” vs “God Is Not”.  Deciding who the killer is in a murder mystery doesn't involve mathematics either even when it is quantified beyond a reasonable doubt.  Furthermore, mathematics is a deductive process; the conclusions must follow from the premises.  On the other hand, weighing evidence or generating hypotheses in the first place is inductive.  Or so they say, because it is also usually conceded that weighing evidence involves subjective judgment.  But when the theories themselves are mathematical then perhaps something useful can be done to make the evaluation not quite so subjective.  

Regression Analysis

As it turns out, if you ascribe discrepancies between a quantitative model and a set of observations to measurement error with a stochastic distribution, then you can turn the estimation of model parameter values into a statistical problem.  It’s a neat trick.  Sure, you can use a ruler to draw a straight line through scattered data, but different line drawers may end up with somewhat different slopes.  With least squares linear regression you get the same result every time.
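To make the "same result every time" point concrete, here is a minimal sketch of ordinary least squares for a straight line, using the textbook closed-form formulas.  The function name `fit_line` and the five data points are made up for illustration; they are not from any real data set.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up observations, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

For these numbers the slope comes out at about 1.99 and the intercept at about 0.05, and anyone who runs the same formulas on the same data gets exactly those values, which is the whole point compared with a ruler.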

Linear regression can be performed with relatively straightforward mathematics.  But the model has to be linear.  However, with the aid of a computer using a trial and error methodology, where parameters are adjusted up and down to see whether the fit improves, least squares regression can be applied to any model.  Furthermore, it doesn’t even have to be least squares; any other methodology that weights the relative importance of discrepancies between model and observations may be used instead.  In particular, weighting unsquared residuals places less weight on large deviations than squared residuals do.
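The trial and error idea can be sketched in a few lines.  This is only one simple way to do it (a coordinate search that halves its step size when nothing helps), not a description of what any particular statistics package does.  The names `trial_and_error_fit`, `sum_squared`, `sum_absolute`, and `decay` are hypothetical, and the data are synthetic and noise-free, generated from known parameter values so the answer can be checked.

```python
import math

def sum_squared(residuals):
    """Least-squares score: large residuals dominate."""
    return sum(r * r for r in residuals)

def sum_absolute(residuals):
    """Unsquared score: large residuals count for relatively less."""
    return sum(abs(r) for r in residuals)

def trial_and_error_fit(model, xs, ys, start, score=sum_squared,
                        step=0.5, min_step=1e-8, max_rounds=10_000):
    """Nudge each parameter up and down, keeping any change that lowers the score."""
    params = list(start)

    def misfit(candidate):
        return score([y - model(x, candidate) for x, y in zip(xs, ys)])

    best = misfit(params)
    for _ in range(max_rounds):
        if step <= min_step:
            break
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                trial_misfit = misfit(trial)
                if trial_misfit < best:
                    params, best, improved = trial, trial_misfit, True
        if not improved:
            step /= 2.0  # no nudge helped, so search on a finer scale
    return params, best

def decay(x, params):
    """Hypothetical nonlinear model: y = a * exp(-b * x)."""
    a, b = params
    return a * math.exp(-b * x)

# Synthetic, noise-free data generated from a = 3.0, b = 0.5 (illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0 * math.exp(-0.5 * x) for x in xs]

fit_sq, _ = trial_and_error_fit(decay, xs, ys, start=[1.0, 1.0])
fit_abs, _ = trial_and_error_fit(decay, xs, ys, start=[1.0, 1.0], score=sum_absolute)
print(fit_sq, fit_abs)
```

With noise-free data both scoring rules lead the search back to the generating values of a = 3.0 and b = 0.5.  The two rules only start to disagree when the data are scattered, and that is where the choice of how to weight large deviations matters.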

However, the underlying rationale for regression analysis is not entirely justified.  First, even if the discrepancies between model and observation are a result of measurement error, the actual distribution is usually unknown.  Second, the set of observations may not be entirely representative of the actual distribution.  Third, the model may not be entirely correct.  That is especially likely with multivariate analyses, where mismodeling one quantitative relationship can end up with misestimation of the other model parameters as well.  At that point it may be time to consider a new hypothesis.

So calling regression analysis “statistical” or even mathematical is a big stretch.  But I still think it’s very useful because it is using data as evidence for models and theories.  In fact, it can be thought of as quantitative induction.  That is good, very good in fact.  Furthermore, it seems clear that regression analysis has a role to play in filling out probability trees for competing theories with numerical probability assignments that sum to one.

Quantifying the Probability of Competing Hypotheses

The Bayesian Strategy

Those who employ Bayes’ Theorem definitely understand that probability is not the same as frequency of occurrence, and they are also comfortable with assigning probabilities to competing models.  In what is known as Bayesian inference, this is accomplished by assimilating the alternative models into a supermodel and then performing a regression analysis that assigns greater probabilities to the model(s) that fit the best.
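A deliberately stripped-down sketch of that idea follows.  It assumes Gaussian measurement error with a known standard deviation `sigma`, and it holds each model's parameters fixed instead of estimating or integrating over them, which a real Bayesian analysis would normally do.  The function name, the two candidate models, the equal priors, and the data are all invented for illustration.

```python
import math

def posterior_model_probabilities(models, priors, xs, ys, sigma):
    """Bayes' Theorem over a fixed set of models: posterior is proportional to prior times likelihood."""
    log_post = {}
    for name, predict in models.items():
        ssr = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))
        # Gaussian log-likelihood; terms shared by every model are dropped.
        log_like = -ssr / (2.0 * sigma ** 2)
        log_post[name] = math.log(priors[name]) + log_like
    # Subtract the largest value before exponentiating to avoid underflow.
    top = max(log_post.values())
    weights = {name: math.exp(value - top) for name, value in log_post.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Made-up observations and two hypothetical competing models.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
models = {
    "linear": lambda x: 2.0 * x,
    "curved": lambda x: 0.4 * x ** 2 + 1.5,
}
priors = {"linear": 0.5, "curved": 0.5}  # equal prior weight, an assumption

posterior = posterior_model_probabilities(models, priors, xs, ys, sigma=0.5)
print(posterior)
```

The probabilities sum to one by construction, and the better-fitting model gets nearly all of the weight here.  The `priors` argument is where any opinion held before seeing the data would enter.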

A Bayesian analysis can also let subjective expert opinion be part of the process, but there’s a catch: the contributions from the experts come before the regression analysis.  That suffers from the same general problem of trying to make grading evidence a deductive exercise; it’s just not consistent with the way science works.  After all, the issue underlying hypotheses lies in evaluating whether they are true rather than how often they are true.

The Pearson Strategy

There are two sorts of correlation coefficients, aka r values.  The first measures the association between two different sets of measurements (e.g. genetics and the occurrence of a disease).  But a Pearson correlation coefficient can also be used to measure the relationship between the values predicted by a model and those observed, and it’s generated by linear regression.  You can easily produce something analogous to the Pearson r value for any regression methodology.  The Bayes factor is also functionally equivalent to an r value.
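As a sketch of the second use, the standard Pearson formula can be applied to a model's predictions and the observations, whatever method produced the predictions.  The function name `pearson_r`, the two candidate models, and the data are made up for illustration.

```python
import math

def pearson_r(predicted, observed):
    """Pearson correlation between model predictions and observations."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)

# Made-up observations and two hypothetical models of them.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [2.1, 3.9, 6.2, 7.8, 10.1]
linear_predictions = [2.0 * x for x in xs]
quadratic_predictions = [x ** 2 for x in xs]

r_linear = pearson_r(linear_predictions, observed)
r_quadratic = pearson_r(quadratic_predictions, observed)
print(f"linear r = {r_linear:.4f}, quadratic r = {r_quadratic:.4f}")
```

For these numbers the linear model scores higher than the quadratic one.  One caveat worth keeping in mind is that r is unchanged when the predictions are rescaled or shifted, so it grades how well a model tracks the data rather than how close its predictions are in absolute terms.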

Pearson himself thought the r value was useful for grading the strength of an inference (Porter, 1986).  It plugs in nicely to the first three Hill criteria, namely strength, consistency, and specificity.  It’s not too hard to argue that a model or theory with a higher r value deserves a higher probability assignment.  You could even devise an algorithm or equation that at least somewhat fairly directs the relationship.  Yes, it would be somewhat arbitrary, but I’ll take it all day over safety factors or default assumptions.
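Just to show what such a rule might look like, here is one arbitrary possibility: weight each model by the square of its r value (treating negative r as zero) and rescale so the weights sum to one.  This particular rule is an assumption made purely for illustration, as are the function name and the example r values; any other increasing function of r would serve equally well as a starting point for argument.

```python
def probabilities_from_r(r_values):
    """Turn r values for competing models into weights that sum to one.

    Arbitrary illustrative rule: weight is r squared, with negative r counted as zero.
    """
    weights = {name: max(r, 0.0) ** 2 for name, r in r_values.items()}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no model has a positive r value")
    return {name: w / total for name, w in weights.items()}

# Hypothetical r values for two competing models.
r_values = {"model A": 0.95, "model B": 0.60}
probabilities = probabilities_from_r(r_values)
print(probabilities)
```

The model with the higher r value always receives the higher probability, and the assignments sum to one, so they can be dropped straight into a probability tree.  How steeply the probability should rise with r is exactly the kind of judgment that stays with the experts.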

In Summary

There are two approaches for combining data and expert opinion.  The Bayesian approach starts with expert opinion and then uses data to produce final evidential judgments.  The Pearson-Hill approach produces a measure of how well the data fits each hypothesis, but leaves the final evidential judgment to experts.  I'll discuss pros and cons next.

References

Hacking, I (1975).  The Great Decision.  In: The Emergence of Probability.  Cambridge University Press, pp. 63-72.

Hill, AB (1965).  "The Environment and Disease: Association or Causation?". Proceedings of the Royal Society of Medicine. 58 (5): 295–300.

Hume, D (1739).  A Treatise of Human Nature.  Book I, Part III, Section XV.

Porter, TM (1986).  The Rise of Statistical Thinking 1820-1900. Princeton University Press.

Official Sound Track

Beatles (1969).  Come Together.  In: Abbey Road, Track 1.

