Tuesday, March 31, 2026

To Bayes or Not to Bayes, That Is the Question


The Responsibility Gap

There is no reason to assign probabilities to competing theories unless there is an actual decision that must be made.  Pascal knew that, and that’s one of the things you would learn from chapter 8 (Hacking, 1975).  Assigning probabilities obviously can’t be a flawless process; otherwise the correct theory would be assigned a probability of 1 every time.

So, I figure the next best thing is to have probability assignments that are scientifically defensible.  At least that’s what I tried to do when working for a regulatory agency as a scientific advisor because a) I figured it was my job, b) I liked the job, and c) I was allowed to do it.  I always thought that would start a conversation about what the probability assignments should be.  But that didn’t happen for several related reasons:

1) It may be easier to not make a decision at all.  It’s sort of an FDA tradition to declare an emergency and then at a later date declare victory without doing anything at all.
2) If a decision absolutely must be made then it is much easier to make it with a formula (e.g. safety/uncertainty factors) that doesn’t need to be scientifically defensible.
3) If a risk estimate is necessary then it’s much easier to use a default assumption (linear extrapolation from high doses).

All of those techniques insulate the expert from the decision, which obviates the need to assign probabilities to alternative hypotheses.  

Bayesian methodology also insulates experts from the decision, but to a lesser degree.  Since it does lay out the competing theories that underlie a decision, I think it is far preferable to the backroom methodologies outlined above.  But perhaps the best thing about Bayesian methodology is that the practitioners actually want the job.

Help Wanted

I started assigning probabilities to alternative hypotheses because I figured it was my job and for many years I was allowed to do it.  I was also allowed to participate in the writing of my own job description and I always made sure it said I was supposed to “convey uncertainty about potential health effects arising from contaminants in food to decision makers”.  But that didn’t last; after a reorganization my job description changed to something like “support a decision that has already been made”.  That’s when I pretty much figured I’d rather write a blog than work at the USFDA.

I don’t really want my old job back.  I’m too old for that.  But I still figure someone else should have it.  There’s been another reorganization, but there is still a contaminants branch facing the same old problems.  Seems like a lot of people in the EPA and WHO should know their way around a probability tree as well.  

But it isn’t really just a government issue; it’s primarily a science problem.  The Bayesians shouldn’t need to identify plausible theories after a study has already been conducted.  That should have happened before the effort to design experiments and/or collect observational data began.  But of course, it is entirely possible that all of the hypotheses considered at the outset of a study are disproven by the new data.  That makes it time for a new theory.  Neither a frequentist nor a Bayesian analysis can help with that.  But a probability tree can.

Model Shopping

Perhaps the scientists who need probability trees the most are epidemiologists, especially the ones doing multivariate analysis with multiple putative causal influences.  I’ve been over some of it before from a historical perspective (Neyman was right, but Fisher sold more textbooks); null hypothesis testing doesn’t necessarily test the hypotheses that really matter.  That can easily set up a model shopping exercise that is solely interested in generating statistical significance, possibly by using a model that isn’t plausible in the first place.  I’ll also add the general point made in my last post: trying to turn hypothesis testing into a statistical exercise treats observations as instances in stochastic probability theory rather than as evidence for or against a theory, even when the underlying theory isn’t stochastic at all.

Statistical significance testing isn’t crazy when the number of alternative theories is exactly two, and the number of observations is small.  But otherwise, it’s nuts.  Model shopping isn’t such a bad idea when you are shopping for plausible theories.  However, it needs to happen as part of an open discussion that even someone working for the government can take part in.  That means the set of recorded observations used in published studies needs to be shared.  Furthermore, the search for more plausible theories doesn't stop just because a paper has already been published.

Reference

Hacking, I (1975).  The Great Decision.  In: The Emergence of Probability.  Cambridge University Press, pp. 63-72.

Official Sound Track

Talking Heads (1977).  Don't Worry About the Government.  In: Talking Heads: 77, Track 8.


Tuesday, March 10, 2026

An Ode to Regression Analysis

Hello Again

It's been a while.  But the bug to write yet another essay has bitten me and I don't know what to do with it besides putting it here.  It more or less started with having a wiki article on the History of Probability called to my attention.  I was gratified to see that it opened by acknowledging the duality of probability that I figure is a matter of psychological fact.  But then, as usual in my experience, the rest of the article proceeded to focus on stochastic probability, aka frequency of occurrence.
 
Most annoying to me is that even though Ian Hacking’s The Emergence of Probability was referenced three times, the chapter (8) that discussed Pascal’s Wager wasn’t mentioned at all.  That’s where Hacking discussed the ins and outs of using probability trees (aka probability logic) to represent the evidential status of competing theories.

I get it; frequentist probability is much more amenable to a mathematical treatment than the evidential ancillary could ever hope to be; and that’s what the article is really about.  Hume (1739) and Hill (1965) resorted to subjective rules of evidence with no mathematics whatsoever rather than a deductive process where the premises automatically dictate the conclusions.  That doesn’t fit into a history of mathematical probability.

But I've been over all that before on this blog.  What's got me going again is that I've come up with another angle.  Quantifying stochastic probability has been done, but quantifying evidence is another thing altogether.

An Important Caution

Devising a mathematical methodology for assigning a numerical probability to competing theories is rather obviously not always possible.  For example, Pascal’s wager was on “God Is” vs “God Is Not”.  Deciding who the killer is in a murder mystery doesn't involve mathematics either even when it is quantified beyond a reasonable doubt.  Furthermore, mathematics is a deductive process; the conclusions must follow from the premises.  On the other hand, weighing evidence or generating hypotheses in the first place is inductive.  Or so they say, because it is also usually conceded that weighing evidence involves subjective judgment.  But when the theories themselves are mathematical then perhaps something useful can be done to make the evaluation not quite so subjective.  

Regression Analysis

As it turns out, if you ascribe discrepancies between a quantitative model and a set of observations to measurement error with a stochastic distribution then you can turn the estimation of model parameter values into a statistical problem.  It’s a neat trick.  Sure, you can use a ruler to draw a straight line through scattered data, but different line drawers may end up with somewhat different slopes.  But with least squares linear regression you get the same result every time.

Linear regression can be performed with relatively straightforward mathematics.  But the model has to be linear.  However, with the aid of a computer using a trial and error methodology, where parameters are adjusted up and down to see whether the fit improves, least squares regression can be applied to any model.  Furthermore, it doesn’t even have to be least squares; any other methodology that weights the relative importance of discrepancies between model and observations may be used instead.  In particular, weighting unsquared residuals places less weight on large deviations than squared residuals do.
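
To make the trial and error idea concrete, here is a minimal Python sketch.  It is an illustration under stated assumptions rather than the method behind any particular software package: the function names, the step-halving rule, and the made-up data (five points on the line y = 1 + 2x plus one outlier) are all mine.  The same nudge-and-check loop is run once with squared residuals and once with unsquared (absolute) residuals.

```python
def least_squares_line(xs, ys):
    """Closed-form ordinary least squares fit of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def trial_and_error_fit(model, params, xs, ys, loss, step=1.0, min_step=1e-6):
    """Nudge each parameter up and down, keep any change that improves the
    fit, and halve the step size whenever no nudge helps."""
    params = list(params)

    def total(p):
        # Sum of the loss applied to each residual (observed minus predicted).
        return sum(loss(y - model(x, p)) for x, y in zip(xs, ys))

    best = total(params)
    while step > min_step:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                score = total(trial)
                if score < best:
                    params, best, improved = trial, score, True
        if not improved:
            step /= 2
    return params, best


def line(x, p):
    """Straight-line model; p[0] is the intercept and p[1] is the slope."""
    return p[0] + p[1] * x


# Made-up data: y = 1 + 2x, except the last point is 20 units too high.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 31]

closed_form = least_squares_line(xs, ys)
squared_fit, _ = trial_and_error_fit(line, [0.0, 0.0], xs, ys, lambda r: r * r)
absolute_fit, _ = trial_and_error_fit(line, [0.0, 0.0], xs, ys, abs)
```

On this data the squared-residual search should land close to the closed-form answer, which the outlier drags well away from a slope of 2, while the absolute-residual search should settle on the line through the five well-behaved points.  A crude search like this can stall on harder problems, so treat it as a picture of the idea and not as a recommendation.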

However, the underlying rationale for regression analysis is not entirely justified.  First, even if the discrepancies between model and observation are a result of measurement error the actual distribution is usually unknown.  Second, the set of observations may not be entirely representative of the actual distribution.  Third, the model may not be entirely correct.  That is especially likely with multivariate analyses where mismodeling one quantitative relationship can end up with misestimation of the other model parameters as well.  At that point it may be time to consider a new hypothesis.

So calling regression analysis “statistical” or even mathematical is a big stretch.  But I still think it’s very useful because it is using data as evidence for models and theories.  In fact, it can be thought of as quantitative induction.  That is good, very good in fact.  Furthermore, it seems clear that regression analysis has a role to play in filling out probability trees for competing theories with numerical probability assignments that sum to one.

Quantifying the Probability of Competing Hypotheses

The Bayesian Strategy

Employers of Bayes’ Theorem definitely understand that probability is not the same as frequency of occurrence and they are also comfortable with assigning probabilities to competing models.  Known as Bayesian inference, this is accomplished by assimilating the alternative models into a supermodel and then performing a regression analysis that assigns greater probabilities to the model(s) that fit the best.

A Bayesian analysis can also let subjective expert opinion be part of the process, but there’s a catch; the contributions from the experts come before the regression analysis.  That suffers from the same general problem of trying to make grading evidence a deductive exercise; it’s just not consistent with the way science works.  After all, the issue underlying hypotheses lies in evaluating if they are true rather than how often they are true.
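
The order of operations can be shown with a few lines of Python.  This is a bare-bones sketch of Bayes' theorem over a finite set of models, not a description of any particular Bayesian software: the function name is mine, and the priors and log-likelihoods in the usage lines are made-up numbers standing in for expert opinion and for the result of fitting each model to data.

```python
import math


def posterior_model_probabilities(priors, log_likelihoods):
    """Bayes' theorem over a finite set of models.

    Each posterior is proportional to prior * likelihood, rescaled so that
    the posteriors add up to 1.
    """
    if len(priors) != len(log_likelihoods):
        raise ValueError("need one prior and one log-likelihood per model")
    if any(p < 0 for p in priors) or abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must be non-negative and add up to 1")
    # Subtract the largest log-likelihood before exponentiating to avoid
    # underflow; the common factor cancels when the weights are normalized.
    top = max(log_likelihoods)
    weights = [p * math.exp(ll - top) for p, ll in zip(priors, log_likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical numbers: model 1 fits better (higher log-likelihood).
indifferent = posterior_model_probabilities([0.5, 0.5], [-10.0, -12.0])
skeptical = posterior_model_probabilities([0.1, 0.9], [-10.0, -12.0])
```

With indifferent priors the better-fitting model comes out ahead; with a prior that leans hard against it, the same data leave it with a smaller final probability.  Either way the expert speaks first and the arithmetic has the last word.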

The Pearson Strategy

There are two sorts of correlation coefficients, aka r values.  The first measures the association between two different measurements of sets of observations (e.g. genetics and the occurrence of a disease).  But a Pearson correlation coefficient can also be used to measure the relationship between the values predicted by a model and those observed, and it’s generated by linear regression.  You can easily produce something analogous to the Pearson r value for any regression methodology.  The Bayes factor is also functionally equivalent to an r value.

Pearson himself thought the r value was useful for grading the strength of an inference (Porter, 1986).  It plugs in nicely to the first three Hill criteria, namely strength, consistency, and specificity.  It’s not too hard to argue that a model or theory with a higher r value deserves a higher probability assignment.  You could even devise an algorithm or equation that at least somewhat fairly directs the relationship.  Yes, it would be somewhat arbitrary, but I’ll take it all day over safety factors or default assumptions.
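
Here is one way such an equation might look, sketched in Python.  Everything about the conversion rule is an assumption of mine made for illustration: I weight each model by the square of its r value (counting negative correlations as zero) and rescale so the weights add up to 1.  The observations and model predictions are made-up numbers.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between model predictions and observations.

    Assumes neither list is constant, since that would mean dividing by zero.
    """
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def probabilities_from_r(r_values):
    """Arbitrary illustrative rule: weight each model by r squared, treat
    negative r as zero, and normalize so the probabilities add up to 1."""
    weights = [max(r, 0.0) ** 2 for r in r_values]
    total = sum(weights)
    if total == 0:
        raise ValueError("no model is positively correlated with the data")
    return [w / total for w in weights]


# Made-up observations and the predictions of two hypothetical models.
observed = [1.0, 2.1, 2.9, 4.2, 5.0]
model_a = [1.0, 2.0, 3.0, 4.0, 5.0]
model_b = [2.0, 2.0, 3.0, 3.0, 4.0]

r_values = [pearson_r(model_a, observed), pearson_r(model_b, observed)]
starting_probabilities = probabilities_from_r(r_values)
```

The numbers that come out are a starting point for the experts to argue over, not a verdict.  A different rule (say, one that punishes a mediocre fit more harshly) would be just as defensible, which is the sense in which the whole thing is somewhat arbitrary.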

In Summary

There are two approaches for combining data and expert opinion.  The Bayesian approach starts with expert opinion and then uses data to produce final evidential judgments.  The Pearson-Hill approach produces a measure of how well the data fits each hypothesis, but leaves the final evidential judgment to experts.  I'll discuss pros and cons next.

References

Hacking, I (1975).  The Great Decision.  In: The Emergence of Probability.  Cambridge University Press, pp. 63-72.

Hill, AB (1965).  "The Environment and Disease: Association or Causation?". Proceedings of the Royal Society of Medicine. 58 (5): 295–300.

Hume, D (1739).  A Treatise of Human Nature.  Book I, Part III, Section XV.

Porter, TM (1986).  The Rise of Statistical Thinking 1820-1900. Princeton University Press.

Official Sound Track

Beatles (1969).  Come Together.  In: Abbey Road, Track 4.


Thursday, August 12, 2021

The Mantra

I first heard it early in my FDA career at an EPA symposium on manganese.  It went something like this:

“Where health is concerned, money doesn’t matter”

What an utterly stupid thing to say, I thought.  But to my consternation, many voices around the room followed with a “hear, hear”.  I have heard the mantra many times since, and I think it also lies unspoken behind many public policy decisions.  I have come to realize that it isn’t quite so stupid when uttered by people who are in the business of providing health benefits.  What they are really saying is:

“Where health is concerned, give us all your money”

Health Care

While I worked at the Center for Food Safety and Applied Nutrition, I always thought the mantra was uniquely associated with toxicological issues in food safety and environmental regulation.  But I’ve been out of the business for over six years now, and I now realize that the Mantra has a much wider presence.  In particular, the phrase “access to health care” sounds suspiciously like an alternative version of the Mantra.   What is “access” supposed to mean?  It obviously doesn’t mean everyone will be entitled to any and all medical procedures regardless of cost.  I think what it really means is:

“Give us all your money and we’ll decide what to do with it” 

But to my consternation, that seems to be exactly what we (in the US) are doing.  The unfunded mandates (Obamacare, the hospital mandate, the employer mandate) all funnel money into the pockets of the medical industry with no consideration of how the money will be spent.  It’s why we spend twice as much money as any other country in the world and get less for it.  It’s socialized medicine run for profit; the worst of socialism and capitalism all rolled into one.

However, Medicare is a different story.  Since the government must work with a limited budget, they are forced to be at least semi-rational about how the money is spent.  That’s why I think replacing the unfunded mandates with a fiscally conservative version (no, we don’t need to spend more money on health care at public expense) of Medicare for All is a mighty fine idea.  It makes no sense to have socialized medicine for poor people and old people while not giving it to the people who work and pay for it.

COVID

But the poster child for the Mantra has to be COVID.  While I think Tony Fauci is a mighty fine scientist, that doesn’t mean he should be in charge of managing the economy.  I believe the initial reaction to the pandemic was an overreaction driven by the Mantra.  Plus the aftermath reminds me of the fate of oyster beds after an oil spill; once the bureaucracy has taken control, it doesn’t want to let go until it has attained some arbitrary safety standard that never existed before.

I do believe that people have generally gotten more rational about COVID, but there’s still a long way to go.  There are money, freedom and health tradeoffs with each and every mitigation technique.  We need to make the hard choices about which ones are really worth it.  I think vaccinations and masking are generally worth it, even if a mandate is required.  Since it became apparent early on that asymptomatic people can transmit the virus, I haven’t thought contact tracing could work for a long time.  I don’t think quarantining is especially worthwhile either.  I wonder if it’s wise to let hospitals fill up with COVID patients when they may have more pressing business to attend to.  Which brings us to public gatherings, especially schools.

Looks like we are going to have to experiment a bit; let’s not hum the Mantra.

Official Post Soundtrack

Killing Joke (2005).  "Medicine Wheel."  In: Democracy, Track 8.


Tuesday, April 23, 2019

A Minority Opinion

Preamble

I haven't posted in over two years, and that's largely because I hadn't anything further to say, and that fact is attributable to my spending most of my time these days doing other things.  But I still get dragged into it sometimes.  In particular, I've been participating in a World Health Organization workgroup for the last year and a half that was intended to codify standard practice for benchmark dose modeling.  That involved phone conferences and critiquing text written by other members.  Things weren't really progressing towards any sort of a conclusion, so a meeting was convened in Geneva last month that also brought in a number of other participants.

As I pretty much knew already, among the initial workgroup members, I was in the minority in at least two respects.  First, while I'm a pharmacologist/toxicologist by training, the other members are largely statisticians.  Secondly, while I am very interested in dose-response modeling, I am actually not all that keen on picking a point on the curve to be the "Benchmark Dose".  Even though doing so might be useful sometimes, it never seemed to help with the problems I worked on at the FDA.

I wrote a short one page essay the morning after I got back.  It was ostensibly written for inclusion somewhere in the WHO document that is to be the end product of the meeting.  However, I don't know if or when that will happen.  So, I'll share it here just in case it never goes anywhere else.

Dose-Response Modeling and Weight of the Evidence

Causal relationships can be expressed mathematically, and the expression of acceleration attributable to gravity is perhaps the most well-known example.  Dose-response models are quantitative expressions of causal relationships in pharmacology and toxicology.  However, even when it is expressed mathematically, the validity of the expression of causality ultimately depends upon a judgment that is not itself mathematical (Illari and Russo, 2015).  In the fields of medicine and physiology, perhaps the best evidence of that comes from the fact that when Hill (1965) gave his widely known lecture on causality before a group of statisticians, he used no mathematical equations whatsoever.

Weight of the evidence approaches have been used for dose-response modeling both at JECFA (e.g. for lead, WHO 2000 and WHO 2011) and elsewhere (e.g. Morgan and Granger, 1980; Evans et al, 1994, and Carrington et al, 2011).  Using weight of the evidence to address dose-response model uncertainties is largely the same as when Bayesian methods are used.  There is still a need to identify a finite set of alternative models or hypotheses, and the models are still either directly fit to data or designed to be consistent with the empirical record.  Furthermore, both approaches utilize expert opinion, and at the end of the process probabilities are assigned to each alternative model so that they all add up to 1.  However, there are important differences.

  • First, Bayesian methodology uses expert opinion prior to curve-fitting, and then “updates” the probabilities initially assigned by the experts as part of the curve-fitting process to yield the final model probabilities.  On the other hand, a weight of the evidence approach does not assign model probabilities until after curve-fitting has taken place; experts may use information about how well each model describes the data, but also use other theoretical and experiential criteria as well.  Because the Bayesian approach alters expert opinion after it is expressed, it has the potential of yielding final model probabilities that contradict what experts believe.
  • Second, because it is amenable to automation, Bayesian methodology is far more reproducible than a methodology which depends solely on expert opinion.  Model probabilities assigned by experts may vary among experts or even a single expert over time.   That fact perhaps makes the Bayesian methodology preferable when a standardized approach is desirable and there is no strongly held expert opinion.
  • Third, because it is thought of as a mathematical exercise, calculating Bayesian probabilities requires the use of models for which log-likelihood functions can be calculated.  For more complex biological models, that may not be possible.  Under those circumstances, consulting expert opinion is really the only option.
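
Whichever route is taken, the end product is the same kind of object: a short list of alternative models, each with a probability, adding up to 1.  The Python sketch below shows how such a list might be used once it exists.  The two dose-response functions, their parameter values, and the 0.3/0.7 split are all hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
def linear_model(dose, slope=0.002):
    """Hypothetical linear no-threshold model: risk proportional to dose."""
    return min(1.0, slope * dose)


def threshold_model(dose, threshold=10.0, slope=0.004):
    """Hypothetical threshold model: no risk below the threshold dose."""
    return min(1.0, max(0.0, slope * (dose - threshold)))


def weighted_risk(dose, branches):
    """Probability-weighted risk over (probability, model) branches."""
    total = sum(p for p, _ in branches)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("branch probabilities must add up to 1")
    return sum(p * model(dose) for p, model in branches)


# Hypothetical probability assignments, whether from experts or an algorithm.
branches = [(0.3, linear_model), (0.7, threshold_model)]

low_dose_risk = weighted_risk(5.0, branches)    # only the linear branch contributes
high_dose_risk = weighted_risk(20.0, branches)  # both branches contribute
```

At the low dose only the linear branch contributes anything, so the answer depends almost entirely on how much probability that branch was given.  That is where the argument over probability assignments actually matters.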

Although such processes have been used for other purposes (e.g. Suter and Cormier, 2011), a formal process of the same ilk as the Hill criteria (Hill, 1965) is not typically implemented for quantitative dose-response modeling.  A process for weighing evidence could temper differences of opinion among experts regarding dose-response model form without eliminating expert opinion altogether.

Assigning probabilities by committee would also help.  In place of the “associations” that concerned Hill, one or more numerical goodness-of-fit measures could be used to argue for or against specific models.  The other Hill criterion that is directly relevant to dose-response modeling is the requirement for a “biological gradient”.  Quite simply, a dose-response model ought to look like what a dose-response relationship is supposed to look like.  That criterion could perhaps be subdivided into theoretical and experiential components.  As an instance of the former, an argument that a dose-response relationship cannot be supralinear as the dose approaches zero can be based on the notion that it violates the generally accepted biochemical law of mass action (Tallarida and Jacob, 1979).  An experiential argument would reflect the experience of toxicologists with other analogous dose-response relationships.

References

Carrington CD, Murray C, and Tao, S. (2013). A Quantitative Assessment of Inorganic Arsenic in Apple Juice.  

Evans, J. S., Graham, J. D., Gray, G. M. and Sielken, R. L. (1994), A Distributional Approach to Characterizing Low‐Dose Cancer Risk. Risk Analysis, 14: 25-34.

Hill, Sir Arthur Bradford (1965).  The Environment and Disease: Association or Causation?  Proc Royal Soc Med 58:295-300.

Illari P and Russo F (2015).  Chapter 6: Evidence and Causality.  In: Causality: Philosophical Theory Meets Scientific Practice.  Oxford University Press, Oxford, pp. 46-59.

Morgan, M. G., Morris, S. C., Henrion, M. , Amaral, D. A. and Rish, W. R. (1984), Technical Uncertainty in Quantitative Policy Analysis — A Sulfur Air Pollution Example. Risk Analysis, 4: 201-216

Suter, GW and Cormier SM (2011).  Why and how to combine evidence in environmental assessments: Weighing evidence and building cases.  Science of the Total Environment 409:1406–1417.

Tallarida RJ and Jacob LS (1979).  Chapter 3: Kinetics of Drug-Receptor Interaction: Interpreting Dose-Response Data.  In: The Dose-Response Relation in Pharmacology.  Springer-Verlag, New York, pp. 49-84.

WHO, 2000.  Lead, 53rd JECFA

WHO, 2011.  Lead, 73rd JECFA

Post Note

I do believe that Bayesian Model Averaging will work reasonably well for the evaluations where the risk is fairly trivial.  However, for more serious purposes (e.g. arsenic, lead, etc) it is a step in the right direction, at best.  But you just can't throw expert opinion, common sense, and associative learning into the dustbin of history by burying it in a prior.  It's silly, and sometimes obviously so.

Sunday, August 14, 2016

Individual Fish Risk Benefit Model

This page is set up as an adjunct to the discussion in The Science-Policy Shell Game concerning fish consumption advice.  I may replace the Excel macro linked here now with something prettier and/or easier to use at some point in the future.

Software

This Excel macro, Personal_Seafood_Net_Effect_Estimator.xls, integrates four components presented earlier:




Sunday, July 10, 2016

SPSG #13: Ending the Game

This chapter outlines strategies for overcoming the difficulties noted in earlier chapters. First, adopting some common legal strategies for separating Matters of Fact from Matters of Law could do wonders. The creation of job positions for Science Judges whose sole responsibility is to disentangle science matters from policy matters could facilitate that. Second, while eliminating the science-policy shell game entirely is probably not possible, there is no reason why it should be condoned or institutionalized. Therefore, the EPA assessment guidelines for cancer and noncancer endpoints both need to be rewritten. As the face of the Safety Assessment Paradigm, the Reference Dose background document should be rewritten to make it clear that the product of the assessment is a regulatory policy rather than a statement of scientific fact. As for the cancer risk assessment guidelines, instead of enshrining the default option with the Point of Departure, the guidelines should use probability trees to make the default option go away entirely. Furthermore, there is no reason to restrict the use of quantitative risk assessment to just cancer. However, solving that problem will create another problem: At least in public health, a legacy of the institutionalized use of the science-policy shell game has virtually eliminated risk management as a federal job position. So, doing a risk assessment is of very little use if no one in the federal government has the responsibility for managing issues. That can happen, but position descriptions will need to be rewritten. Finally, research should be funded to support science, instead of supporting technocratic shell games. In particular, the enterprise of environmental epidemiology needs to be redesigned. Statistical significance testing should be eliminated as the primary means of drawing conclusions from data, and studies should be designed to increase or reduce evidentiary weight accorded to causal theories instead. 
Perhaps most importantly, observational data needs to be shared. When it comes to analyzing data, regardless of what their source of funding is, investigators cannot be given complete deference in conveying what the data imply. As an academic recommendation, teaching Statistics and Probability as separate subjects would clear up more than a few nagging philosophical problems.  Finally, the facade of impersonal scientific objectivity needs to be abandoned. Scientists should be both free to speculate and humble enough to admit their theories may be wrong.

SPSG #12: Personal Technique

This chapter is largely written in the first person, mainly for the purpose of disparaging the concept of scientific objectivity. It starts out by describing how several reorganizations dramatically affected the branch at the USFDA that I worked in for 25 years. In the end, it was swallowed up by the shell game. It then goes on to discuss the importance of recognizing the subjective nature of science, particularly when the science is unsettled and uncertain. The objectivity facade is partly attributable to a scientific writing style that takes the author out of discussions of factual issues, which hides the fact that personal opinions are being expressed.  To demonstrate what science is really like, I walk through the personal choices I made in developing the dose-response model for arsenic and lung cancer that was used for the apple juice and rice risk assessments discussed in Chapter 11. Some of those choices were made by committee and some were not, but either way they all involved subjective scientific judgements made in a fog of uncertainty. In one case, I made a different choice than I had previously because new information influenced my subjective judgment about how to go about estimating lifetime risks from a prospective epidemiological study, which underscores the notion that “objective” reality evolves with scientific inquiry. The resulting dose-response model is then used to provide risk estimates for someone with a high-end (for the United States) arsenic intake. In addition to providing the lifetime risk estimates that were also given in the FDA reports on apple juice and rice, estimated changes in average life expectancy are also provided. For the purpose of making an individual choice, the latter measure is far more meaningful. 
The chapter then suggests that the inability of EPA to provide a dose-response characterization for arsenic may stem from a wrongheaded demand for objectivity that dictates the use of the wrong probability and the wrong personnel for a job that needs statistical theory instead of statistical probability.