Preamble
I haven't posted in over two years, largely because I haven't had anything further to say, and that in turn is attributable to the fact that I spend most of my time these days doing other things. But I still get dragged into it sometimes. In particular, for the last year and a half I've been participating in a World Health Organization workgroup that was intended to codify standard practice for benchmark dose modeling. That involved phone conferences and critiquing text written by other members. Things weren't really progressing towards any sort of conclusion, so a meeting was convened in Geneva last month that also brought in a number of other participants.
As I pretty much knew already, I was in the minority among the initial workgroup members in at least two respects. First, while I'm a pharmacologist/toxicologist by training, the other members are largely statisticians. Second, while I am very interested in dose-response modeling, I am actually not all that keen on picking a point on the curve to be the "Benchmark Dose". Even though doing so might be useful sometimes, it never seemed to help with the problems I worked on at the FDA.
I wrote a short one-page essay the morning after I got back. It was ostensibly written for inclusion somewhere in the WHO document that is to be the end product of the meeting. However, I don't know if or when that will happen. So, I'll share it here just in case it never goes anywhere else.
Dose-Response Modeling and Weight of the Evidence
Causal relationships can be expressed mathematically, and the expression for acceleration attributable to gravity is perhaps the best-known example. Dose-response models are quantitative expressions of causal relationships in pharmacology and toxicology. However, even when it is expressed mathematically, the validity of an expression of causality ultimately depends upon a judgment that is not itself mathematical (Illari and Russo, 2015). In the fields of medicine and physiology, perhaps the best evidence of that comes from the fact that when Hill (1965) gave his widely known lecture on causality before a group of statisticians, he used no mathematical equations whatsoever.
Weight of the evidence approaches have been used for dose-response modeling both at JECFA (e.g. for lead, WHO 2000 and WHO 2011) and elsewhere (e.g. Morgan and Granger, 1980; Evans et al, 1994, and Carrington et al, 2011). Using weight of the evidence to address dose-response model uncertainties is largely the same as using Bayesian methods. In both cases there is a need to identify a finite set of alternative models or hypotheses, and the models are either directly fit to data or designed to be consistent with the empirical record. Furthermore, both approaches utilize expert opinion, and at the end of the process probabilities are assigned to each alternative model so that they all add up to 1. However, there are important differences.
- First, Bayesian methodology uses expert opinion prior to curve-fitting, and then “updates” the probabilities initially assigned by the experts as part of the curve-fitting process to yield the final model probabilities. On the other hand, a weight of the evidence approach does not assign model probabilities until after curve-fitting has taken place; experts may use information about how well each model describes the data, but may also use other theoretical and experiential criteria. Because the Bayesian approach alters expert opinion after it is expressed, it has the potential of yielding final model probabilities that contradict what experts believe.
- Second, because it is amenable to automation, Bayesian methodology is far more reproducible than a methodology which depends solely on expert opinion. Model probabilities assigned by experts may vary among experts, or even for a single expert over time. That fact perhaps makes the Bayesian methodology preferable when a standardized approach is desirable and there is no strongly held expert opinion.
- Third, because it is thought of as a mathematical exercise, calculating Bayesian probabilities requires the use of models for which log-likelihood functions can be calculated. For more complex biological models, that may not be possible. Under those circumstances, consulting expert opinion is really the only option.
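The first difference above can be illustrated with a small numerical sketch. Everything in it is hypothetical: the model names, the prior and expert probabilities, and the log-likelihood values are invented for illustration, and each model's fitted log-likelihood is used as a crude stand-in for its marginal likelihood, which a real Bayesian analysis would have to compute or approximate.

```python
import math


def bayesian_model_probabilities(priors, log_likelihoods):
    """Update prior model probabilities using each model's log-likelihood."""
    # Subtract the largest log-likelihood so the exponentials stay well scaled.
    max_ll = max(log_likelihoods.values())
    unnormalized = {
        name: priors[name] * math.exp(log_likelihoods[name] - max_ll)
        for name in priors
    }
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}


def weight_of_evidence_probabilities(expert_weights):
    """Average the probabilities each expert assigns after seeing the fits."""
    n_experts = len(expert_weights)
    return {
        name: sum(expert[name] for expert in expert_weights) / n_experts
        for name in expert_weights[0]
    }


# Hypothetical prior opinion: experts favor the Hill model before fitting.
priors = {"linear": 0.2, "hill": 0.6, "supralinear": 0.2}
# Hypothetical fit results: the supralinear model happens to fit best.
log_likelihoods = {"linear": -52.0, "hill": -54.0, "supralinear": -50.0}
posterior = bayesian_model_probabilities(priors, log_likelihoods)

# Hypothetical post-fit judgments: two experts have seen the fits, but still
# discount the supralinear model on theoretical grounds.
expert_weights = [
    {"linear": 0.30, "hill": 0.65, "supralinear": 0.05},
    {"linear": 0.40, "hill": 0.55, "supralinear": 0.05},
]
woe = weight_of_evidence_probabilities(expert_weights)

print("Bayesian posterior:    ", posterior)
print("Weight of the evidence:", woe)
```

With these made-up numbers, the Bayesian update ends up putting most of the probability on the supralinear model, while the experts who have seen the same fits still favor the Hill model. Both sets of probabilities add up to 1; they simply come from different places.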
Although such processes have been used for other purposes (e.g. Suter and Cormier, 2011), a formal process of the same ilk as the Hill criteria (Hill, 1965) is not typically implemented for quantitative dose-response modeling. A process for weighing evidence could temper differences of opinion among experts regarding dose-response model form without eliminating expert opinion altogether.
Assigning probabilities by committee would also help. In place of the “associations” that concerned Hill, one or more numerical goodness-of-fit measures could be used to argue for or against specific models. The other Hill criterion that is directly relevant to dose-response modeling is the requirement for a “biological gradient”. Quite simply, a dose-response model ought to look like what a dose-response relationship is supposed to look like. That criterion could perhaps be subdivided into theoretical and experiential components. As an instance of the former, an argument that a dose-response relationship cannot be supralinear as the dose approaches zero can be based on the notion that it violates the generally accepted biochemical law of mass action (Tallarida and Jacob, 1979). An experiential argument would reflect the experience of toxicologists with other analogous dose-response relationships.
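The theoretical argument about supralinearity can be made concrete with a short sketch. It assumes a simple one-site receptor occupancy curve as the mass-action model and a square-root power curve as the supralinear alternative; both functions and all parameter values are hypothetical choices for illustration, not models taken from any of the cited assessments.

```python
def mass_action_response(dose, e_max=1.0, k_d=10.0):
    """One-site occupancy curve (assumed form): response = e_max * dose / (k_d + dose)."""
    return e_max * dose / (k_d + dose)


def power_response(dose, scale=0.1, exponent=0.5):
    """Power curve with exponent < 1, which is supralinear near zero dose."""
    return scale * dose ** exponent


# Response per unit dose at progressively smaller doses.
doses = [1.0, 0.01, 0.0001]
mass_action_slopes = [mass_action_response(d) / d for d in doses]
power_slopes = [power_response(d) / d for d in doses]

for d, m, p in zip(doses, mass_action_slopes, power_slopes):
    print(f"dose={d:g}  mass action: {m:.4f}  power: {p:.4f}")
```

Under these assumptions, the response per unit dose for the mass-action curve levels off at e_max / k_d as the dose shrinks, so the curve is effectively linear at low doses. For the power curve the same ratio keeps growing without bound, which is the behavior the mass-action argument rules out.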
References
Carrington CD, Murray C, and Tao S (2013). A Quantitative Assessment of Inorganic Arsenic in Apple Juice.
Evans JS, Graham JD, Gray GM, and Sielken RL (1994). A Distributional Approach to Characterizing Low-Dose Cancer Risk. Risk Analysis 14:25-34.
Hill, Sir Arthur Bradford (1965). The Environment and Disease: Association or Causation? Proc Royal Soc Med 58:295-300.
Illari P and Russo F (2015). Chapter 6: Evidence and Causality. In: Causality: Philosophical Theory Meets Scientific Practice. Oxford University Press, Oxford, pp. 46-59.
Morgan MG, Morris SC, Henrion M, Amaral DA, and Rish WR (1984). Technical Uncertainty in Quantitative Policy Analysis: A Sulfur Air Pollution Example. Risk Analysis 4:201-216.
Suter GW and Cormier SM (2011). Why and how to combine evidence in environmental assessments: Weighing evidence and building cases. Science of the Total Environment 409:1406–1417.
Tallarida RJ and Jacob LS (1979). Chapter 3: Kinetics of Drug-Receptor Interaction: Interpreting Dose-Response Data. In: The Dose-Response Relation in Pharmacology. Springer-Verlag, New York, pp. 49-84.
WHO (2000). Lead, 53rd JECFA.
WHO (2011). Lead, 73rd JECFA.
Post Note
I do believe that Bayesian Model Averaging will work reasonably well for evaluations where the risk is fairly trivial. However, for more serious purposes (e.g. arsenic or lead) it is a step in the right direction, at best. You just can't throw expert opinion, common sense, and associative learning into the dustbin of history by burying them in a prior. It's silly, and sometimes obviously so.