The Perils of Multivariate Linear Regression
Bowers and Beck (2006) noted that a paper by Lanphear et al (2005) “has suggested the existence of a supra-linear dose–response relationship between environmental measures such as blood lead concentrations and IQ”, a suggestion that turns up often in the epidemiological literature on environmental influences on neurobehavioral development. They then produced an analysis indicating that the apparent supralinearity is an artifact of the way the data were analyzed. They stated their conclusion as follows:
Results of the analyses show that a supra-linear slope is a required outcome of correlations between data distributions where one is lognormally distributed and the other is normally distributed.
While their mathematical analysis was indubitably correct, the way Bowers and Beck reported the results left something to be desired. How the data are distributed is not really the issue at all. Instead, the mathematical artifact they found results from conducting linear regression analyses with log transformed data. If data from a normal distribution, or any other distribution, were log transformed prior to the regression analysis, then the same result would be obtained. Furthermore, as demonstrated by Jusko et al (2006), a linear regression without log transformation with data drawn from a lognormal distribution does not result in a supralinear dose-response relationship.
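The point can be checked with a small simulation. What follows is only a sketch with made-up numbers: the baseline of 100, the slope of 0.5 IQ points per unit of dose, the noise level, and both dose distributions are assumptions chosen for illustration, not values from any of the papers cited. The simulated relationship is exactly linear. Regressing on the log of dose nonetheless produces a larger fitted loss per unit at low doses than at high doses, whether the doses are lognormal or uniform, while the untransformed regression returns the same constant slope in both cases.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    b, a = np.polyfit(x, y, 1)
    return a, b

def per_unit_loss(x, y, lo, hi, log_dose):
    """Fitted IQ loss per unit of dose between doses lo and hi."""
    transform = np.log if log_dose else (lambda v: v)
    _, b = fit_line(transform(x), y)
    return -(b * (transform(hi) - transform(lo))) / (hi - lo)

rng = np.random.default_rng(0)
n = 5000

# Two hypothetical dose distributions; neither comes from real data.
doses = {
    "lognormal": rng.lognormal(mean=2.0, sigma=0.6, size=n),
    "uniform": rng.uniform(1.0, 30.0, size=n),
}

results = {}
for name, x in doses.items():
    # The true relationship is linear: 0.5 IQ points lost per unit of dose.
    y = 100.0 - 0.5 * x + rng.normal(0.0, 5.0, size=n)
    results[name] = {
        "linear_low": per_unit_loss(x, y, 1.0, 10.0, log_dose=False),
        "linear_high": per_unit_loss(x, y, 10.0, 20.0, log_dose=False),
        "log_low": per_unit_loss(x, y, 1.0, 10.0, log_dose=True),
        "log_high": per_unit_loss(x, y, 10.0, 20.0, log_dose=True),
    }

for name, r in results.items():
    print(name, {k: round(v, 3) for k, v in r.items()})
```

The "supralinearity" in the log fits is built into the functional form: for any negative fitted slope, the loss per unit between doses 1 and 10 exceeds the loss per unit between 10 and 20 by the fixed ratio (ln 10 / 9) / (ln 2 / 10), whatever the data look like.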
To their credit, Hornung et al (2006) also understood that the real issue is the shape of the dose-response relationship rather than the distributions that either the dependent or independent variables follow. They therefore protested that Lanphear et al (2005) had considered the likely shape of the curve before conducting the regression analysis:
The shape of the exposure–response relationship was determined to be nonlinear insofar as the quadratic and cubic terms for concurrent blood lead were statistically significant (p < 0.001 and p = 0.003, respectively). Because the restrictive cubic spline indicated that a log-linear model provided a good fit to the data, we used the log of concurrent blood lead in all subsequent analyses of the pooled data.
But, there are many problems with this justification. First, it is not at all clear how a spline analysis specifically supports a log-linear model, as opposed to other potential nonlinear models (e.g. a Hill function). Second, there was no consideration of biological plausibility. Like Bowers and Beck (2006), Lanphear et al (2005) seem to think establishing a causal relationship is a mathematical problem rather than a biological one. Third, they did not consider the possibility that other covariates might explain the apparent nonlinearity. Yet, off they went, and a dose-response model that predicts infinitely large effects as the dose approaches zero was the inevitable result. For all practical purposes, Bowers and Beck (2006) were entirely correct.
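To see why the log-linear form forces that result, it helps to write it out. The symbols here are my own shorthand rather than notation from any of the papers: x is the blood lead concentration, and alpha and beta are the fitted intercept and slope, with beta negative.

```latex
\begin{align*}
\mathrm{IQ}(x) &= \alpha + \beta \ln x, \qquad \beta < 0 \\
\frac{d\,\mathrm{IQ}}{dx} &= \frac{\beta}{x}
  \;\longrightarrow\; -\infty \quad \text{as } x \to 0^{+} \\
\mathrm{IQ}(x/2) - \mathrm{IQ}(x) &= -\beta \ln 2 = |\beta| \ln 2
\end{align*}
```

The last line says that every halving of the dose buys the same predicted IQ gain, no matter how small the dose already is. Since a dose can be halved indefinitely on the way to zero, the predicted difference between any finite dose and no dose at all is unbounded.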
Besides the fact that a loglinear dose-response model is a very poor theory, there is a more general lesson to be learned: A multivariate regression with assumed quantitative relationships between the variables being modeled is highly prone to error. While a loglinear relationship is obviously wrong, a linear relationship isn’t necessarily right either. Correlations between variables may result in attribution of mismodeled causal effects to a variable that has no causal effect at all. For example, if the relationship between socioeconomic status (i.e. the HOME score) and IQ is nonlinear, with bigger impacts at low scores, and socioeconomic status is negatively correlated with exposure to an environmental chemical, then some of the socioeconomic effect will erroneously appear to be a low dose effect attributable to the environmental chemical. There are many other possible explanations as well, all of which are more probable than a dose-response model that predicts incremental effects to get bigger as the dose gets smaller.
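Here is a rough simulation sketch of that kind of misattribution. Everything in it is hypothetical: the score range, the logarithmic shape of the score's effect on IQ, the way dose falls as the score rises, and the noise levels are assumptions picked to make the mechanism visible, not estimates from real data. The chemical is given no causal effect at all. When the score is adjusted for with a straight line, the regression still assigns the dose a negative coefficient. When the score enters with the shape that actually generated the data, the dose coefficient falls to about zero.

```python
import numpy as np

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical HOME-like score on an arbitrary 1-10 scale.
home = rng.uniform(1.0, 10.0, size=n)

# Dose is higher where the score is lower (negative correlation), with scatter.
dose = np.exp(2.5 - 0.2 * home + rng.normal(0.0, 0.5, size=n))

# IQ depends on the score nonlinearly (steepest at low scores).
# The dose does NOT appear here: its true causal effect is zero.
iq = 80.0 + 15.0 * np.log(home) + rng.normal(0.0, 5.0, size=n)

# Adjusting for the score with an assumed linear term (misspecified).
dose_coef_linear_home = ols([home, dose], iq)[2]

# Adjusting for the score with the shape that generated the data.
dose_coef_log_home = ols([np.log(home), dose], iq)[2]

print("dose coefficient, linear score adjustment:", round(dose_coef_linear_home, 3))
print("dose coefficient, log score adjustment:   ", round(dose_coef_log_home, 3))
```

The size and even the shape of the spurious effect depend entirely on the forms assumed above, which is the point: the regression has no way of knowing which covariate the leftover curvature belongs to.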
Process vs. Theory
Biological complexity often makes the pronouncement of definitive truths for Medicine and Public Health practically impossible. While relying on expert opinion is a common solution to that problem, that solution does not work well when opinion is divided. As a means of coping with that problem, institutional decision making processes often employ structured evaluation systems to sort through what can often be a voluminous set of scientific literature. The Safety Assessment methodology that is typically used for premarket approval evaluations is an example. This description of Evidence-Based Medicine conveys the general ethos of such efforts:
Whether applied to medical
education, decisions about individuals, guidelines and policies applied to
populations, or administration of health services in general, evidence-based
medicine advocates that to the greatest extent possible, decisions and policies
should be based on evidence, not just the beliefs of practitioners, experts, or
administrators. It thus tries to assure that a clinician's opinion, which may
be limited by knowledge gaps or biases, is supplemented with all available
knowledge from the scientific literature so that best practice can be
determined and applied. It promotes the use of formal, explicit methods to
analyze evidence and makes it available to decision makers.
There are two key concepts at work here. First, the “beliefs of practitioners,
experts, or administrators” are getting kicked to the curb in favor of
“evidence”. If you thought the beliefs
of experts were based on scientific evidence, then you were misinformed,
apparently. Secondly, there is an emphasis
on the “use of formal, explicit methods”, which also serve to limit subjective
influences on the evaluation process.
Experts are not always trustworthy, so the desire for a
transparent process is entirely understandable.
But, getting a trustworthy process to replace the experts is easier said than done. The process has to be designed by somebody,
and that usually means experts. There is
also apt to be a negotiation process involved in getting the process to be
accepted, so subjectivity isn’t really completely avoided. But perhaps the bigger problem is that trying to deal
with complex biological issues with a formula may often be rather stupid. If all the studies show the same result, then
it really isn’t going to matter whether the decision making process is
expert-based or evidence-based. If the
results are different, then the systematic review may succeed at identifying
the higher quality studies and grading the general result. But, it won’t explain why the results are
different. It won’t figure out why a
treatment may work sometimes, but not others.
That will take biological theories, and like the biases the
evidence-based systems strive to avoid, those are subjective. There are likely to be different theories,
of course, and then the experts will inevitably get into a debate over which
are more likely. But, guess what, that’s
the way science works: Trying to eliminate all potential bias with formulae
will also eliminate scientific progress.
By all means, more transparency is needed. In particular, let’s not trust authors to have
the last word on how the data they have collected are analyzed and
published. Medical researchers and epidemiologists are notorious for not sharing original data involving human subjects, even when they are legally required to do so (Panhuis et al, 2014; Longo and Drazen, 2016). Opening those data to independent analysis will allow better theories to flourish, and poor theories to flounder.
References
Bowers TS and Beck BD (2006). What is the meaning of non-linear
dose-response relationships between blood lead concentrations and IQ? Neurotoxicology 27:520-4.
Hornung R, Lanphear B, and Dietrich K (2006). Response to: “What is the meaning of
non-linear dose–response relationships between blood lead concentrations and
IQ?”. Neurotoxicology 27:635.
Jusko TA, Lockhart DW, Sampson PD, Henderson CR Jr., and Canfield
RL (2006). Response to: “What is the
meaning of non-linear dose–response relationships between blood lead
concentrations and IQ?”. Neurotoxicology 27:1123–1125.
Lanphear BP, Hornung R, Khoury J, Yolton K, Baghurst P, Bellinger
DC, Canfield RL, Dietrich KN, Bornschein R, Greene T, Rothenberg SJ,
Needleman HL, Schnaas L, Wasserman G, Graziano J, and Roberts R (2005). Low-Level Environmental Lead Exposure and
Children’s Intellectual Function: An International Pooled Analysis. Environ Health
Perspect. 113:894–899.
Panhuis WG van, Paul P, Emerson C, Grefenstette J, Wilder R,
Herbst AJ, Heymann D, and Burke DS (2014).
A systematic review of barriers to data sharing in public health. BMC
Public Health 14:1144.
Official Post Soundtrack
Jackson, J (1980).
Biology. In: Beat Crazy, Track 9.
Post Notes
Thesis Post #65. This covers some of the same ground as Toxicology Meets Epidemiology, but with a more philosophical overview.