Wednesday, May 11, 2016

Biological Problems

The Perils of Multivariate Linear Regression

As is often the case in the epidemiological literature on environmental influences on neurobehavioral development, Bowers and Beck (2006) noted that a paper by Lanphear et al (2005) “has suggested the existence of a supra-linear dose–response relationship between environmental measures such as blood lead concentrations and IQ”.  They then produced an analysis indicating that the apparent supralinearity is an artifact of the way the data were analyzed.  They stated their conclusion as follows:
Results of the analyses show that a supra-linear slope is a required outcome of correlations between data distributions where one is lognormally distributed and the other is normally distributed. 
While their mathematical analysis was indubitably correct, the way Bowers and Beck reported the results left something to be desired.  How the data are distributed is not really the issue at all.  Instead, the mathematical artifact they found results from conducting linear regression analyses with log-transformed data.  If data from a normal distribution, or any other distribution, were log-transformed prior to the regression analysis, then the same result would be obtained.  Furthermore, as demonstrated by Jusko et al (2006), a linear regression without log transformation, using data drawn from a lognormal distribution, does not result in a supralinear dose-response relationship.
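A rough simulation can illustrate both points.  The sketch below is not a reanalysis of any of the cited data: the sample size, the slope of -0.5 IQ points per unit of dose, the noise level, and the lognormal parameters are all made-up assumptions, and the helper names (ols, predicted_drop) are hypothetical.  The simulated dose-response relationship is exactly linear, the dose is lognormally distributed, and the same data are fit with and without a log transform.

```python
import math
import random


def ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_SLOPE = -0.5       # assumed: same IQ loss per unit of dose at every dose

# Lognormally distributed dose, strictly linear effect, normal noise.
dose = [rng.lognormvariate(2.0, 0.6) for _ in range(5000)]
iq = [100.0 + TRUE_SLOPE * d + rng.gauss(0.0, 5.0) for d in dose]

# Fit 1: no transformation (the situation Jusko et al describe).
a_lin, b_lin = ols(dose, iq)

# Fit 2: regress IQ on the log of the dose.
a_log, b_log = ols([math.log(d) for d in dose], iq)


def predicted_drop(lo, hi):
    """IQ change the log-linear fit predicts when dose rises from lo to hi."""
    return b_log * (math.log(hi) - math.log(lo))


drop_low = predicted_drop(1.0, 10.0)    # 9 units of dose at the low end
drop_high = predicted_drop(10.0, 20.0)  # 10 units of dose further up

print("untransformed fit, slope per unit dose:", round(b_lin, 3))
print("log fit, predicted IQ change from 1 to 10:", round(drop_low, 2))
print("log fit, predicted IQ change from 10 to 20:", round(drop_high, 2))
```

Under these assumptions the untransformed fit should land close to the slope that was simulated, lognormal dose or not.  The log fit predicts a larger loss over the low range than over the higher one, and the ratio of the two predictions is fixed at ln(10)/ln(2) by the functional form, no matter what data are fed in.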

To their credit, Hornung et al (2006) also understood that the real issue is the shape of the dose-response relationship rather than the distributions that either the dependent or independent variables follow.  They therefore protested that Lanphear et al (2005) had considered the likely shape of the curve before conducting the regression analysis:
The shape of the exposure–response relationship was determined to be nonlinear insofar as the quadratic and cubic terms for concurrent blood lead were statistically significant (p < 0.001 and p = 0.003, respectively).  Because the restrictive cubic spline indicated that a log-linear model provided a good fit to the data, we used the log of concurrent blood lead in all subsequent analyses of the pooled data.
But there are many problems with this justification.  First, it is not at all clear how a spline analysis specifically supports a log-linear model, as opposed to other potential nonlinear models (e.g. a Hill function).  Second, there was no consideration of biological plausibility.  Like Bowers and Beck (2006), Lanphear et al (2005) seem to think establishing a causal relationship is a mathematical problem rather than a biological one.  Third, they did not consider the possibility that other covariates might explain the apparent nonlinearity.  Yet, off they went, and a dose-response model that predicts infinitely large effects as the dose approaches zero was the inevitable result.  For all practical purposes, Bowers and Beck (2006) were entirely correct.
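Why the result is inevitable can be seen by writing the log-linear model in generic form.  The symbols α, β, and x (the dose) are notation introduced here for illustration and are not taken from any of the cited papers:

```latex
\[
\mathrm{IQ}(x) = \alpha + \beta \ln x, \qquad \beta < 0
\quad\Longrightarrow\quad
\frac{d\,\mathrm{IQ}}{dx} = \frac{\beta}{x}, \qquad
\lim_{x \to 0^{+}} \left| \frac{\beta}{x} \right| = \infty .
\]
```

The predicted change in IQ per unit of dose grows without bound as the dose shrinks, whatever the data look like.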

Besides the fact that a log-linear dose-response model is a very poor theory, there is a more general lesson to be learned:  A multivariate regression with assumed quantitative relationships between the variables being modeled is highly prone to error.  While a log-linear relationship is obviously wrong, a linear relationship isn’t necessarily right either.  Correlations between variables may result in attribution of mismodeled causal effects to a variable that has no causal effect at all.  For example, if the relationship between socioeconomic status (i.e. the HOME score) and IQ is nonlinear, with bigger impacts at low scores, and socioeconomic status is negatively correlated with exposure to an environmental chemical, then some of the socioeconomic effect will erroneously appear to be a low dose effect attributable to the environmental chemical.  There are many other possible explanations as well, all of which are more probable than a dose-response model that predicts incremental effects to get bigger as the dose gets smaller.
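The misattribution is easy to reproduce in a toy simulation.  Everything in the sketch below is an assumption made for illustration: the 0 to 10 socioeconomic scale, the saturating shape of its effect on IQ, the way exposure declines with socioeconomic status, and the helper name ols2.  The chemical is given no effect on IQ at all.  The data are then fit twice, once with the socioeconomic term entered linearly (the wrong shape) and once with the shape that was actually simulated.

```python
import math
import random


def ols2(x1, x2, y):
    """Least squares for y = a + b1*x1 + b2*x2; returns (a, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # solve the 2x2 normal equations
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical socioeconomic score on a 0-10 scale.
ses = [rng.uniform(0.0, 10.0) for _ in range(n)]

# Exposure falls as the score rises (negative correlation), with scatter.
dose = [math.exp(2.5 - 0.25 * s + rng.gauss(0.0, 0.3)) for s in ses]

# Socioeconomic effect on IQ: steep at low scores, flat at high scores.
ses_shape = [1.0 - math.exp(-0.5 * s) for s in ses]

# IQ depends on the socioeconomic score only; the chemical does nothing.
iq = [80.0 + 20.0 * z + rng.gauss(0.0, 5.0) for z in ses_shape]

# Mis-specified model: socioeconomic score entered as a straight line.
_, _, dose_coef_wrong = ols2(ses, dose, iq)

# Correctly specified model: the simulated shape is used instead.
_, _, dose_coef_right = ols2(ses_shape, dose, iq)

print("dose coefficient, linear SES term:", round(dose_coef_wrong, 3))
print("dose coefficient, correct SES shape:", round(dose_coef_right, 3))
```

With the straight-line socioeconomic term, the regression should assign a clearly negative coefficient to a chemical that was simulated to be inert.  With the correct shape, the coefficient should fall back to roughly zero.  In real data the correct shape is not known in advance, which is the problem.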

Process vs. Theory

Biological complexity often makes the pronouncement of definitive truths for Medicine and Public Health practically impossible.  While relying on expert opinion is a common solution to that problem, that solution does not work well when opinion is divided.  As a means of coping with that problem, institutional decision making processes often employ structured evaluation systems to sort through what can often be a voluminous set of scientific literature.  The Safety Assessment methodology that is typically used for premarket approval evaluations is an example.  This description of Evidence-Based Medicine conveys the general ethos of such efforts:
Whether applied to medical education, decisions about individuals, guidelines and policies applied to populations, or administration of health services in general, evidence-based medicine advocates that to the greatest extent possible, decisions and policies should be based on evidence, not just the beliefs of practitioners, experts, or administrators. It thus tries to assure that a clinician's opinion, which may be limited by knowledge gaps or biases, is supplemented with all available knowledge from the scientific literature so that best practice can be determined and applied. It promotes the use of formal, explicit methods to analyze evidence and makes it available to decision makers.
There are two key concepts at work here.  First, the “beliefs of practitioners, experts, or administrators” are getting kicked to the curb in favor of “evidence”.  If you thought the beliefs of experts were based on scientific evidence, then you were misinformed, apparently.  Secondly, there is an emphasis on the “use of formal, explicit methods”, which also serve to limit subjective influences on the evaluation process. 

Experts are not always trustworthy, so the desire for a transparent process is entirely understandable.  But getting a trustworthy process to replace the experts is easier said than done.  The process has to be designed by somebody, and that usually means experts.  There is also apt to be a negotiation process involved in getting the process to be accepted, so subjectivity isn’t really completely avoided.  But perhaps the bigger problem is that trying to deal with complex biological issues with a formula may often be rather stupid.  If all the studies show the same result, then it really isn’t going to matter whether the decision making process is expert-based or evidence-based.  If the results are different, then the systematic review may succeed at identifying the higher quality studies and grading the general result.  But it won’t explain why the results are different.  It won’t figure out why a treatment may work sometimes, but not others.  That will take biological theories, and like the biases the evidence-based systems strive to avoid, those are subjective.  There are likely to be different theories, of course, and then the experts will inevitably get into a debate over which are more likely.  But, guess what, that’s the way science works: Trying to eliminate all potential bias with formulae will also eliminate scientific progress.

By all means, more transparency is needed.  In particular, let’s not trust authors to have the last word on how the data they have collected are analyzed and published.  Medical researchers and epidemiologists are notorious for not sharing original data involving human subjects, even when they are legally required to do so (van Panhuis et al, 2014; Longo and Drazen, 2016).  Sharing those data will allow better theories to flourish, and poor theories to flounder.

References

Bowers TS and Beck BD (2006).  What is the meaning of non-linear dose-response relationships between blood lead concentrations and IQ?  Neurotoxicology 27:520-4.

Hornung R, Lanphear B, and Dietrich K (2006).  Response to: “What is the meaning of non-linear dose–response relationships between blood lead concentrations and IQ?”.  Neurotoxicology 27:635.

Jusko TA, Lockhart DW, Sampson PD, Henderson CR Jr., and Canfield RL (2006).  Response to: “What is the meaning of non-linear dose–response relationships between blood lead concentrations and IQ?”.  Neurotoxicology 27:1123–1125.

Lanphear BP, Hornung R, Khoury J, Yolton K, Baghurst P, Bellinger DC, Canfield RL, Dietrich KN, Bornschein R, Greene T, Rothenberg SJ, Needleman HL, Schnaas L, Wasserman G, Graziano J, and Roberts R (2005).  Low-Level Environmental Lead Exposure and Children’s Intellectual Function: An International Pooled Analysis.  Environ Health Perspect 113:894–899.

Longo DL and Drazen JM (2016).  Data Sharing.  N Engl J Med 374:276-277.

Panhuis WG van, Paul P, Emerson C, Grefenstette J, Wilder R, Herbst AJ, Heymann D, and Burke DS (2014).  A systematic review of barriers to data sharing in public health.  BMC Public Health 14:1144.

Official Post Soundtrack


Jackson, J (1980).  Biology.  In: Beat Crazy, Track 9.

Post Notes

Thesis Post #65.  This covers some of the same ground as Toxicology Meets Epidemiology, but with a more philosophical overview. 
