Friday, March 27, 2015

Where Is the Data?

The Recorded Observation

In a very basic sense, the practice of science involves arguing what is likely to happen in the future given what has happened in the past. Based on their own personal past experience, everyone does this. What makes a scientific discipline special is that there is a shared history of observation that allows and demands consideration of non-personal experience. Therefore, most scientific investigations begin with the generation of a record of observation, which is commonly referred to as “data”. Since the entire discipline in some way depends on having accurate records, the credibility of a scientist heavily depends on correctly recording an observed event. When a question of causality is involved, correctly describing the events that preceded the observation is very important as well. In laboratory experiments and clinical trials, preceding events are deliberately manipulated in order to observe what follows. In epidemiology, the order of events is simply observed without being controlled.

Sharing Data

Obviously, there is more to science than simply generating data. Drawing conclusions from the data that allow predictions to be made is what gives science its power. But, before launching into an analysis, most scientific papers start by showing the data being analyzed in either tabular or graphical form. It is also quite common to compare the results of the analysis to the data by showing both observed and predicted values in the same table or graph. There are two reasons for doing this. First, it allows readers to make their own judgment about the quality of the analysis being presented. Second, it may permit the data to be used by other authors as the basis for further analysis, possibly by combining observations from multiple experiments or studies.

Once a paper has been published, most laboratory scientists will share raw data if the descriptions in the paper do not provide sufficient detail. Epidemiologists often will not do this. Reasons frequently given for this are a) the data are in some way confidential, or b) the data are proprietary to the investigator. The other obvious explanation is that very different conclusions could be drawn from the data, and therefore the data are withheld in order to protect a questionable analysis. Not everyone is happy with this. In fact, a federal law was enacted that requires investigators to share data when studies are funded by the federal government. But still, you may have to go to court to get it. That’s no fun.

Adjusted Data

However, many epidemiology papers do at least show summary results of raw data, which is good. It is also quite common to show “adjusted data”, which are estimated values that are intended to represent what would be observed without the presence of other causal influences. While this isn’t necessarily a bad idea, it is important to note that adjusted values are not really recorded observations. Their validity depends on the ability of the quantitative models used to make the adjustments to do so correctly. Since most quantitative models in biology are approximately correct at best, it is a pretty sure bet that the adjusted values aren’t exactly right. If the other causal influences are much more important than an environmental influence, then even a relatively minor flaw in the model used to make the correction can result in a large maladjustment of the variable of interest. Maladjustments are also likely when there are many variables being adjusted for, and there are potential interactions between two or more of them.
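To put rough numbers on that last point, here is a minimal sketch in Python. Everything in it is made up for illustration: the rates, the smoking fractions, and the function names are assumptions, not values from any study. It supposes a large smoking effect and a small exposure effect, then adjusts the observed rates for smoking using either the correct smoking coefficient or one that is off by 10%.

```python
# All numbers are hypothetical and chosen only to illustrate the arithmetic.
BASELINE = 10.0              # cases per 1,000 with no smoking and no exposure
TRUE_SMOKING_EFFECT = 50.0   # extra cases per 1,000 if everyone smoked
TRUE_EXPOSURE_EFFECT = 1.0   # extra cases per 1,000 in the exposed group


def observed_rate(smoking_fraction, exposed):
    """Rate that would actually be recorded for a group."""
    exposure_part = TRUE_EXPOSURE_EFFECT if exposed else 0.0
    return BASELINE + TRUE_SMOKING_EFFECT * smoking_fraction + exposure_part


def adjusted_rate(rate, smoking_fraction, assumed_smoking_effect,
                  reference_fraction=0.2):
    """Shift a rate to the reference smoking fraction using the assumed model."""
    return rate - assumed_smoking_effect * (smoking_fraction - reference_fraction)


def estimated_exposure_effect(assumed_smoking_effect):
    """Difference in adjusted rates between exposed and unexposed groups."""
    # Unexposed group: 20% smokers. Exposed group: 40% smokers.
    unexposed = adjusted_rate(observed_rate(0.2, False), 0.2,
                              assumed_smoking_effect)
    exposed = adjusted_rate(observed_rate(0.4, True), 0.4,
                            assumed_smoking_effect)
    return exposed - unexposed


if __name__ == "__main__":
    print("correct smoking model:", estimated_exposure_effect(50.0))
    print("smoking model 10% low:", estimated_exposure_effect(45.0))
```

With these made-up numbers, a 10% error in the smoking coefficient roughly doubles the apparent exposure effect, because the smoking correction is ten times larger than the effect being measured.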

Inappropriate adjustments can either hide true effects or create the appearance of effects that aren’t really there. A statistically significant result may arise because one or more of the models used to make the adjustments is “systematically” wrong. While there is no surefire way to prevent any of these results from happening, comparing unadjusted to adjusted data is advisable; if the two are very different, or if they deviate from each other with a quantitative trend, then there is cause for concern. If only adjusted values are shown, then perhaps even more caution is advised.

Meta-analysis and Risk of Bias Bias

Although causal determinations are difficult when working with observational epidemiological data, the weakness of individual studies can often be overcome by combining them. If raw data are available, which is rare, the data can be pooled and analyzed as if they all came from a single study, perhaps with additional variables in the model to account for differences between study populations. That not only allows better characterization of the dose-response relationship of the variables of interest (e.g. arsenic and lung cancer), it can also promote the development of better models for the other causal influences as well (e.g. smoking).
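As a rough illustration of pooling with a variable for study differences, the sketch below fits one common dose-response slope while letting each study keep its own baseline. The data and the function name `pooled_slope` are invented for this example. Centering dose and response within each study gives the same slope as a least squares fit with one indicator variable per study; a real pooled analysis would of course involve many more variables than this.

```python
from statistics import mean


def pooled_slope(studies):
    """Common dose-response slope with a separate intercept per study.

    `studies` is a list of (doses, responses) pairs. Centering within
    each study removes that study's own baseline before pooling.
    """
    numerator = 0.0
    denominator = 0.0
    for doses, responses in studies:
        dose_mean = mean(doses)
        response_mean = mean(responses)
        numerator += sum((d - dose_mean) * (r - response_mean)
                         for d, r in zip(doses, responses))
        denominator += sum((d - dose_mean) ** 2 for d in doses)
    return numerator / denominator


# Hypothetical studies: same slope (2 per unit dose), different baselines.
study_a = ([0, 1, 2, 3], [5.0, 7.0, 9.0, 11.0])
study_b = ([2, 4, 6], [14.0, 18.0, 22.0])

# Pooling with study-specific baselines.
with_study_terms = pooled_slope([study_a, study_b])

# Lumping everything together as if it were one population.
lumped = (study_a[0] + study_b[0], study_a[1] + study_b[1])
without_study_terms = pooled_slope([lumped])

if __name__ == "__main__":
    print("slope with study terms:   ", with_study_terms)
    print("slope without study terms:", without_study_terms)
```

In this made-up case, ignoring the difference between the study populations gives a noticeably steeper slope than either study shows on its own, which is the kind of distortion the additional variables are meant to prevent.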

If actual data are not available, then the only alternative is to try to assimilate the results of published studies. A Risk of Bias analysis is a formal weight-of-the-evidence evaluation that is limited to evaluating the extent to which a particular study adequately accounts for and reflects all potential causal influences (Stroup et al., 2000). As a search for plausible explanations, this effort dovetails with weight-of-the-evidence (“Hill Criteria”) evaluations concerned with specific causal theories. But here’s the thing: A regimented evaluation process may eliminate known sources of bias, but it cannot eliminate unknown sources. In fact, it may unwittingly reinforce them. A literature survey is an opportune time for some novel inductive synthesis, and a new theory may turn an old theory into a source of bias. Inductive reasoning never stops, or at least it shouldn’t. In the long run, the only cure for bias is sharing the recorded observations.

Reference

Stroup et al. (2000). Meta-analysis of Observational Studies in Epidemiology: A Proposal for Reporting. JAMA, April 19, 2000, Vol. 283, No. 15.

Official Post Soundtrack

XTC (1982).  Senses Working Overtime.  In: English Settlement, Track 3.

Post Note

Thesis Post #21. Fourth in the epidemiology series; follows Toxicology meets Epidemiology. The non-thesis post Data Economics is in the same vein. The YouTube video has Drums and Wires as the cover, but the song is not on that album. Oh well.
