The Voice of Experience
In his seminal paper on the evaluation of epidemiological
data, Sir Austin Bradford Hill sought to share his experience in occupational
epidemiology with those working in the nascent field of environmental
epidemiology. In doing so, he laid out
the problem of detecting “relationships between injury, sickness, and
conditions of work” as follows:
“There are, of course, instances in
which we can reasonably answer these questions from the general body of medical
knowledge. A particular, and perhaps
extreme, physical environment cannot fail to be harmful; a particular chemical
is known to be toxic to man and therefore suspect on the factory floor. Sometimes, alternatively, we may be able to
consider what might a particular
environment do to man, and then see whether such consequences are indeed to be
found. But more often than not we have
no such guidance, no such means of proceeding; more often than not we are
dependent on our observation and enumeration of defined events for which we
then seek antecedents. In other words we
see that the event B is associated with environmental feature A, that, to take a
specific example, some form of respiratory illness is associated with dust in
the environment. In what circumstances
can we pass from this observed association
to a verdict of causation?”
The criteria laid out by Hill collectively provide a framework
for grading the evidence for a causal relationship. Briefly, these are as follows:
- Strength. This criterion expressly states that association is evidence for causation; it also suggests that the strength of the evidence rises with the strength of the association.
- Consistency. The criterion suggests that observation of the same association by different people, in different places and at different times, strengthens the evidence. Replicated associations are to be preferred over those that are not.
- Specificity. Hill suggests that an association with one disease rather than many slightly strengthens the argument for causality.
- Temporality. Obviously, the cause must precede the effect. That said, the evidence is generally stronger when the putative effect occurs immediately.
- Biological Gradient. The evidence is strengthened if there is a quantitative link between the magnitude of the putative causal agent and the magnitude of the outcome. In other words, a dose-response relationship is to be expected.
- Plausibility. While Hill stipulates that biological plausibility can strengthen an argument for causation, he also acknowledges that current knowledge may be inadequate or even mistaken. There may also be multiple plausible explanations for an association, only some of which entail causation.
- Coherence. A causal interpretation is weakened if contradicted by other known facts. However, the caveats concerning the limits of current knowledge also apply here.
- Experiment. By removing a putative cause, an intervention tests a causal hypothesis. This is also a form of evidence by association, but here the association, if there is one, is produced deliberately.
- Analogy. A causal argument is more acceptable when it bears a similarity to other causal arguments that have already been accepted.
Some of the Hill criteria are readily amenable to
quantification. In particular, odds
ratios and relative risks are often used to characterize strength of
association, and linear regression techniques may be used to demonstrate a
dose-response trend. However, even though most of the theoretical criteria are also matters of degree, any formal methodology used to measure their evidential weight is apt to be as subjective as the theories it is applied to. In addition, even though Hill clearly thinks some of the criteria are more important than others, he provides no guidance as to how all the criteria are to be weighed together to provide an overall grading of the evidence. As a result, the enterprise of weighing evidence for causality is inherently subjective, and is often done by committee or workgroup.
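The two quantitative measures mentioned above, strength of association and a dose-response trend, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: all counts and risks are hypothetical, the confidence intervals use the common large-sample log-scale approximations (Katz for the relative risk, Woolf for the odds ratio), and the unweighted least-squares slope is only a crude trend check, not a formal trend test. The names `TwoByTwo`, `relative_risk`, `least_squares_slope` and so on are illustrative helpers, not part of any library.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """A 2x2 exposure/disease table. Counts used below are hypothetical."""
    a: int  # exposed, diseased
    b: int  # exposed, not diseased
    c: int  # unexposed, diseased
    d: int  # unexposed, not diseased


def relative_risk(t: TwoByTwo) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def odds_ratio(t: TwoByTwo) -> float:
    """Cross-product ratio (a*d)/(b*c)."""
    return (t.a * t.d) / (t.b * t.c)


def se_log_rr(t: TwoByTwo) -> float:
    """Large-sample standard error of log(RR) for a cohort table (Katz)."""
    return math.sqrt(1 / t.a - 1 / (t.a + t.b) + 1 / t.c - 1 / (t.c + t.d))


def se_log_or(t: TwoByTwo) -> float:
    """Large-sample standard error of log(OR) (Woolf)."""
    return math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)


def log_scale_ci(estimate: float, se_log: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate (Wald) confidence interval computed on the log scale."""
    centre = math.log(estimate)
    return math.exp(centre - z * se_log), math.exp(centre + z * se_log)


def least_squares_slope(dose: list[float], response: list[float]) -> float:
    """Ordinary least-squares slope of response on dose (a crude trend check)."""
    n = len(dose)
    mean_x = sum(dose) / n
    mean_y = sum(response) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dose, response))
    sxx = sum((x - mean_x) ** 2 for x in dose)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical counts, not data from any study.
    table = TwoByTwo(a=30, b=70, c=10, d=90)

    rr = relative_risk(table)
    lo, hi = log_scale_ci(rr, se_log_rr(table))
    print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")

    odds = odds_ratio(table)
    lo, hi = log_scale_ci(odds, se_log_or(table))
    print(f"OR = {odds:.2f} (95% CI {lo:.2f} to {hi:.2f})")

    # Hypothetical risk observed at four increasing exposure levels.
    slope = least_squares_slope([0, 1, 2, 3], [0.10, 0.15, 0.20, 0.25])
    print(f"risk increase per exposure unit = {slope:.3f}")
```

Note that these numbers only grade the empirical criteria; nothing in the sketch says how to combine them with plausibility, coherence or analogy, which is exactly the gap described above.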
The IARC Weight of the Evidence Categories
Since 1972, the best-known agency tasked with conducting weight-of-the-evidence evaluations has been the United Nations-sponsored International Agency for Research on Cancer (IARC). IARC-sponsored working groups have evaluated hundreds of chemicals and substances with the goal of placing each in one of four evidential categories (from IARC, 2006):
- Sufficient evidence of carcinogenicity: The Working Group considers that a causal relationship has been established between exposure to the agent and human cancer. That is, a positive relationship has been observed between the exposure and cancer in studies in which chance, bias and confounding could be ruled out with reasonable confidence. A statement that there is sufficient evidence is followed by a separate sentence that identifies the target organ(s) or tissue(s) where an increased risk of cancer was observed in humans. Identification of a specific target organ or tissue does not preclude the possibility that the agent may cause cancer at other sites.
- Limited evidence of carcinogenicity: A positive association has been observed between exposure to the agent and cancer for which a causal interpretation is considered by the Working Group to be credible, but chance, bias or confounding could not be ruled out with reasonable confidence.
- Inadequate evidence of carcinogenicity: The available studies are of insufficient quality, consistency or statistical power to permit a conclusion regarding the presence or absence of a causal association between exposure and cancer, or no data on cancer in humans are available.
- Evidence suggesting lack of carcinogenicity: There are several adequate studies covering the full range of levels of exposure that humans are known to encounter, which are mutually consistent in not showing a positive association between exposure to the agent and any studied cancer at any observed level of exposure. The results from these studies alone or combined should have narrow confidence intervals with an upper limit close to the null value (e.g. a relative risk of 1.0). Bias and confounding should be ruled out with reasonable confidence, and the studies should have an adequate length of follow-up. A conclusion of evidence suggesting lack of carcinogenicity is inevitably limited to the cancer sites, conditions and levels of exposure, and length of observation covered by the available studies. In addition, the possibility of a very small risk at the levels of exposure studied can never be excluded.
Although placement into categories is clearly concerned with
causal interpretation of epidemiological data, IARC evaluations do not formally
use the Hill criteria. However, the
lines of reasoning used in IARC monographs are found on Hill’s list. In particular, strength of association and
theoretical arguments are both brought forward in the evaluations.
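Although IARC evaluations are primarily concerned with human carcinogenesis, they generally consider the evidence for carcinogenicity in experimental animals as well. The human and animal evidence are graded separately, and the overall classification reflects both. In particular, a Group 2A designation (“probably carcinogenic to humans”) is typically used when there is limited evidence of carcinogenicity in humans and sufficient evidence in experimental animals, while Group 2B (“possibly carcinogenic to humans”) is used when there is limited evidence in humans and less than sufficient evidence in animals, or when the human evidence is inadequate but the animal evidence is sufficient.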
IARC weight-of-the-evidence evaluations are implicitly geared towards a two-node probability tree: either the chemical is carcinogenic, or it is not. It is also clear that, even though it is not numerically defined, the probability is graded; the probability that a chemical is carcinogenic can be very low or very high.
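The two-node tree and its graded probability can be made concrete with a small Python sketch. It is an illustration under loud assumptions: IARC assigns no numbers to its categories, so the probabilities in `HYPOTHETICAL_WEIGHTS` are placeholders chosen only to show the idea, and placing “inadequate” above “evidence suggesting lack” in the ordering is likewise an assumption of this sketch. The names `Evidence`, `two_node_tree` and `is_graded` are hypothetical.

```python
from enum import Enum


class Evidence(Enum):
    """The four IARC evidence categories quoted above, ordered (by assumption)
    from least to most supportive of a carcinogenic interpretation."""
    SUGGESTING_LACK = 0
    INADEQUATE = 1
    LIMITED = 2
    SUFFICIENT = 3


# HYPOTHETICAL node weights. IARC attaches no numbers to its categories;
# these placeholders only illustrate a graded mapping and would have to be
# replaced by whatever values a given analysis can justify.
HYPOTHETICAL_WEIGHTS = {
    Evidence.SUGGESTING_LACK: 0.05,
    Evidence.INADEQUATE: 0.30,
    Evidence.LIMITED: 0.60,
    Evidence.SUFFICIENT: 0.95,
}


def two_node_tree(category: Evidence, weights: dict = HYPOTHETICAL_WEIGHTS) -> dict:
    """Return branch probabilities for the carcinogenic / not-carcinogenic tree."""
    p = weights[category]
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability for {category.name} must lie in [0, 1]")
    # The two branches are exhaustive, so their probabilities sum to one.
    return {"carcinogenic": p, "not carcinogenic": 1.0 - p}


def is_graded(weights: dict) -> bool:
    """True if stronger evidence never maps to a lower probability."""
    probs = [weights[e] for e in sorted(weights, key=lambda e: e.value)]
    return all(lower <= upper for lower, upper in zip(probs, probs[1:]))


if __name__ == "__main__":
    assert is_graded(HYPOTHETICAL_WEIGHTS)
    for category in Evidence:
        print(category.name, two_node_tree(category))
```

The only structural commitments here are the ones stated in the text: two exhaustive branches, and a probability that rises with the strength of the evidence category. Everything numeric is left to the analyst.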
References
Hill, Sir Austin Bradford (1965). The Environment and Disease: Association or
Causation? Proc Royal Soc Med 58:295-300.
Post Notes
This is thesis #3. Mostly standard fare. The two twists are a) grouping the Hill criteria into empirical and theoretical categories, and b) suggesting the use of weight of the evidence evaluations for assigning weight to probability tree nodes.