Tuesday, March 31, 2015

Measuring Associations

Two Purposes

As a result of the causality problem, there are two quantitative issues at stake in just about any epidemiology study. For a study concerned with the effects of environmental chemicals, the first issue is the toxicological one: what is the magnitude of the effect of the chemical on a health outcome? As in toxicology, effects are often characterized either by the frequency of occurrence of a disease in a population or by the magnitude of an effect in an individual. The second issue arises because causality has to be demonstrated, which provides the impetus for measures of the strength of association. The basic idea is to compare the magnitude of an apparent effect with normal variation: the larger the effect is relative to that variation, the less likely it is that the apparent effect is attributable to other causes. Yet modern epidemiology, especially environmental epidemiology, often meets one purpose or the other, but not both. The heavy emphasis on statistical significance testing is at least partly to blame.

Disease Incidence Studies

Measures of association for disease incidence are well established. There are two: relative risk and the odds ratio. Relative risks are generally used for prospective and ecological studies, where disease rates in populations grouped by exposure are compared, usually using the group with the lowest exposure as the referent group. Odds ratios are used for case-control studies, where the exposures of individuals who have a disease are compared with those of individuals who do not. The problem is that many authors and readers do not understand what these measures are really for. It is often thought that a relative measure is an effect measure, and therefore that statistical significance, shown by non-overlapping confidence intervals, can be taken as proof of causality. No, it can’t. At best, it means there is an association in need of explanation. Taking relative risk as a measure of effect can also spill over into the realm of theoretical interpretation, where it may be presumed that the effect of a chemical on a disease rate is necessarily proportional to the background rate. That is especially likely to happen if you let a statistician pick the model.
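For readers who want to see the arithmetic, here is a minimal Python sketch of how the two measures are computed from a 2x2 table of counts. The counts are hypothetical, and the log-scale confidence interval is one common large-sample approximation, assumed here for illustration rather than taken from any particular study.

```python
import math


def relative_risk(a, b, c, d):
    """Relative risk (risk ratio) from a 2x2 table of counts.

    a: exposed with disease      b: exposed without disease
    c: referent with disease     d: referent without disease
    """
    risk_exposed = a / (a + b)
    risk_referent = c / (c + d)
    return risk_exposed / risk_referent


def odds_ratio(a, b, c, d):
    """Odds ratio from the same 2x2 table: (a/b) / (c/d) = ad / bc."""
    return (a * d) / (b * c)


def rr_confidence_interval(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval for the relative risk,
    computed on the log scale (a common large-sample approximation)."""
    log_rr = math.log(relative_risk(a, b, c, d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return math.exp(log_rr - z * se), math.exp(log_rr + z * se)


# Hypothetical counts, invented for illustration: 1000 people per group.
a, b = 30, 970   # exposed group: 30 cases
c, d = 10, 990   # referent group: 10 cases
print(round(relative_risk(a, b, c, d), 2))  # 3.0
print(round(odds_ratio(a, b, c, d), 2))     # 3.06
print(rr_confidence_interval(a, b, c, d))
```

With these made-up counts the interval excludes 1, which is exactly the kind of result that signals an association in need of explanation, not proof of causation. Because the disease is rare in this example, the odds ratio comes out close to the relative risk.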

If relative risk isn’t a measure of effect, what is? For once, the answer to that question is simple: absolute risk. When relative risk is used as the measure of association, the apparent absolute effect can be obtained by “undividing”: multiplying each group’s relative risk by the referent group incidence rate to recover that group’s rate, then taking the difference from the referent rate. If the referent group rate is high, the apparent absolute effect may be high even if the relative risk is low. Conversely, if the referent group rate is low, the apparent absolute effect may be low even if the relative risk is high. Because the background rate may be unknown, getting a measure of absolute risk from an odds ratio isn’t always possible.
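The “undividing” step can be sketched in a few lines of Python. The function name and the rates below are hypothetical, chosen only to show that a weak relative risk on a common disease can imply a larger absolute effect than a strong relative risk on a rare one.

```python
def absolute_effect(relative_risk, referent_rate):
    """Excess rate implied by a relative risk and a referent (background) rate.

    Multiplying the ratio back by the referent rate ("undividing") recovers the
    group's own rate; subtracting the referent rate gives the excess.
    """
    group_rate = relative_risk * referent_rate
    return group_rate - referent_rate


# Hypothetical rates per 100,000 per year, invented for illustration.
weak_ratio_common_disease = absolute_effect(relative_risk=1.2, referent_rate=500.0)
strong_ratio_rare_disease = absolute_effect(relative_risk=5.0, referent_rate=2.0)
print(round(weak_ratio_common_disease, 1))  # 100.0 extra cases per 100,000
print(round(strong_ratio_rare_disease, 1))  # 8.0 extra cases per 100,000
```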

Studies With Continuous Individual Measures

There is a measure of association for continuous measures, but it is never used for that purpose. It is called a z-score: the difference between an individual measure and the population mean (or the size of an apparent effect) divided by the standard deviation of the population. Z-scores are commonly used in standardized testing of behavioral and scholastic performance. For example, even though IQ was originally defined differently, its standard deviation is now defined to be 15 IQ points, so an increment of 1 IQ point corresponds to a z-score of 1/15 = 0.067. The Scholastic Aptitude Test (SAT) has a standard deviation of 100, so a 1-point score difference corresponds to a z-score of 0.01.
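The conversion is a single division; this small Python sketch reproduces the IQ and SAT figures above. The function name is illustrative only.

```python
def z_score(difference, standard_deviation):
    """Express a difference in units of the population standard deviation."""
    return difference / standard_deviation


IQ_SD = 15.0    # IQ scales are normed to a standard deviation of 15
SAT_SD = 100.0  # standard deviation quoted in the text for the SAT

print(round(z_score(1, IQ_SD), 3))   # 0.067
print(round(z_score(1, SAT_SD), 3))  # 0.01
```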

As an example, after an accidental poisoning epidemic in Iraq (WHO, 1990), delays in the onset of developmental milestones (walking and talking) were associated with the exposure of mothers to methylmercury (i.e., maternal hair levels). At low levels of exposure, most children were able to walk and say three words by 18 months of age. At the highest doses, the average age at which these milestones were reached exceeded 24 months. Using a typical standard deviation for milestone development of about two months (WHO, 2006), z-scores in excess of 3 [(24-18)/2] were observed in Iraq. Z-scores of that magnitude are very unlikely to be attributable to variable contributions from other causal influences. Most neurobehavioral epidemiology studies report apparent effects on IQ of five points or less (i.e., z-score increments of 0.33 or less), so explanation by other causal influences is far more likely.
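To make the contrast concrete, the Python sketch below computes both z-scores from the figures in the text (milestones at 18 versus 24 months with a standard deviation of two months, as implied by the bracketed calculation, and a 5-point IQ effect on a scale with a standard deviation of 15). It also assumes, for illustration only, that the outcome is normally distributed, so that the upper-tail probability gives a rough sense of how often normal variation alone would produce a deviation that large.

```python
import math


def z_score(difference, standard_deviation):
    """Express a difference in units of the population standard deviation."""
    return difference / standard_deviation


def upper_tail(z):
    """P(Z >= z) for a standard normal variable (normality is an assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Iraq milestones: 18 months at low exposure, 24 months at the highest doses,
# standard deviation of about 2 months.
z_milestones = z_score(24 - 18, 2.0)
# A typical reported IQ effect: 5 points on a scale with standard deviation 15.
z_iq = z_score(5, 15.0)

print(z_milestones, round(upper_tail(z_milestones), 4))  # 3.0 0.0013
print(round(z_iq, 2), round(upper_tail(z_iq), 2))        # 0.33 0.37
```

Under that normality assumption, a shift of three standard deviations would arise from ordinary variation roughly once in a thousand cases, while a shift of a third of a standard deviation is exceeded by more than a third of the population.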

Neurobehavioral epidemiology studies are odd because they actually use standardized test scores as measures of effect all the time. Since virtually any measure can be standardized, people can and do argue over whether any given score is really measuring anything important (Gould, 1996). Other fields use measures that are already established as indicators of adverse health outcomes, so ascertaining the effect measure is not an issue. However, if you want to convert the estimated effect to a z-score to figure out the strength of association, you are going to have to figure out what the standard deviation is. That is not always possible.

So, most studies concerned with continuous measures don’t use z-scores to measure strength of association. They use statistical significance testing instead. Since significance tests also have a signal-to-noise aspect to them, that isn’t completely horrible. What is horrible is that there is little rhyme or reason to how significance tests are applied. Some studies do regression analyses, while others compare differences between tertiles, quartiles, or quintiles without any trend analysis at all, or do whatever else seems convenient. When statistical significance is the only test, the bar for demonstrating a causal association is imperceptibly low, because with a large enough sample even a difference that is very small relative to normal variation can reach significance.
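A minimal Python sketch of why significance alone is a low bar: it applies a two-sample z test (assuming two equal-sized groups with a known common standard deviation, and the conventional two-sided 5% threshold of 1.96) to a hypothetical 1-point IQ difference, which is a z-score of only about 0.067. The sample sizes are invented for illustration.

```python
import math


def two_sample_z(difference, sd, n_per_group):
    """z statistic for a difference in means between two equal-sized groups
    with a common, known standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return difference / standard_error


# Hypothetical: a 1-point IQ difference on a scale with standard deviation 15.
for n in (100, 1000, 5000):
    stat = two_sample_z(1.0, 15.0, n)
    verdict = "significant" if stat > 1.96 else "not significant"
    print(n, round(stat, 2), verdict)
```

The effect size never changes, but with these numbers the test statistic crosses 1.96 only for the largest sample (about 3.33 with 5000 per group), so "significant" here says more about sample size than about strength of association.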

Adjusted Associations

Another common problem with both quantal and continuous endpoints is that strength of association is often judged using adjusted values. If you are trying to estimate the magnitude of a causal effect, then adjusting for other influences is a good idea. If you are trying to measure the strength of an association, it isn’t: if showing a significant relative risk requires adjusting for other factors, then the obvious conclusion is that the association CAN be explained by variation in other causal influences.


References

Gould, SJ (1996). The Mismeasure of Man, 2nd Ed. WW Norton.

World Health Organization (1990). International Programme on Chemical Safety, Environmental Health Criteria 101: Methylmercury. World Health Organization, Geneva. At http://www.inchem.org/documents/ehc/ehc/ehc101.htm.

World Health Organization (2006). WHO Motor Development Study: Windows of achievement for six gross motor development milestones. WHO Multicentre Growth Reference Study Group. Acta Paediatrica, Suppl 450, 86-95.

Official Post Soundtrack

10cc (1977). Good Morning Judge. In: Deceptive Bends, Track 1.

Post Note

Thesis Post #24.  Another one for the epidemiology thread.

