Tuesday, March 31, 2015

Measuring Associations

Two Purposes

As a result of the causality problem, there are two quantitative issues at stake in just about any epidemiology study.  For a study concerned with the effects of environmental chemicals, the first issue is the toxicological one: What is the magnitude of the effect of the chemical on a health outcome?  As with toxicology, effects are often characterized by either the frequency of occurrence of a disease in a population, or the magnitude of an effect in an individual.  In addition, the need to demonstrate causality provides the impetus for measures of the strength of association, where the basic idea is to compare the magnitude of an apparent effect relative to normal variation.  The higher the relative effect is, the less likely it is that the apparent effect is attributable to other causes.  Yet modern epidemiology, especially environmental epidemiology, often meets one purpose or the other, but not both.  The heavy emphasis on statistical significance testing is at least partly to blame.

Disease Incidence Studies

Measures of association for disease incidence are well established.  There are two: relative risk and the odds ratio.  Relative risks are generally used for prospective and ecological studies where disease rates in populations grouped by exposure are compared, usually using the group with the lowest exposure as a referent group.  Odds ratios are used for case-control studies where exposures of individuals who have a disease are compared to those of individuals who do not.  The only problem is that many authors and/or readers do not understand what they are really for.  It is often thought that a relative measure is an effect measure, and therefore that a statistically significant result (e.g. non-overlapping confidence intervals) can be taken as proof of causality.  No, it can’t.  At best, it means there is an association in need of explanation.  Taking relative risk as a measure of effect can also spill over into the realm of theoretical interpretation, where it may be presumed that the effect of a chemical on a disease rate is necessarily proportional to the background rate.  That is especially likely to happen if you let a statistician pick the model.

If relative risk isn’t a measure of effect, what is?  For once, the answer to that question is simple: absolute risk.  When relative risk is used as the measure of association, the apparent absolute effect may be obtained by multiplying the relative risk for each group by the referent group incidence rate.  If the referent group rate is high, the apparent absolute effect may be high even if the relative risk is low.  Conversely, if the referent group rate is low, the apparent absolute effect may be low even if the relative risk is high.  Since the background rate may be unknown, getting a measure of the absolute risk from an odds ratio isn’t always possible.
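The arithmetic can be sketched in a few lines of Python.  This is only an illustration: the function names and the two sets of numbers are hypothetical, chosen to show a low relative risk on a high background rate next to a high relative risk on a low background rate.

```python
def absolute_risk(relative_risk, referent_rate):
    """Incidence rate in an exposed group: relative risk times the referent rate."""
    return relative_risk * referent_rate


def excess_risk(relative_risk, referent_rate):
    """Apparent absolute effect: exposed group rate minus the referent group rate."""
    return absolute_risk(relative_risk, referent_rate) - referent_rate


# Hypothetical values, for illustration only.
common_disease = excess_risk(relative_risk=1.2, referent_rate=0.20)
rare_disease = excess_risk(relative_risk=5.0, referent_rate=0.0001)

print(f"RR 1.2 on a 20% background:   excess risk = {common_disease:.4f}")
print(f"RR 5.0 on a 0.01% background: excess risk = {rare_disease:.4f}")
```

With these made-up inputs the modest relative risk produces the larger absolute effect, which is the point of the paragraph above.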

Studies With Continuous Individual Measures

There is a measure of association for continuous measures, but it is almost never used for that purpose.  It is called a z-score, where the magnitude of an individual measure is divided by the standard deviation of the population.  Where z-scores are commonly used is in standardized testing for behavioral and scholastic performance.  For example, even though it was originally defined differently, the standard deviation of IQ is defined to be 15 IQ points, so an increment of 1 IQ point corresponds to a z-score of 1/15 = 0.067.  The Scholastic Aptitude Test (SAT) has a standard deviation of 100, so a 1 point score difference corresponds to a z-score of 0.01.

As an example, after an accidental poisoning epidemic in Iraq (WHO, 1990), delays in the onset of developmental milestones (walking and talking) were associated with the exposure of mothers to methylmercury (i.e. maternal hair level).  At low levels of exposure, most children in Iraq were able to walk and say three words by the time they were 18 months of age.  At the highest doses, the average age of milestone development was in excess of 24 months.  Using a typical standard deviation for milestone development of about two months (WHO, 2006), z-scores in excess of 3 [(24-18)/2] were observed in Iraq.  Z-scores of that magnitude are very unlikely to be attributable to variable contributions from other causal influences.  Most neurobehavioral epidemiology studies report apparent effects on IQ that are five points or less (i.e. z-score increments of 0.33 or less), so that potential explanation by other causal influences is far more likely.
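The z-score arithmetic used in this section is simple enough to sketch in Python.  The function name is hypothetical, and the milestone figures are taken as months (18 and 24 months, with a standard deviation of about 2 months), which is what the calculation in brackets implies.

```python
def z_score(difference, standard_deviation):
    """Express a difference in units of the population standard deviation."""
    return difference / standard_deviation


iq_one_point = z_score(1, 15)          # one IQ point
sat_one_point = z_score(1, 100)        # one SAT point
iraq_milestones = z_score(24 - 18, 2)  # six month delay, SD of two months
iq_five_points = z_score(5, 15)        # typical reported IQ effect

print(f"1 IQ point:      z = {iq_one_point:.3f}")
print(f"1 SAT point:     z = {sat_one_point:.2f}")
print(f"Iraq milestones: z = {iraq_milestones:.1f}")
print(f"5 IQ points:     z = {iq_five_points:.2f}")
```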

Neurobehavioral epidemiology studies are odd because they actually use standardized test scores as measures of effect all the time.  Since virtually any measure can be standardized, people can and do argue over whether or not any given score is really measuring anything important (Gould, 1996).  Since other fields use measures that are already established as indicators of an adverse health outcome, ascertaining the effect measure is not an issue.  However, if you want to convert the estimated effect to a z-score to figure out what the strength of association is, you are going to have to figure out what the standard deviation is.  That is not always possible.

So, most studies concerned with continuous measures don’t use z-scores to measure strength of association.  They use statistical significance testing instead.  Since significance tests also have a signal-to-noise aspect to them, that isn’t completely horrible.  What is horrible is the fact that there is little rhyme or reason as to how significance tests are applied.  Some studies do regression analyses, while others compare differences between tertiles, quartiles, or quintiles without any trend analysis at all, or whatever, it seems.  When using statistical significance as the only test, the bar for demonstrating a causal association is imperceptibly low.

Adjusted Associations

Another common problem with both quantal and continuous endpoints is that strength of association is often judged using adjusted values.  If you are trying to estimate the magnitude of a causal effect, then adjusting for other influences is a good idea.  If you are trying to measure the strength of an association it isn’t; if showing a significant relative risk requires adjusting for other factors, then the obvious conclusion is that the association CAN be explained by variation in other causal influences.


References

Gould, SJ (1996).  The Mismeasure of Man, 2nd Ed. WW Norton. 

World Health Organization. (1990).  International Programme on Chemical Safety, Environmental Health Criteria 101: Methylmercury.  World Health Organization, Geneva.  At http://www.inchem.org/documents/ehc/ehc/ehc101.htm.  

World Health Organization. (2006).  WHO Motor Development Study:  Windows of achievement for six gross motor development milestones.  WHO Multicentre Growth Reference Study Group.  Acta Paediatrica, Suppl 450, 86-95.

Official Post Soundtrack

10 cc (1977).  Good Morning Judge.  In: Deceptive Bends, Track 1.

Post Note

Thesis Post #24.  Another one for the epidemiology thread.


Sunday, March 29, 2015

Protection

The Science-Policy Shell Game

The game is played like this:
  • Before a technical audience, a proposition is presented as a matter of public policy.
  • Before the public, the same proposition is presented as a scientific fact.
Every shell game needs a ball.  In this game, the ball is uncertainty.   

Since it relies on having distinct technical and political audiences, the form of sophistry underlying the science-policy shell game could not have flourished to the extent that it has prior to the last century.  However, tacking on a moral imperative to a mathematical or factual argument is not new; that was a staple of the 16th century doctrine of probabilism, and Hume's is-to-ought problem is an 18th century example.  But frontloading a scientific argument with a moral premise is a more modern trick.  As a prime example, consider the EPA Reference Dose.


Ought To Is: The Acceptable Daily Intake Becomes the Reference Dose

The safety assessment procedure developed in the 1950s for the regulation of food additives and pesticides by both the US Food and Drug Administration and the World Health Organization began innocently enough.  The process was quite simple; the Acceptable Daily Intake (ADI) was calculated by applying a safety factor of 100 to an exposure (or dose) that resulted in no measurable or observable effect in an experiment with laboratory animals.
In its initial formulation (Lehman and Fitzhugh, 1954), the ADI was clearly articulated as an instrument of regulatory policy, where a moral argument was buttressed with some scientific facts:
The “100-fold margin of safety” is a good target but not an absolute yardstick as a measure of safety.  There are no scientific or mathematical means by which we can arrive at an absolute value.  However, this factor of 100 appears to be high enough to reduce the hazard of food additives to a minimum and at the same time allow the use of some chemicals which are necessary in food production or processing.
But, over the next 20 years, the concept of “no observed effect” came to be thought of as an “observed no effect” or a threshold dose at which no effect occurs.  Whether it was intentional or not, this gave birth to the shell game.  It was probably semi-intentional.  Even though it is surely a fairy tale, it would be nice to think that science actually could identify a level that has absolutely no effect.  The concept of a threshold also gave the ADI the semblance of a scientific fact, which happily absolved the regulatory toxicologist from the appearance of passing a moral judgment of what will be considered acceptable by the government (Wagner, 1995). 

When the pesticide program left the FDA in 1971 to join the EPA, the ADI went with it.  There, it came to be used for programs other than pesticide regulation.    Even though the interplay between scientific jargon and political rhetoric began before the EPA was ever formed, at the U.S. EPA in the 1980’s the shell game was deliberately institutionalized. Under pressure from agency management to adopt the guidelines set forth for carcinogen risk assessment (NRC, 1983), a committee of agency toxicologists at the EPA responded with scientific jargon (Barnes and Dourson, 1988): 
In practice, the ADI is viewed by many (including risk managers) as an "acceptable" level of exposure, and, by inference, any exposure greater than the ADI is seen as "unacceptable." This strict demarcation between what is "acceptable" and what is "unacceptable" is contrary to the views of most toxicologists, who typically interpret the ADI as a relatively crude estimate of a level of chronic exposure which is not likely to result in adverse effects to humans.
The “many (including risk managers)” refers to the traditional (i.e. Lehman and Fitzhugh) view of the USFDA.  On the other hand, the “most toxicologists” refers to the members of the EPA committee, who then proceeded to reinforce their claim of scientific factuality by replacing the terminology associated with the ADI:

  • The ADI became the Reference Dose (RfD)
  • Safety assessment became risk assessment
  • Safety factors became uncertainty factors
The primary reason for this change in terminology is that, contrary to the intent of the 1983 NRC "Redbook" report, the agency toxicologists wanted to keep the decision making process out of the hands of agency decision makers.  I know this because they told me.  I’m sure they thought that I would agree that it was a good thing to do.  However, since I figured public disinformation would lead to a less well informed government, I didn’t think lab cloaking regulatory policy was a very good idea.  That has certainly proven to be true.  However, I didn’t see the other consequence: Political rhetoric disguised as science will choke out actual scientific dialog.  Now, I understand that really well.

The Protection Racket

Institutionalization of the science-policy shell game is not unlike founding a religion.  Soon after I joined the FDA a little over 25 years ago, I had a conversation with the Director of the Office of Toxicology about the ADI.  I expressed my concern over the ability of toxicology to identify a threshold where no effect occurs.  Not realizing that I was attempting to turn the clock back 35 years, I also suggested that perhaps the ADI should be considered as a “practical threshold” where even if there is a risk, the agency will consider it to be negligible.  The response, delivered as a statement of faith: “I believe in practical thresholds”.

A generation after the founding of RfDeontology, many followers truly believe that the RfD separates heaven from hell.  An industry consultant or right-leaning agency toxicologist like my former office director can use the ADI or the RfD to argue that everything is just fine; we are safe.  But those on the left have discovered that the RfD can also be used to argue that a particular exposure is unsafe without ever identifying what the risk actually is.  In fact, the RfD has come to be synonymous with “protection”.  If your exposure is below the RfD, then you are Environmentally Protected.  If it isn’t, then you aren’t.  Do what we say, eat what we tell you to eat, and you are protected.  Otherwise, you are on your own.

The RfD is so simple anyone can misunderstand it.  Most of the new generation of Protectors and Protectees are not toxicologists or even scientists.  Not only do they believe, as EPA claims, that the threshold concept is necessary for the application of RfD methodology, they may also believe that the RfD itself is the threshold, even though the written gospel says otherwise (Barnes and Dourson, 1988):
 In general, the RfD is an estimate (with uncertainty spanning perhaps an order of magnitude) of a daily exposure to the human population (including sensitive subgroups) that is likely to be without an appreciable risk of deleterious effects during a lifetime.

The RfD is useful as a reference point from which to gauge the potential effects of the chemical at other doses. Usually, doses less than the RfD are not likely to be associated with adverse health risks, and are therefore less likely to be of regulatory concern. As the frequency and/or magnitude of the exposures exceeding the RfD increase, the probability of adverse effects in a human population increases. However, it should not be categorically concluded that all doses below the RfD are "acceptable" (or will be risk-free) and that all doses in excess of the RfD are "unacceptable" (or will result in adverse effects).
Really, the confusion is quite understandable.  When an agency uses the RfD as the basis for a regulation, and they do, “below the RfD” does indeed mean acceptable to the agency, while “above the RfD” means unacceptable to the agency.  Ought To Is To Ought: The shell game can drive a country crazy.

References

Barnes DG and Dourson ML (1988).  Reference Dose (RfD): Description and Use in Health Risk Assessments.  Regul Toxicol Pharmacol 8:471-486.  Also at http://www.epa.gov/IRIS/rfd.htm

Lehman AJ and Fitzhugh OG (1954).  100-Fold Margin of Safety.  Quarterly Bulletin of the Association of Food and Drug Officials 18:33-35.

National Research Council (1983).  Risk Assessment in the Federal Government. National Academy Press, Washington, DC. 

Wagner WE (1995). The science charade in toxic risk regulation, Columbia Law Review 95:1613-1720.

Official Post Soundtrack

Williams, Lucinda (2014).  Protection.  In: Down Where the Spirit Meets the Bone.  Disc 1, Track 2.

Post Note

Thesis Post #23.  This is the first post in the Shell Game thread, which is where the promised "political commentary" will be found.  

The soundtrack is especially noteworthy.  I have written many versions of the shell game story over the last 20 years, but this one I wrote with "Protection" playing in the background.  I thought about putting the song in the reference list, but decided on making it the soundtrack instead, which put the idea in my head that all the posts should have soundtracks.

Saturday, March 28, 2015

Safety and Adjustment Factors

Double Duty Numbers

Ever since the concept was formally introduced to food safety in 1954, safety factors (or Uncertainty Factors as the EPA calls them) have played a dual role.  One has a scientific basis, the other does not.  The scientific role is to correct for known inaccuracies.  For example, to answer the question “Why a Factor of Safety?”, Lehman and Fitzhugh (1954) started with: “Animals for the most part are more resistant to toxic chemicals than man”.  The policy role is to add an element of precaution.  This notion is also given by Lehman and Fitzhugh in a concluding argument: “The application of simple statistical rules indicates that the probability of human injury decreases with each increase in the margin of safety”.  These two roles are separable, but in the Safety Assessment paradigm they usually are not.

However, the rationale for the original 100 fold safety factor was soon segregated into two factors of 10.  One safety factor of 10 is to be applied when an ADI is based on studies with laboratory animals.  If an ADI or TDI is based on human data, as is often the case for contaminants, then the animal factor can be dispensed with.  The other factor of 10 is intended to account for human variability.  Since naturally occurring chemicals are judged under a standard where susceptible subpopulations don’t count, the difference between “ordinarily injurious” and “may be injurious” is often interpreted as a factor of 10.

The EPA often uses additional Uncertainty Factors as well (Barnes and Dourson, 1988).  The two most common are an additional factor for generating a standard for chronic (long-term) exposure (i.e. an RfD) based on short-term data, and an additional factor when there are database deficiencies.  The latter factor is justified as a means of encouraging companies to provide better data; this justification clearly does not apply for chemical contaminants that have no sponsor.

Safety Factors are obviously somewhat arbitrary.  However, Dourson and Stara (1983) have argued that they are not.  Their argument was based on the distribution of empirical ratios for each factor (e.g. animal to human; long-term to short-term) and noting that a factor of 10 generally was close to a value that would be exceeded less than 5% of the time.  But really, they just replaced one arbitrary number with another; the 5th percentile is the 50th percentile divided by 10.

A strategy that has been suggested (e.g. Barnes and Dourson, 1988; WHO, 2009) for adapting the Safety Assessment paradigm to non-premarket approval applications has been to dispense with the safety factors.  Known as the “Margin of Exposure” approach, the idea is to compare a no-observed-adverse-effect level (NOAEL) or benchmark dose (BMD) to an estimate of exposure, and to employ the resulting ratio as a measure of the hazard.  Although dispensing with the precaution may be a good idea, failing to adjust for known differences (e.g. between animals and humans) is not.
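As a rough illustration, the Margin of Exposure is a single ratio, and adjusting the point of departure for a known difference before taking the ratio is one more division.  The Python sketch below uses hypothetical function names and made-up doses; the adjustment factor of 10 is just the traditional animal-to-human factor used as a placeholder.

```python
def margin_of_exposure(point_of_departure, exposure):
    """Ratio of a NOAEL or BMD to an exposure estimate (same dose units)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return point_of_departure / exposure


def adjusted_margin_of_exposure(point_of_departure, exposure, adjustment_factor):
    """Margin of Exposure after dividing the point of departure by an
    adjustment factor for a known difference (e.g. animal to human)."""
    return margin_of_exposure(point_of_departure / adjustment_factor, exposure)


# Hypothetical values in mg/kg-day, for illustration only.
noael = 10.0
exposure_estimate = 0.01

moe = margin_of_exposure(noael, exposure_estimate)
adjusted_moe = adjusted_margin_of_exposure(noael, exposure_estimate, 10.0)

print(f"Unadjusted margin of exposure: {moe:.0f}")
print(f"Adjusted margin of exposure:   {adjusted_moe:.0f}")
```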

Adjustment Factors and the Human Equivalent Dose

Compared to a quantitative risk assessment, the main attraction of the safety assessment paradigm is that it is both simple and transparent.  However, that doesn’t mean that safety factors can’t be subdivided into adjustment factors and precautionary factors.  An EPA methodology for replacing the animal-to-human uncertainty factor does exactly that.  Instead of applying a standard safety factor, a Human Equivalent Dose is estimated using a ¾ power body weight scaling factor that results in a species-specific adjustment to the traditional presumption that the dose is directly proportional to body weight (EPA, 2011).  As a result, a larger factor is used to scale doses from smaller animals than from larger animals.  For example, a typical scaling factor is 7.2 for a mouse study, 4.1 for a rat study, and 1.6 for a dog study.  As a precautionary measure, an additional factor of 3 is recommended as well.  This means that instead of a traditional safety factor of 10 for all species, the overall factor is about 22 for mice, 12 for rats, and 5 for dogs.
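The scaling factors quoted above can be approximated with a short Python sketch.  If total dose scales with body weight to the ¾ power, then a dose expressed per kg of body weight scales with body weight to the -¼ power, so the divisor is (human weight / animal weight) raised to ¼.  The body weights below are assumptions made for illustration (70 kg human, 0.025 kg mouse, 0.25 kg rat, 10 kg dog), not values taken from the EPA document, so the results only come close to the figures in the text.

```python
HUMAN_BW_KG = 70.0  # assumed reference human body weight

# Assumed typical animal body weights in kg (illustrative only).
ANIMAL_BW_KG = {"mouse": 0.025, "rat": 0.25, "dog": 10.0}

PRECAUTIONARY_FACTOR = 3.0  # the additional factor mentioned in the text


def scaling_factor(animal_bw_kg, human_bw_kg=HUMAN_BW_KG):
    """Divisor converting an animal dose (mg/kg-day) to a human equivalent
    dose under 3/4 power body weight scaling."""
    return (human_bw_kg / animal_bw_kg) ** 0.25


def human_equivalent_dose(animal_dose, animal_bw_kg, human_bw_kg=HUMAN_BW_KG):
    """Animal dose (mg/kg-day) divided by the species-specific scaling factor."""
    return animal_dose / scaling_factor(animal_bw_kg, human_bw_kg)


for species, body_weight in ANIMAL_BW_KG.items():
    factor = scaling_factor(body_weight)
    overall = PRECAUTIONARY_FACTOR * factor
    print(f"{species}: scaling factor {factor:.1f}, overall factor {overall:.0f}")
```

Different assumed body weights shift the factors slightly, which is why the mouse value here does not exactly reproduce the 7.2 quoted above.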

Turning Factors Into Distributions

Since replacing the NOAEL with a BMD is now widely accepted, a considerable amount of effort has been expended to make the safety assessment process more like risk assessment by replacing safety factors with distributions.  This results in a “harmonized paradigm” that lacks the simplicity of the safety assessment paradigm, but still presumes that the eventual goal of the analysis is to identify a safe or acceptable level of exposure.  WHO (2014) gives a recent summary of this work.  In particular, it covers uncertainty distributions for the factors used to adjust for exposure duration, human equivalent doses, route of exposure, and human variability (a two dimensional distribution that includes both a population frequency dimension and an uncertainty dimension).  Most of these distributions could also be used for a true risk assessment, where the goal is to provide risk estimates, instead of just setting a level.
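To make the idea concrete, here is a minimal Monte Carlo sketch in Python in which two adjustment factors are drawn from lognormal distributions instead of being fixed numbers.  Everything specific in it is an assumption made for illustration: the lognormal form, the medians and geometric standard deviations, the point of departure, and the function names.  None of it comes from the WHO document, and the two dimensional treatment of human variability is left out.

```python
import math
import random


def sample_lognormal(rng, median, geometric_sd):
    """Draw one value from a lognormal distribution given its median and GSD."""
    return rng.lognormvariate(math.log(median), math.log(geometric_sd))


def simulate_adjusted_doses(point_of_departure, n=10000, seed=1):
    """Divide a point of departure by sampled factors; return sorted doses."""
    rng = random.Random(seed)
    doses = []
    for _ in range(n):
        interspecies = sample_lognormal(rng, median=4.0, geometric_sd=2.0)  # hypothetical
        duration = sample_lognormal(rng, median=2.0, geometric_sd=2.0)      # hypothetical
        doses.append(point_of_departure / (interspecies * duration))
    return sorted(doses)


doses = simulate_adjusted_doses(point_of_departure=10.0)
median_dose = doses[len(doses) // 2]
fifth_percentile = doses[int(0.05 * len(doses))]

print(f"Median adjusted dose:   {median_dose:.2f}")
print(f"5th percentile of dose: {fifth_percentile:.2f}")
```

A safety assessment would pick off a low percentile and call it the acceptable level; a risk assessment would carry the whole distribution forward.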

References

Barnes DG and Dourson ML (1988).  Reference Dose (RfD): Description and Use in Health Risk Assessments.  Regul Toxicol Pharmacol 8:471-486.  Also at http://www.epa.gov/IRIS/rfd.htm

Dourson ML and Stara JF (1983).  Regulatory history and experimental support of uncertainty (safety) factors.  Regul Toxicol Pharmacol 3:224-238.

Lehman AJ and Fitzhugh OG (1954).  100-Fold Margin of Safety.  Quarterly Bulletin of the Association of Food and Drug Officials 18:33-35.

World Health Organization (2009).  Principles and methods for the risk assessment of chemicals in food.  Environmental Health Criteria 240.

World Health Organization (2014).   International Programme on Chemical Safety, Harmonization Project Document 11.  Guidance Document on Evaluating And Expressing Uncertainty in Hazard Characterization.  In particular, see Chapter 4.

Friday, March 27, 2015

Where Is the Data?

The Recorded Observation

In a very basic sense, the practice of science involves arguing what is likely to happen in the future given what has happened in the past.  Based on their own personal past experience, everyone does this.  What makes a scientific discipline special is that there is a shared history of observation that allows and demands consideration of non-personal experience.  Therefore, most scientific investigations begin with the generation of a record of observation, which is commonly referred to as “data”.  Since the entire discipline in some way depends on having accurate records, the credibility of a scientist heavily depends on correctly recording an observed event.  When a question of causality is involved, correctly describing the events that preceded the observation is very important as well.  In laboratory experiments and clinical trials, preceding events are deliberately manipulated in order to observe what follows.  In epidemiology, the order of events is simply observed without controlling it.

Sharing Data

Obviously, there is more to science than simply generating data.  Drawing conclusions from the data that allow predictions to be made is what gives science its power.  But, before launching into an analysis, most scientific papers start by showing the data that is being analyzed in either tabular or graphical form.  It is also quite common to compare the results of the analysis to data by showing both observed and predicted values in the same table or graph.  There are two reasons for doing this.  First, it allows the reader to make their own judgment about the quality of the analysis being presented.  Second, it may permit the data to be used by other authors as the basis for further analysis, possibly by combining observations from multiple experiments or studies.

Once a paper has been published, most laboratory scientists will share raw data if the descriptions provided in the paper do not provide sufficient detail.  Epidemiologists often will not do this.  Reasons frequently given for this are a) the data are in some way confidential, or b) the data are the property of the investigator.  The other obvious explanation is that very different conclusions could be drawn from the data, and therefore the data are withheld in order to protect a questionable analysis.  Not everyone is happy with this.  In fact, a federal law was enacted that requires investigators to share data when studies are funded by the federal government.  But still, you may have to go to court to get it.  That’s no fun.

Adjusted Data

However, many epidemiology papers do at least show summary results of raw data, which is good.  It is also quite common to show “adjusted data”, which are estimated values that are intended to represent what would be observed without the presence of other causal influences.  While this isn’t necessarily a bad idea, it is important to note that adjusted values are not really recorded observations.  Their validity depends on the ability of the quantitative models used to make the adjustments to do so correctly.  Since most quantitative models in biology are approximately correct at best, it is a pretty sure bet that the adjusted values aren’t exactly right.  If the other causal influences are much more important than an environmental influence, then even a relatively minor flaw in the model used to make the correction can result in a large maladjustment of the variable of interest.  Maladjustments are also likely when there are many variables being adjusted for, and there are potential interactions between one or more of them.

Inappropriate adjustments can either hide true effects or create the appearance of effects that aren’t really there.  A statistically significant result may arise because one or more of the models used to make the adjustments is “systematically” wrong.   While there is no sure fire way to prevent any of these results from happening, comparing unadjusted to adjusted data is advisable; if the two are very different or deviate with a quantitative trend, then there is cause for concern.  If only adjusted values are shown, then perhaps even more caution is advised.
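One way to make that comparison routine is sketched below in Python.  It computes the ratio of adjusted to unadjusted values for each exposure group and flags two warning signs: a large change in any group, and ratios that drift steadily in one direction across groups.  The function names, the 25% tolerance, and the example values are all hypothetical, and a flag here is only a prompt to look harder, not a statistical test.

```python
def adjustment_ratios(unadjusted, adjusted):
    """Ratio of adjusted to unadjusted value for each exposure group."""
    return [a / u for u, a in zip(unadjusted, adjusted)]


def is_monotonic(values):
    """True if the values never reverse direction."""
    pairs = list(zip(values, values[1:]))
    return all(b >= a for a, b in pairs) or all(b <= a for a, b in pairs)


def flag_adjustment(unadjusted, adjusted, tolerance=0.25):
    """Flag large adjustments and adjustments that trend across groups."""
    ratios = adjustment_ratios(unadjusted, adjusted)
    large_change = any(abs(r - 1.0) > tolerance for r in ratios)
    trend = len(ratios) > 2 and is_monotonic(ratios) and ratios[0] != ratios[-1]
    return {"ratios": ratios, "large_change": large_change, "trend": trend}


# Hypothetical group means, ordered from lowest to highest exposure.
unadjusted_means = [100.0, 98.0, 97.0, 96.0]
adjusted_means = [100.0, 96.0, 92.0, 88.0]

result = flag_adjustment(unadjusted_means, adjusted_means)
print(result)
```

In this made-up example no single adjustment is large, but the adjustment grows with exposure, which is the kind of quantitative trend that should raise an eyebrow.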

Meta-analysis and Risk of Bias Bias

Given the fact that causal determinations are difficult when working with observational epidemiological data, the weakness of individual studies can often be overcome by combining them.  If raw data is available, which is rare, the data can be pooled and analyzed as if it all came from a single study, perhaps with additional variables in the model to account for differences between study populations.  That not only allows better characterization of the dose-response relationship of the variables of interest (e.g. arsenic and lung cancer), it can also promote the development of better models for the other causal influences as well (e.g. smoking).

If actual data are not available, then the only alternative is to try to assimilate the results of published studies.  A Risk of Bias analysis is a formal weight-of-the-evidence evaluation that is limited to evaluating the extent to which a particular study adequately accounts for and reflects all potential causal influences (Stroup et al, 2000).  As a search for plausible explanations, this effort dovetails with weight-of-the-evidence (“Hill Criteria”) evaluations concerned with specific causal theories.  But here’s the thing: A regimented evaluation process may eliminate known sources of bias, but it cannot eliminate unknown sources.  In fact, it may unwittingly reinforce them.  A literature survey is an opportune time for some novel inductive synthesis; and a new theory may turn an old theory into a source of bias.  Inductive reasoning never stops, or at least it shouldn’t.  In the long run, the only cure for bias is sharing the recorded observations.

Reference

Stroup DF et al (2000).  Meta-analysis of Observational Studies in Epidemiology: A Proposal for Reporting.  JAMA 283(15):2008-2012.

Official Post Soundtrack

XTC (1982).  Senses Working Overtime.  In: English Settlement, Track 3.

Post Note

Thesis Post #21.  Fourth in the epidemiology series; follows Toxicology meets Epidemiology.  The non-thesis post Data Economics is in the same vein.  The youtube video has Drums and Wires as the cover, but it's not on that album.  Oh well.

Thursday, March 26, 2015

An Ode to a Tree

Many Valued Logic

In a short four page essay, Rescher (1993) discussed the use of probability logic as a form of multi-valued logic, where propositions can have truth-values other than True or False.  A few key quotes will serve to distill the basic notion of a probability tree still further:

The leading idea of this probabilistic approach to many-valued logic is that of assigning likelihood values or probabilities to statements.  A measure function Pr is presupposed as given, which assigns some real number value Pr(p) to each and every member p of the domain of statements at issue. 

For present purposes, let us take “statements” to mean “theories”.

Specifically, the following basic conditions are postulated:
(P1) 0 ≤ Pr(p), for any statement p
(P2) Pr(p ∨ ¬p) = 1
(P3) Pr(p ∨ q) = Pr(p) + Pr(q), provided that p and q are mutually exclusive

So, what does that mean?

(P1) All the alternative theories need to at least be plausible
(P2) The probability of all alternative theories sums to 1
(P3) The alternative theories need to be mutually exclusive; if one is true then the others are not.

For present purposes, however, no particular, specific method for the assignment of probabilities need be assumed.

What he said; any convincing argument will do.
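Read this way, the three conditions are easy to check mechanically for any set of alternative theories.  The Python sketch below is a minimal illustration; the theory names, the probabilities, and the function names are all made up, and mutual exclusivity is simply assumed because it is a property of how the theories are defined, not something the numbers can verify.

```python
import math


def check_theory_probabilities(probabilities, tolerance=1e-9):
    """Check that no probability is negative (P1) and that the probabilities
    of the alternative theories sum to 1 (the reading of P2 given above)."""
    if any(p < 0 for p in probabilities.values()):
        raise ValueError("probabilities must be non-negative")
    total = math.fsum(probabilities.values())
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"probabilities sum to {total}, not 1")
    return True


def probability_of_either(probabilities, first, second):
    """P3: for mutually exclusive theories, the probability that one or the
    other is true is the sum of their probabilities."""
    return probabilities[first] + probabilities[second]


# Hypothetical theories and weights, for illustration only.
theories = {"theory A": 0.5, "theory B": 0.3, "theory C": 0.2}

is_valid = check_theory_probabilities(theories)
a_or_b = probability_of_either(theories, "theory A", "theory B")
print(f"Valid tree: {is_valid}; Pr(A or B) = {a_or_b:.1f}")
```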

Many-Handed Science

The better part of an editorial that appeared in Science 40 years ago (David, 1975):

A few months ago, Senator Muskie called for "one-armed" scientists. The occasion was a Senate hearing on the health effects of pollutants. Testimony from the National Academy of Sciences and other sources was not as definitive as the Senator desired. Witnesses insisted upon saying, "On one hand, the evidence is so, but on the other hand...." Thus, the call for one-armed scientists.

This incident illustrates a fundamental dilemma of the scientist or engineer in communicating with his patron, the lay person. Laymen conceive of scientific fact as an absolute; shades of gray and uncertainty are not acceptable. Scientific investigations are to produce unequivocal answers, according to the popular notion. On the other hand, scientists know that there are very few absolutes that will stand up for long. Those few that do are enshrined as "laws of nature."

The modest influence of science in affairs today rests largely on its reputation for objectivity. To the degree that we abandon that virtue, we lose influence and are considered merely another self-serving, politically biased, ax-grinding constituency. We hear already the sinister thoughts from politicians that a reputable scientist can be found to support any side of any controversy, that scientists have used their disciplines to reinforce their political convictions, and that scientists are interested primarily in feathering their own nests.

There are obvious reasons why we as scientists should not abet this tendency by lending our credibility to any side without stating all the caveats that go with a responsible scientific report to our own community. It is indeed difficult to qualify properly theoretical results and their speculative implications, tentative conclusions from limited data, and social impacts without creating more uncertainty than previously existed and thereby weakening the basis for action. Yet that is the responsible path. We should be encouraging greater respect for the mystique of the scientific process and its role in uncovering reality. We should be emphasizing the complexity of important matters, their unknowability, and yet their promise for the future.

A probability tree is not inductive logic or inductive reasoning.  But, it will serve as a useful inductive scoreboard for those theories that are competing for use.  Furthermore, when theories are favored in order to reinforce political convictions or feather nests, objectification of the competition will even the playing field.  Yeah, it happens; quite often in fact.  When research budgets start getting scrutinized in a declining economy, political justification can easily trump scientific justification when the peer-reviewing foxes are guarding the entrance.  Keeping score can prevent that.  Judging theories is a little like judging gymnastics or figure skating; beauty is in the eye of the beholder.  But still, if a theory keeps winning even after it keeps falling flat, an open competition will show that the judges have been bought.

The probability trees employed throughout this blogthing are concerned with food risk.  The larger current issue that cries out for a tree treatment is global climate change.  It is abundantly clear by now that there is no reliable climate change theory, but there are multiple plausible climate change theories – and they can’t all be right (Morgan, 2015).   If you want to separate the science from the policy, grow a tree.

Tree Planting

Although not widely used in public health, probability trees are a recognized tool for formal decision analysis.    In fact, you can buy an add-in for Excel.  But, to integrate the probability of causes into a larger model that has statistical uncertainties as well, it is better to roll your own.  It is not hard.  Here are two ways to do it with Excel Visual Basic; the same logic can be used in just about any programming language:


Note: The two node if-then-else tree is shown using both a worksheet function and with VBA.  A five node tree using the select case statement is VBA only.
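
For readers without Excel, here is a minimal Python sketch of the same if-then-else and select-case logic.  It is not the workbook code referred to above: the theory names, the branch probabilities, and the function names are all made-up placeholders, and the only idea carried over is that a uniform random draw picks which branch (theory) is in force for a given iteration of a larger simulation.

```python
import random


def two_node_tree(u, p_first=0.7):
    """If-then-else tree: theory A with probability p_first, otherwise theory B.

    u is a uniform random number in [0, 1); p_first is a placeholder weight.
    """
    if u < p_first:
        return "theory_A"
    else:
        return "theory_B"


def multi_node_tree(u, branches):
    """Walk the cumulative branch probabilities (the analogue of Select Case).

    branches is a list of (name, probability) pairs that must sum to 1.
    """
    total = sum(p for _, p in branches)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    cumulative = 0.0
    for name, p in branches:
        cumulative += p
        if u < cumulative:
            return name
    return branches[-1][0]  # guards against rounding when u is very close to 1


# Hypothetical five-branch tree; the weights are illustrative only.
FIVE_BRANCHES = [
    ("theory_1", 0.40),
    ("theory_2", 0.25),
    ("theory_3", 0.20),
    ("theory_4", 0.10),
    ("theory_5", 0.05),
]


def simulate(n, seed=1):
    """Count how often each branch is selected over n random draws."""
    rng = random.Random(seed)
    counts = {name: 0 for name, _ in FIVE_BRANCHES}
    for _ in range(n):
        counts[multi_node_tree(rng.random(), FIVE_BRANCHES)] += 1
    return counts
```

In a larger model, the selected branch name would simply decide which dose-response function (or other theory) gets evaluated on that iteration, alongside whatever statistical uncertainties are being sampled.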

References

David, EE (1975).  One-Armed Scientists?  Science 189: 679.

Morgan, MG (2015).  Commentary: Our Knowledge of the World is Often Not Simple: Policymakers Should Not Duck That Fact, But Should Deal with It.  Risk Anal 35:19-20.

Rescher N (1993).  27 Probability Logic: A Non-truth-functional System.  In: Many-valued Logic, Gregg Revivals, Aldershot pp. 184-188.

Official Post Soundtrack

Fleetwood Mac (1972).  Bare Trees.  In: Bare Trees, Track 5.

Post Notes

Thesis Post #20.  This is a third branch off of Dictionary of Probability, arguing for probability trees as an alternative to Bayesian approaches to theoretical uncertainty.  A discussion of using WoE to assign probabilities is an anticipated follow up. Yes, more quotes than original prose.  This is also my first attempt to link to some sample code; if it doesn't work, send me a note.  I expect to do more of that later.

Wednesday, March 25, 2015

Population Dose-Response Models

Individual Dose-Response Models

In toxicology, there are two general classes of dose-response models.  Benchmark dose modeling software calls one set “continuous”, while the other set is “quantal”.  This mathematical categorization doesn’t convey the essential difference: While continuous models represent what happens in an individual, a quantal model represents the frequency of occurrence of an effect in a population (WHO, 2014).  Since pharmacology is closely associated with the practice of medicine, the continuous “target theory” models discussed previously are what pharmacologists are generally interested in: the magnitude of an effect in an individual as it relates to dose.  Toxicologists use the same models for the same purpose.

Frequency of Occurrence Models

While some branches of toxicology are also primarily concerned with individuals (e.g. clinical toxicology), the better part of toxicology is concerned with public health where the “client” is a population rather than an individual.  In addition, the data used to characterize a dose-response relationship often come from a set of observations from a population of either laboratory animals or people, so the most straightforward use of the data is to characterize the dose-response relationship in a population.  The models that are used for this purpose come from two different traditions:
  • Cancer-Risk Assessment.  These biophysical models are largely based on radiation theory, where it is theorized that biological variability in the outcome is entirely attributable to quantum variation in the interaction between a genotoxic carcinogen and a DNA target.  Given what is known about the complexity and variability of physiology in general and the etiology of cancer in particular, the theory is quite silly.  The allusion to quantum physics also brought the Copenhagen Interpretation to cancer risk assessment, which is also silly.   But, as there are also other potential justifications for it (Crawford and Wilson, 1996), at the heart of these models is a simple exponential equation that isn’t silly at all.
  • Probit Analysis.  The other approach employs the concept of a threshold, at least in one sense of the word.  In probit analysis, it is theorized that for any given subject (animal or human), there is a specific dose at which any given effect occurs, but the threshold dose varies among the members of a population (Finney, 1954).   The most widely used distributions for descriptive purposes are the normal and lognormal distributions.  Probit analysis is often used to estimate the dose at which a specific percentage of subjects will respond.  For example, an LD50 is the dose that is lethal to 50% of the population, while an ED10 (aka a BMD10) is the dose at which 10% of the population will exhibit a particular response.  Since essentially any statistical distribution may be used as a descriptive tool, variations of the concept behind probit analysis are virtually infinite.  For example, if there are large differences in specific subpopulations, a bimodal or multimodal distribution may provide a better prediction.  Using a distribution that is skewed towards the low end, such as a Weibull distribution, will produce a population dose-response curve that is in the same family of models as those used in cancer risk assessment.  
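
To make the two traditions concrete, here is a minimal Python sketch of the quantal models just described.  The function and parameter names are my own labels, not anything taken from benchmark dose software, and the lognormal "probit" version assumes the threshold dose is lognormally distributed with a given median and geometric standard deviation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def probit_fraction(dose, median_dose, gsd):
    """Fraction of the population whose individual threshold is at or below dose.

    Assumes a lognormal distribution of thresholds; gsd is the geometric
    standard deviation and must be greater than 1.
    """
    if dose <= 0:
        return 0.0
    z = (math.log(dose) - math.log(median_dose)) / math.log(gsd)
    return STD_NORMAL.cdf(z)


def dose_for_fraction(fraction, median_dose, gsd):
    """Invert the lognormal model, e.g. fraction=0.10 gives an ED10."""
    return median_dose * gsd ** STD_NORMAL.inv_cdf(fraction)


def one_hit_fraction(dose, q):
    """Simple exponential ("one-hit") model: P = 1 - exp(-q * dose)."""
    return 1.0 - math.exp(-q * dose)


def weibull_fraction(dose, q, k):
    """Weibull model: P = 1 - exp(-q * dose**k); k = 1 recovers the one-hit form."""
    return 1.0 - math.exp(-q * dose ** k)
```

For example, `dose_for_fraction(0.5, median_dose, gsd)` returns the median (the ED50 or LD50 in this parameterization), and `weibull_fraction` with `k=1` is identical to `one_hit_fraction`, which is the family resemblance mentioned in the last bullet.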

So, in summary, here is a graphical synopsis of a few potential theoretical impacts of a chemical agent on a population:



2D "Hybrid" Models

Population data is also often used to characterize individual dose response relationships, with the more or less inevitable result that the dose-response relationship will characterize what will happen in an “average” person.  Since average people don’t really exist at all, it is quite reasonable to wonder about what the range of variation in a continuous response will be in a population.  The difficulty with answering that question is intimately tied to the question of causality: Variation in effect of the substance of interest diminishes the ability to distinguish it from other causes.  While this is often true even in a laboratory animal experiment, it is an even bigger problem when trying to interpret epidemiological data.  However, when there are obvious population differences, they can be accounted for; when the effect gets large relative to other causal influences, characterizing variability becomes possible. 

Nonetheless, hybrid dose-response models can be developed from animal data and sometimes from epidemiological data as well.  The simplest way to do this is to combine a descriptive statistical model with a continuous individual dose-response model, which is the approach generally taken in benchmark dose software (Crump, 1995).   Using a hybrid model to derive a BMD requires a two-dimensional definition of the benchmark response: one for the continuous effect and another for the population frequency dimension.  There are far more complicated methods as well.  Population pharmacokinetic models have also been developed for some drugs that require keeping track of correlations between different pharmacokinetic parameters. 
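
A minimal sketch of the two-dimensional idea, in Python.  This is an illustration in the spirit of the hybrid approach, not a reproduction of any particular software: it assumes a linear mean response, normally distributed individual variation with a constant standard deviation, and a cutoff for the continuous effect chosen so that a stated background fraction of unexposed individuals exceeds it.  All names and numbers are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def fraction_affected(dose, baseline, slope, sigma, background=0.01):
    """Fraction of the population whose continuous response exceeds the cutoff.

    Dimension 1 (continuous effect): the cutoff is set so that `background`
    of unexposed individuals already exceed it.
    The mean response is assumed linear: baseline + slope * dose.
    """
    cutoff = baseline + sigma * STD_NORMAL.inv_cdf(1.0 - background)
    mean_response = baseline + slope * dose
    return 1.0 - STD_NORMAL.cdf((cutoff - mean_response) / sigma)


def hybrid_bmd(bmr, slope, sigma, background=0.01):
    """Dose at which extra risk equals bmr (dimension 2: population frequency).

    Extra risk is (P(dose) - P(0)) / (1 - P(0)); with a linear mean and a
    normal distribution the dose can be solved for directly.
    """
    target = background + bmr * (1.0 - background)
    z_background = STD_NORMAL.inv_cdf(1.0 - background)
    z_target = STD_NORMAL.inv_cdf(1.0 - target)
    return sigma * (z_background - z_target) / slope
```

Swapping the linear mean for a Hill or exponential curve would only change `mean_response`, at the cost of having to solve for the BMD numerically.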

Using Population Models to Make Statements About Individuals

When population data is being used to identify a dose that has no measurable effect (i.e. a NOAEL or a BMD) then the distinction between a continuous and quantal endpoint doesn’t matter so much: Either way, you can more-or-less reasonably claim that at much lower levels of exposure nothing much is going to happen.  But if you are using the same analysis to actually predict effects (i.e. a real risk assessment instead of a safety assessment), the distinction is much more important.  A continuous effect model is very easy to translate into a statement about individual effect, with or without some acknowledgement that your mileage may vary, since that is essentially what it is about in the first place. 

But a quantal population model is a whole ‘nother story, and it doesn’t really matter whether it concerns cancer or some other endpoint.  Even if the population model is a stone cold statistical fact, when someone asks “What will happen to me?” something dreadful happens: Population variability suddenly becomes uncertainty, and a person becomes a chance.   I blame Copenhagen for that.

References

Crawford, M. and R. Wilson. (1996). Low dose linearity: The rule or the exception? Human and Ecological Risk Assessment 2: 305-330.

Crump K. (1995).  Calculation of benchmark doses from continuous data. Risk Anal. 15:79–89.

Finney DJ (1954).  Probit Analysis. Cambridge University Press, Cambridge.

World Health Organization (2014).  Guidance Document on Evaluating and Expressing Uncertainty in Hazard Characterization.  International Programme on Chemical Safety, Harmonization Project Document 11.  See chapter 3 for discussion of continuous vs quantal models.

Official Post Soundtrack

Cure, The (1980).  A Forest.  In: Seventeen Seconds, Track 7

Post Notes

Thesis Post #19.  Goes in dose-response modeling thread; mostly background material for non-toxicologists.  



Tuesday, March 24, 2015

Target Theory

Biochemical Pharmacology and Toxicology

In both pharmacology and toxicology, it is generally presumed that biological effects (which may be desirable or undesirable) occur when a chemical interacts with a cellular target site.  Although there are a number of different ways that a molecule may interact with a target, in a test tube the relationship between concentration and effect can usually be described with a relatively small set of alternative possibilities:
  • Mass Action.  If the interaction is ionic and reversible, the ligand receptor interaction can be described with the same dissociation kinetic equation that is found in any introductory chemistry text.  That basic equation has been modified for biochemical applications with an additional parameter that relates chemical associations to enzyme activity.   The same equation is used in pharmacology, but what is called an enzyme substrate in biochemistry is called a ligand instead.
  • Massive Action. There are some biochemical effects that are mediated by cooperative binding, in which multiple molecules are required to produce an effect.  This means that there is an S-shaped (sigmoidal) curve with very little effect at low concentrations. With yet another additional parameter, the law of mass action can be adapted to describe this using the Hill equation.
  • Irreversible Binding.  Although it is not common in pharmacology, another form of molecular interaction that underlies many toxicological effects is covalent, irreversible binding.  The mathematics for that interaction can also be found in a chemistry textbook under the heading of first order kinetics, which results in an exponential relationship between concentration, time, and effect.
  • Nonspecific target sites. There are also many generic toxicological effects, like oxidative damage, that really don’t have a specific target site.  Many of these effects occur from natural causes (e.g. oxygen), so any given molecule may add to damage to proteins or DNA that occurs without it.   Linear models that assume that the additional damage is directly proportional to concentration are often used for this.
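
For those who would rather see the equations than read about them, the four relationships in the list above can be sketched in Python as follows.  The parameter names (e_max, k_d, k_half, n, k, slope) are my own labels, and the irreversible binding function assumes the chemical is present in excess so that the reaction is pseudo first order.

```python
import math


def mass_action(conc, e_max, k_d):
    """Reversible binding: effect = e_max * C / (k_d + C)."""
    return e_max * conc / (k_d + conc)


def hill(conc, e_max, k_half, n):
    """Cooperative binding (Hill equation); n = 1 reduces to mass action."""
    return e_max * conc ** n / (k_half ** n + conc ** n)


def irreversible_binding(conc, time, e_max, k):
    """Covalent binding with first order kinetics: e_max * (1 - exp(-k * C * t))."""
    return e_max * (1.0 - math.exp(-k * conc * time))


def linear_added_damage(conc, slope):
    """Nonspecific damage added in direct proportion to concentration."""
    return slope * conc
```

With matched e_max and half-maximal concentration, the Hill curve with n greater than 1 sits well below the mass action curve at low concentrations, which is the "very little effect at low concentrations" part of the sigmoid.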

So, without delving into either the mathematics or the many other minor variations, here is a quick graphical view of the range of potential biochemical relationships between concentration and biochemical effect:


  1. There is a wide variation in the relationship between what happens at high concentrations and what happens at low concentrations.  
  2. In spite of all that variation, all the curves are monotonic; as the concentration decreases, the effect decreases as well.   
  3. All of these functions are approximately linear at low concentrations, but the difference in the linear slope is immense.

Dose-Response Relationships in Pharmacology and Toxicology

Biology is far more complicated than biochemistry.  That complexity is often divided into two categories:
  • Pharmacokinetics.  What happens after the drug or chemical is ingested, but before it gets to the target.
  • Pharmacodynamics.  What happens after the drug or chemical gets there.

Suffice it to say, the relationship between dose and response may bear little relationship to what happens in a test tube.  In particular, homeostatic mechanisms make real dose-response relationships look more sigmoidal than would be expected from target theory alone.   While short-term toxicological effects are often like pharmacological effects, only with a higher dose, effects that take place over a long period of time often result from non-specific targets where receptor-ligand theory really doesn’t apply.

However, since the target interaction is at least part of the equation, giving at least some consideration  to the biochemical interaction at the target is worthwhile:
  1. If the data are available to support it, a complex biological model can be used to characterize the dose response relationship.  While this is much more common in pharmacology than toxicology, models have been developed for especially important toxic agents like formaldehyde and anticholinesterase pesticides.  
  2. Knowing something about the biochemical interaction may affect the interpretation of what happens physiologically.   
  3. Even if the target site is unidentified and the biochemical mechanism is unknown, the list of possible interactions serves to limit the range of interpretation that is plausible.  For example, a linear dose-response model is usually at least plausible, whereas a supralinear function where the effects get bigger as the dose gets smaller is not.  
  4. It is generally not reasonable from a biochemical standpoint to suppose that pharmacological or toxicological effects are nonmonotonic.  If a dose-response relationship appears to be nonmonotonic, there are generally two good explanations:
  • The appearance isn’t real.  This is the explanation of first resort when interpreting epidemiological data.  There are many times when association really doesn’t mean causation, and this is probably one of them.
  • There is more than one biochemical effect.  This is in fact quite common; U-shaped “hormetic” dose response functions where a response goes down and then up can usually be explained this way.  Since two effects means two "causes", the inductive reasoning for the "up" part and the "down" part of a curve may be quite different, relying on different data for empirical support and different theory for theoretical support.
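
The second explanation is easy to demonstrate.  Here is a minimal Python sketch in which two monotonic effects add up to a U-shaped net response.  The numbers are invented for illustration: a saturable "down" effect that dominates at low dose and a linear "up" effect that dominates at high dose.

```python
def saturable_effect(dose, e_max, k_half):
    """Monotonic, saturating effect (mass action form)."""
    return e_max * dose / (k_half + dose)


def linear_effect(dose, slope):
    """Monotonic effect directly proportional to dose."""
    return slope * dose


def net_response(dose, baseline=10.0):
    """Baseline minus a saturable reduction plus a linear increase.

    Each component is monotonic in dose; only their sum is U-shaped.
    All parameter values are hypothetical.
    """
    reduction = saturable_effect(dose, e_max=4.0, k_half=1.0)
    increase = linear_effect(dose, slope=0.2)
    return baseline - reduction + increase


if __name__ == "__main__":
    for d in (0.0, 1.0, 3.5, 10.0, 50.0):
        print(d, round(net_response(d), 2))
```

Neither component violates target theory; the apparent nonmonotonicity only appears when the two are summed into a single observed response.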

The Linear No Threshold Debate

Since biological variability can be eliminated in a test tube, biochemistry can measure very tiny effects.  Pharmacology and Toxicology must be content with measuring not-so-tiny effects.  With the exception of the ‘ordinarily render’ clause of 402(a)(1), Food Toxicology is inherently concerned with unmeasurable effects, so some speculation about what may happen at doses that have no measurable consequence is unavoidable.  This speculation often results in a debate over whether or not there is a threshold of some sort and, if not, whether high-to-low dose extrapolation using a linear model is justified.  Biochemistry supports neither of these positions.  First, there is no biochemical mechanism for a threshold.  Second, even though it is reasonable to suppose that all dose-response functions are linear at low dose, it is not reasonable to presume linearity at all doses where the effect is unmeasurable. 

The alternative is this: Assume that the “high dose” portion of the curve, where the effect is measurable, has something to tell you.  If it looks linear at high doses, then a linear high-to-low dose extrapolation is usually very plausible.  If it is nonlinear at high doses, then using a plausible curve instead of a line is a mighty fine idea.  Sort of like Benchmark Dose modeling, except without the Benchmark Dose.  That wouldn’t be so crazy.
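
To see how much the choice can matter, here is a minimal Python sketch comparing the two extrapolations.  It assumes, purely for illustration, that the measurable portion of the curve is described by a Hill equation with made-up parameters, and that the straight line is drawn from the origin through the lowest dose at which an effect was measured.

```python
def hill(dose, e_max, k_half, n):
    """Sigmoidal dose-response curve standing in for the fitted high dose data."""
    return e_max * dose ** n / (k_half ** n + dose ** n)


def line_through_point(dose, ref_dose, ref_effect):
    """Straight line through the origin and one measured point."""
    return ref_effect * dose / ref_dose


# Hypothetical curve parameters and lowest measurable dose.
E_MAX, K_HALF, N = 100.0, 50.0, 2.0
LOWEST_MEASURED_DOSE = 10.0


def compare(dose):
    """Return (linear extrapolation, curve extrapolation) at the given dose."""
    ref_effect = hill(LOWEST_MEASURED_DOSE, E_MAX, K_HALF, N)
    linear = line_through_point(dose, LOWEST_MEASURED_DOSE, ref_effect)
    curve = hill(dose, E_MAX, K_HALF, N)
    return linear, curve
```

The two agree at the lowest measured dose by construction; below it, the line predicts a larger effect than the sigmoidal curve does, and the gap widens as the dose drops.  If the high dose data had looked linear (n = 1 with doses well below k_half), the two extrapolations would be nearly the same.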

General Reference

Brunton LL, Lazo JS, and Parker KL (2006).  Chapter 1: Pharmacokinetics and Pharmacodynamics: The Dynamics of Drug Absorption, Distribution, Action, and Elimination.  In: Goodman & Gilman’s The Pharmacological Basis of Therapeutics, 11th Edition.  McGraw-Hill, New York.

Or the first chapter in any other edition.

Official Post Soundtrack

Blondie (1978).  One Way or Another.  In: Parallel Lines, Track 2.


Post Notes

Thesis Post #18.  Part of dose-response modeling thread.