Thursday, April 30, 2015

Burdens of Proof

Standards of Proof

Although there is always room for interpretation, there are three main standards of proof that various statutes can and do prescribe:

  • Possibly.  Since it is only necessary to show that there is some credible evidence that something might be true, this is a very weak standard. 
  • As Likely As Not.  Also known as preponderance of the evidence, this standard requires that the evidence FOR outweigh that AGAINST by at least some slim margin.  This is the standard typically employed in civil cases.
  • Almost Certainly.  Criminal proceedings that require proof of guilt employ the beyond a reasonable doubt standard.

Food Law Interpretation

The premarket approval clauses of the Federal Food Drug and Cosmetic Act (sections 408 and 409) clearly operate with a standard of proof that corresponds to “Possibly”.  If there is some reasonable possibility of harm then an additive or pesticide is not safe.   The “reasonable certainty of no harm” provision of the Toxic Substances Control Act is of the same ilk.

There has always been an inclination to evaluate contaminants in the same way as additives.  For example, Hutt (1978) wrote:

Some have suggested that different rules should apply with respect to environmental contaminants (such as aflatoxin in peanuts) as contrasted with substances used in the production of food (for example packaging materials) or direct food ingredients (such as saccharin).  There is, however, no conceptual or rational basis for this distinction.  An environmental contaminant can be reduced or eliminated from the food supply just as easily as an indirect constituent or a direct ingredient.

Hutt was simply wrong about this.  First of all, there clearly IS a conceptual basis for treating contaminants differently because the legal statutes pertaining to their regulation are quite different.  Second, there is a rational basis as well.  It seems obvious that eliminating an artificial sweetener with no nutritive value will not diminish the supply of food.  On the other hand, eliminating peanuts will.  So, a chemical may be “tolerable” as a contaminant even if it is not “acceptable” as an additive.
 
The standard of proof that underlies the older provisions (sections 402 and 406) that still pertain to contaminants is not as clear cut.  However, there is no reason to suppose that “some credible evidence” is the operating standard.  In fact, the FDA has gone to court with that stance many times and lost.  Even though the food industry can be expected to claim that “beyond a reasonable doubt” is required, when and insofar as I was the agency, I always thought “as likely as not” was a very fair standard.  But, I never went to court with it.  Regarding 402(a)(1), the other issue that is sometimes troublesome lies in determining what constitutes an “injury”.

Quantitative Interpretations

Translating scientific evidence into legal evidence often requires matching legal standards of proof to quantitative characterizations of probability.  The most straightforward example of this is the use of statistical characterizations of uncertainty to decide a toxic tort case where the operating standard is preponderance of the evidence (Black and Lilienfeld, 1984).  If there is a greater than 50% chance that a chemical produced by the defendant caused a disease suffered by the plaintiff then the plaintiff wins.  For most other legal standards there is no established quantitative equivalency.  However, there is a strong tendency to equate a probability of 95% with beyond a reasonable doubt, while a probability of 5% is sufficient for a reasonable possibility.  For example, a NOAEL determination typically uses a 95% threshold to determine whether or not an effect has been observed, and a BMDL uses a 5% threshold to determine that a particular effect may occur.
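The rough correspondence described above can be sketched in a few lines of Python.  The function name and the handling of boundary values are assumptions made for illustration; only the 50% figure is described here as established, while the 5% and 95% figures are tendencies rather than rules.

```python
def standards_met(probability):
    """List the standards of proof satisfied by the probability that a claim is true.

    Thresholds follow the rough equivalences in the text: 5% for "possibly",
    more than 50% for "as likely as not", 95% for "almost certainly".
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    met = []
    if probability >= 0.05:
        met.append("possibly")
    if probability > 0.50:
        met.append("as likely as not")
    if probability >= 0.95:
        met.append("almost certainly")
    return met


print(standards_met(0.60))
```

A probability of 0.60 satisfies the two weaker standards but not the strongest one, which is the situation in which the choice of standard decides the outcome.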

The other issue that must be confronted when supplying a quantitative argument for legal or public policy is that the probability in need of quantification often does not have a statistical origin.  This is especially likely to happen with small theoretical risks that cannot be measured with any precision.  Probability trees may be used for this purpose (e.g. Morgan et al, 1984; Evans et al, 1994; Carrington et al, 2013).  Since assigning probabilities to theories is subjective, at least in part, using probability trees never makes anyone very comfortable, which has undoubtedly limited their use.  Nonetheless, for contentious scientific issues, probability trees are a very useful tool for separating scientific opinion from political opinion.
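As a minimal sketch of how a probability tree works, the Python below assigns subjective weights to two competing theories, splits each into sub-branches, and then asks how much probability falls on outcomes at or above a given risk level.  Every theory name, weight, and risk value is a made-up placeholder, not a number from any of the cited analyses.

```python
# Hypothetical two-level probability tree.  All weights and risks below are
# illustrative assumptions only.
tree = {
    "linear dose-response": {
        "weight": 0.3,
        # branch name: (conditional probability, lifetime risk)
        "branches": {"high potency": (0.5, 1e-4), "low potency": (0.5, 1e-5)},
    },
    "threshold dose-response": {
        "weight": 0.7,
        "branches": {
            "dose above threshold": (0.2, 1e-5),
            "dose below threshold": (0.8, 0.0),
        },
    },
}


def leaves(tree):
    """Yield (path probability, risk) for every leaf of the tree."""
    for theory in tree.values():
        for conditional, risk in theory["branches"].values():
            yield theory["weight"] * conditional, risk


def probability_risk_at_least(tree, level):
    """Total probability assigned to leaves whose risk is at least `level`."""
    return sum(p for p, risk in leaves(tree) if risk >= level)


def expected_risk(tree):
    """Probability-weighted average risk over all leaves."""
    return sum(p * risk for p, risk in leaves(tree))


print(round(probability_risk_at_least(tree, 1e-5), 2))
print(expected_risk(tree))
```

With these made-up weights, the probability that the risk is at least 1 in 100,000 comes to 0.44, which would satisfy a “possibly” standard but fall short of “as likely as not”.  The subjective part is confined to the weights, where it can be seen and argued about.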

Judges and Peers

In the courtroom and in public policy, the determination of matters of fact involving scientific issues often defers, at least in part, to expert opinion.  Agency documents that contain influential information are routinely subject to interagency review and may also be required to undergo nongovernmental peer review.  There are, of course, scientific disagreements.  But, in addition, different agencies, different programs within the same agency, and different academic professions often employ different standards of proof.  In a courtroom, those issues are resolved by the law and the court.  In a public policy debate, there often is no mechanism for adjudicating exactly what the standard of proof is.  For example, I think “as-likely-as-not” is a good general purpose standard, but not everyone agrees.

Reviewers with an academic perspective are especially apt to think that the standard of proof enforced within their discipline is the only standard there is.  That goes double or triple for academic disciplines that are strongly associated with public policy like toxicology and nutrition.  In fact, there is a tendency to equate public policy standards with the standards used for publication in academic journals.  That can be disastrous.  The biggest problem is that environmental toxicologists generally look at all chemicals with premarket approval lenses.  As a result, they try to use and impose a standard of proof that is really quite weak, which is especially inappropriate when the government is responsible for building a case (e.g. for 402 or 406).  Another common problem arises when it is necessary to balance toxicological risks against nutritional risks (e.g. Olin, 1998; FDA, 2014).  Whereas toxicologists often insist on a very weak standard of proof, nutritionists are inclined to want one that is far closer to certainty.  Even though they may not agree on the standard of proof at all, both toxicologists and nutritionists may still agree that a quantitative analysis that uses an as-likely-as-not standard to identify an optimum is an affront to their academic sovereignty.

References

Black B and Lilienfeld DE (1984).  Epidemiologic Proof in Toxic Tort Litigation.  Fordham Law Review 52:Issue 5, Article 2.

Carrington CD, Murray C, and Tao, S. (2013). A Quantitative Assessment of Inorganic Arsenic in Apple Juice

Evans, J.S., Graham, J.D., Gray, G.M., and Sielken, R.L., Jr. (1994).  A distributional approach to characterizing low-dose cancer risk.  Risk Anal 14: 25-34.

Hutt PB (1978).  Unresolved Issues in the Conflict Between Individual Freedom and Government Control of Food Safety.  Food Drug Cosmetic Law Journal 33:558-589.

Morgan MG, Morris SC, Amaral DAL, and Rish WR (1984).  Technical Uncertainty in Quantitative Policy Analysis – A Sulfur Air Pollution Example.  Risk Anal 4:201-230.

Olin, SS (1998).  Between a Rock and a Hard Place: Methods for Setting Dietary Allowances and Exposure Limits for Essential Minerals.  J. Nutr. 128:364S-367S.


Official Post Soundtrack

Led Zeppelin (1970).  Gallows Pole.  In: Led Zeppelin III, Track 6.

Post Notes

Thesis Post #35.  Back on the Regulatory Toxicology thread.

Sunday, April 26, 2015

The Arsenic Drinking Water Rule

Risk Assessment Can Never Die

Sometimes, rationality is unavoidable.  Arsenic is classified by the International Agency for Research on Cancer as “Group 1: The agent (mixture) is carcinogenic to humans”.  Not only that, the primary route of exposure associated with carcinogenesis is from drinking water AND arsenic is a common contaminant in drinking water, even in the United States.  Since arsenic is not intentionally added, getting rid of it costs money.  How much money?  Well, that depends on how much of the arsenic you want to get rid of.  So, at the turn of the century, the EPA needed a real risk assessment that could justify the expenditure of real money.
  
But there was a hitch; agency guidelines then and now more or less forbade the creation of scientifically plausible risk estimates.  Fortunately, that was also a problem money could solve: The agency hired outside help to characterize the dose-response relationship (Morales et al, 2000).  While the agency guidelines were acknowledged, they weren’t followed.  There was no default option.  Several different model options were explored.  Most of them weren’t very plausible, but at least one of them was – which given the circumstances is quite admirable.  Furthermore, that is the model the economic analysis primarily relied upon, which is very nice too.

The Cost-Benefit Analysis

In addition to making use of cost estimates for reducing arsenic levels in municipal water supplies of varying sizes, the cost-benefit analysis developed to support the 2001 arsenic drinking water rule (EPA, 2000) also estimated the reduction in risk that could be expected by keeping arsenic concentrations below a series of possible regulatory levels.  Like the dose-response analysis, the economic analysis was done under contract outside the agency.  However, the exposure assessment portion of the risk assessment relied upon the same values used by the agency for other evaluations.  Summary results of risk reduction estimates (from section 5.4.1) are as follows:

Arsenic Level (µg/L)    Bladder Cancer Cases Avoided Per Year    Lung Cancer Cases Avoided Per Year
3                       28.6 – 76.8                              28.6 – 61.5
5                       25.6 – 55.7                              25.6 – 44.5
10                      18.7 – 31.0                              18.7 – 24.8
20                      9.9 – 10.6                               8.5 – 9.9

In a separate analysis, the cost of implementing water purification systems that would achieve each specified level was estimated.  When calculated on a per household basis, the variation between purification costs was far more dependent on the size of the system than the target level (see section 6.3.3).  In small systems, per person costs (and water bills) were estimated to increase by over $300 per year.  In large systems serving over a million people, even the most stringent standard was estimated to carry a cost of less than $10 per person.  In addition, the relative cost of achieving lower arsenic levels only increased significantly with systems with more than 1 million people.  These differences are a result of two main factors.  First, there is an economy of scale that makes purification cheaper when the cost is borne by more users.  Second, systems with groundwater sources are more likely to need treatment in order to attain a given level, and smaller systems are more likely to draw from groundwater rather than surface water.

Cost-benefit ratios were calculated on a national basis.  This required assigning a monetary value to each cancer case avoided (aka the “Value of a Statistical Life”; see section 5.4.2) and then comparing the monetized anticipated benefits with the anticipated costs.  Since a benefit to cost ratio of 1 or greater was achieved at either 10 or 20  µg/L, but not 3 or 5 µg/L, a standard of 10 µg/L was adopted as the Maximum Contaminant Level.
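The arithmetic behind such a ratio is simple, and a minimal sketch follows.  The cases-avoided figures are the lower bounds listed for 10 µg/L in the summary results above; the monetary value per case and the annual national cost are hypothetical placeholders chosen only to show the mechanics, not values taken from the EPA analysis.

```python
def benefit_cost_ratio(cases_avoided, value_per_case, annual_cost):
    """Monetized annual benefit divided by annual cost."""
    return cases_avoided * value_per_case / annual_cost


# Lower-bound cases avoided per year at 10 ug/L (bladder + lung), from the text.
cases_avoided_low = 18.7 + 18.7

# HYPOTHETICAL placeholders, for illustration only.
value_per_case = 6_000_000   # dollars assigned to each cancer case avoided
annual_cost = 200_000_000    # national cost in dollars per year

ratio = benefit_cost_ratio(cases_avoided_low, value_per_case, annual_cost)
print(round(ratio, 3))
```

A ratio of 1 or greater means the monetized benefits at least cover the costs.  Note how heavily the result leans on the value assigned to each case: doubling or halving that one number moves the ratio by the same factor.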

An Unsafe Level

Risk assessments are approximately correct, at best.  Cost-benefit analyses are acceptable, at best.  The analysis conducted for the 2001 Arsenic Drinking Water Rule was far from perfect, but it was good enough to justify the regulation, and that wasn’t easy.  It only happened because the analysis did NOT follow current (then or now) agency guidelines that discourage the generation of credible or even plausible risk estimates.  One of the consequences of that is that the MCL for arsenic in drinking water is not “safe”, at least in the sense that the term is operationally defined by the Safety Assessment paradigm.  Nor does the MCL achieve the risk target of 10^-4 (1 in 10,000) that the EPA typically uses to gauge risks for contaminants in drinking water (EPA, 2012).  Therefore, using the drinking water standard to judge arsenic levels in juice or wine is wholly unjustified.  Furthermore, it is not at all clear that an FDA standard for bottled water should be the same as the EPA standard for tap water; perhaps it would be more economically feasible to just purify water that is actually drunk, as opposed to also purifying the water that is used for bathing, washing dishes and clothes, and watering the lawn.

Fifteen years later, just about every component of the economic analysis conducted in 2000 is in need of some revision.  But still, the risk assessment can serve as a useful template for an updated analysis.  In fact, that’s one of the great attributes of risk assessments: They are far easier to improve than safety assessments. However, one aspect of the economic analysis is perhaps in need of restructuring; estimating risk-benefit ratios on a national basis seems ill-advised.  In particular, consider the most extreme differences in costs per household:

Arsenic Level (µg/L)    Cost Per Household,          Cost Per Household,
                        Systems with <100 Persons    Systems with >1,000,000 Persons
3                       $317                         $7.41
5                       $318.26                      $2.79
10                      $326.82                      $0.86
20                      $351.15                      $0.15

Even though these numbers aren’t exactly comparable for a number of reasons, it seems obvious that a cost-benefit analysis tailored to individual water systems would identify different optimum MCLs for different systems.  More specifically, it seems likely that a lower MCL could be justified for large systems, while perhaps even 20 µg/L isn’t worth it for small systems.  If the Federal government were paying for it, then maybe you could call it Environmental Justice.  But they aren’t, and therefore a national standard may be a bad deal for everybody.  [Similar arguments can and have been made for minimum wage standards and health care].
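The size of the disparity is easy to quantify from the per-household costs listed above.  The short Python sketch below simply divides the small-system cost by the large-system cost at each level; as already noted, the figures are not strictly comparable, so the ratios are only a rough guide.

```python
# Per-household annual costs in dollars, from the table above:
# arsenic level (ug/L) -> (systems with <100 persons, systems with >1,000,000 persons)
costs = {
    3: (317.00, 7.41),
    5: (318.26, 2.79),
    10: (326.82, 0.86),
    20: (351.15, 0.15),
}

# How many times more a household in the smallest systems pays.
ratios = {level: small / large for level, (small, large) in costs.items()}

for level, ratio in ratios.items():
    print(f"{level} ug/L: small systems pay about {ratio:.0f} times more per household")

# Extra cost per household for large systems of moving from 20 to 3 ug/L.
large_system_increment = costs[3][1] - costs[20][1]
print(f"large-system increment: ${large_system_increment:.2f}")
```

Households in the smallest systems pay roughly 43 times more at 3 µg/L and over 2,000 times more at 20 µg/L, while going all the way from 20 to 3 µg/L adds only about $7 per household in the largest systems.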

References

EPA (2000).  Arsenic in Drinking Water Rule Economic Analysis. EPA 815-R-00-026.


Morales KH, Ryan L, Kuo TL, Wu MM, and Chen CJ (2000).  Risk of internal cancers from arsenic in drinking water.  Environ Health Perspect. 108: 655–661.

Official Post Soundtrack

Young, Neil (1979).  Hey Hey My My (Into the Black).  In: Rust Never Sleeps, Track 9.


Post Notes

Thesis Post #34.  Part of Risk Management thread.

Thursday, April 23, 2015

Throwing Levels at Contaminants

Squeaky Wheels

The one thing that the Safety Assessment Paradigm and the Risk Assessment Paradigm have in common is that they are both mechanisms for making regulatory decisions.  But generally speaking, they have a different beginning.  A Safety Assessment for a food additive begins with a request for approval.  Minor issues involving individual consignments of food that are currently being detained by the agency are a little like that too; either the food will be allowed on the market or it won’t.  But, the bigger problems that only a risk assessment can deal with begin with an issue that often has an undetermined solution.  New squeaky wheels can pop up in several different ways:
  • Chemistry problems.  A contaminant known to be toxic is discovered at concentrations that were previously unknown.  The best example of this is acrylamide which, as a result of industrial usage, is known to be neurotoxic and carcinogenic.  It turns out that it is often synthesized by cooking food; relatively high levels are found in coffee, french fries, and pretzels.
  • Toxicology problems.  A chemical that occurs in food is discovered to have potential health effects.  This is actually a rare occurrence, and typically only happens when there are health effects in need of explanation.  Problems of this sort are far more likely to happen with immediate effects that occur soon after a food is consumed.  The discovery over fifty years ago that Minamata disease was attributable to methylmercury in fish is a prime example.
  • Hello Again problems.  There are a handful of chemicals that are of recurring interest partly because they are known to be toxic, but also because their notoriety stimulates research interest and consumer advocacy.  Methylmercury is in this category now, along with arsenic, cadmium, and lead.   There are some fungal toxins (e.g. aflatoxin) and industrial chemicals (e.g. dioxins) that belong in this category of issues as well.   It is the recurring issues that are in greatest need of quantitative risk assessment.  But, for that to happen, the political impediment of the shell game must be overcome.

Managing Exposure By Setting Levels

Food additives and pesticides are introduced into the food supply deliberately.  This simple fact means that exposure to them can be, more or less, effectively controlled by setting levels that limit their use.  Contaminants, by definition, are not intentionally added.  That fact makes them more difficult to avoid.  It also means that the federal government has less authority to regulate them than additives or pesticides.  But nonetheless, the management technique that is inextricably associated with the safety assessment paradigm is setting a level.  And therefore, since safety assessment is a far more familiar way of dealing with chemicals in food, the strategy of first resort for limiting exposure to contaminants is to set a level.  Sometimes, that works.  For example, consider the distribution of aflatoxin in Illinois corn between 2010 and 2014 (Illinois Dept of Agriculture, 2015):

Aflatoxin Concentration    2010    2011    2012    2013    2014
<5 ppb                     384     387     285     391     392
5 to 20 ppb                2       4       40      6       6
20 to 50 ppb               9       3       30      4       2
>50 ppb                    7       3               2
50 to 100 ppb                              9
100 to 200 ppb                             27
200 to 300 ppb                             8
>300 ppb                                   1

Even though monitoring aflatoxin concentrations is difficult, there is a payoff.  First, lots of corn with levels of aflatoxin that may result in acute toxicity (probably not even with 300 ppb, but it is getting close) will be excluded from both human and animal food.  Second, a few lots that contribute a high percentage of total average exposure will be excluded.  In particular, the 19% of the samples in 2012 over the FDA action level of 20 ppb are responsible for about 87% of the total aflatoxin in the entire set (calculated using median concentrations for each concentration interval).
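The 19% and 87% figures can be checked with a few lines of Python.  This is a sketch under two assumptions: the “median concentration” of each interval is taken to be its midpoint, and the single sample in the open-ended >300 ppb interval is assigned 300 ppb.

```python
# 2012 sample counts per aflatoxin interval, from the table above.
# Representative concentrations are interval midpoints (an assumption);
# the value for the open-ended ">300 ppb" interval is also an assumption.
bins_2012 = [
    # (representative ppb, number of samples)
    (2.5, 285),    # <5 ppb
    (12.5, 40),    # 5 to 20 ppb
    (35.0, 30),    # 20 to 50 ppb
    (75.0, 9),     # 50 to 100 ppb
    (150.0, 27),   # 100 to 200 ppb
    (250.0, 8),    # 200 to 300 ppb
    (300.0, 1),    # >300 ppb
]
ACTION_LEVEL = 20.0  # FDA action level, ppb

total_samples = sum(n for _, n in bins_2012)
total_aflatoxin = sum(c * n for c, n in bins_2012)
over_samples = sum(n for c, n in bins_2012 if c > ACTION_LEVEL)
over_aflatoxin = sum(c * n for c, n in bins_2012 if c > ACTION_LEVEL)

sample_share = over_samples / total_samples
aflatoxin_share = over_aflatoxin / total_aflatoxin

print(f"{sample_share:.0%} of samples, {aflatoxin_share:.0%} of aflatoxin")
```

Under these assumptions, 75 of the 400 samples (about 19%) account for about 87% of the aflatoxin.  The result is not sensitive to the value chosen for the single >300 ppb sample.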

Not Managing Exposure By Setting Levels

However, when the effect results from long term average exposure, setting levels often doesn’t really work at all.  The overall exposure will reflect the average concentration in many food items.  Elimination of a few food items with higher concentrations may have little impact on the average concentration.  If the range of contaminant concentrations is narrow (e.g. like Illinois corn in 2010, 2011, 2013, or 2014 rather than 2012), the impact is even smaller.  For example, consider the distribution of arsenic in apple juice (from Carrington et al, 2013):



Since there are no apple juice samples with inorganic arsenic concentrations above 10 parts per billion, any limit that is greater than 10 will have little or no impact on the average concentration (with a larger sample size it is likely that there would be some above 10).  A limit of less than 10 will impact the average concentration somewhat, but the limit could only be achieved by throwing away large quantities of apple juice.
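The trade-off can be illustrated with a small Python sketch.  The sample concentrations below are hypothetical values chosen to form a narrow distribution under 10 ppb; they are not the data from Carrington et al (2013).

```python
def effect_of_limit(concentrations, limit):
    """Return (mean before, mean after, fraction discarded) when every
    sample above `limit` is removed from the supply."""
    kept = [c for c in concentrations if c <= limit]
    mean_before = sum(concentrations) / len(concentrations)
    mean_after = sum(kept) / len(kept) if kept else 0.0
    discarded = 1 - len(kept) / len(concentrations)
    return mean_before, mean_after, discarded


# HYPOTHETICAL inorganic arsenic concentrations in ppb, for illustration only.
samples = [2, 3, 3, 4, 4, 5, 5, 6, 7, 9]

for limit in (10, 5):
    before, after, discarded = effect_of_limit(samples, limit)
    print(f"limit {limit} ppb: mean {before:.1f} -> {after:.1f}, "
          f"{discarded:.0%} of samples discarded")
```

With these made-up numbers, a limit of 10 ppb changes nothing, while a limit of 5 ppb lowers the average by a little under a quarter at the price of discarding 30% of the juice.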

Arsenic in apple juice is not the only example.  A World Health Organization analysis of the impact of limits on exposure (JECFA, 2007) concluded that for tree nuts other than pistachios, the presence of any feasible maximum limit would have little effect on dietary exposure to aflatoxin.  Yet, even when they don’t work, levels get set anyway.  Codex Alimentarius, the UN body that sets food safety standards for international use, routinely sets levels for contaminants after determining that those levels will have little economic impact.

So, more often than not, even though setting a level gives the appearance of doing something, it is often the “doing nothing” option in disguise.  There is definitely something wrong with that, but what it is may be worthy of debate:  Either the authorities are not protecting the public from a risk they need to be protected from, or they are pretending to protect the public from a risk that isn’t worthy of attention.

References    

Carrington CD, Murray C, and Tao, S. (2013). A Quantitative Assessment of Inorganic Arsenic in Apple Juice

Illinois Department of Agriculture (2015).  Mycotoxin survey.

Joint FAO/WHO Expert Committee on Food Additives (JECFA) (2007).  Aflatoxins: Impact of Different Hypothetical Limits for Almonds, Brazil Nuts, Hazel Nuts, Pistachios and Dried Figs.  WHO Food Additive Series 59, pp. 306-356.

Official Post Soundtrack

Williams, Lucinda (2014).  Foolishness.  In: Down Where the Spirit Meets the Bone.  Disc 1, Track 7.


Post Notes

Thesis Post #33.  The first in a public health risk management thread.  The soundtrack is for "pretending to protect"
  

Wednesday, April 22, 2015

The Science of the People

Basic Sciences

All scientific disciplines have some interplay between what is accepted as established reality and speculative theory.  As might be supposed, the most metaphysical of sciences is physics.  The particles and waves that are the subject of study are also the building material for chemistry, biology, and all the many macrosciences.  If physics isn’t real, then what is?  Furthermore, the development of new knowledge in physics is uncommon, which often makes a degree in physics tantamount to a degree in physical engineering.  Speculation about your navel will never change its shape.

Epidemiology is at the opposite end of the spectrum.  It is the most epistemological science.  Nothing is quite what it seems, perhaps.  An association may be attributable to a butterfly flapping its wings on the other side of the world.  If physical theories are superseded every century or so, epidemiological theories are apt to be trumped sometime next month.  So, compared to physics, epidemiology is a terrible and highly unreliable science.  But, the thing is, the subject matter of epidemiology is often more important.  So, when done well, epidemiology is very useful and informative.  On the other hand, when done badly, voodoo and astrology look far better.  The trick is this: The scientific methodology underlying good epidemiology bears scant resemblance to that of physics.

Peer Review

When the premises are not in dispute, a few knowledgeable judges can assure that the conclusions are correct.  The earth is round.  Oranges are round.  The circumference of a circle is π times the diameter.  The surface area of an orange peel is four times π times the radius squared.  So, orange peels are easy.  On the other hand, apples are not quite so round.  As a result, the judges of apple peel experiments are not quite so indisputable as are the orange peel judges.  And besides, the round earth just might be a government plot.  After all, just whom do the judges work for?

When epistemology rules the roost, the expectation that academic journals can guarantee the veracity of what appears on ink and paper has little foundation.  Yes, a reviewer can be expected to employ the same premises as the author, but if those assumptions are incorrect then the game is off, or at least it is different.  Instead of a few experts, there are many who may harbor an alternative theory that is, in fact, quite plausible.  Common sense, or at least some of it, is as reliable as expert judgment.

In physics, one plausible explanation is a pretty good trick.  In fact, figuring out what needs to be explained is pretty much the whole story.  When Newton “discovered” gravity, it came by noting that masses attract.  Working out the math was simple after that.  When Newtonian mechanics was supplanted by quantum mechanics, it happened by changing what was being explained.  The reality explained by epidemiology is far more speculative.  The influences that may affect the occurrence of a chronic disease are often numerous, and every one posited is a theoretical explanation of why, at least in part, the disease occurs.  To make matters even more complicated, the different explanations may draw on many different other sciences.  There may be toxicological explanations, nutritional explanations, and socioeconomic explanations with their own underlying biology.  Therefore, unless they are all unusually accomplished scientists with a very broad range of interests, a narrowly defined group of “epidemiological” peers is not going to be adequate for judging how good all the competing explanations really are.  For example, the wrong set of peers might approve of the use of a regression analysis with log-transformed dose.

The Tyranny of the Journal Article

Publish or perish.  Generate new knowledge now or get out of town.  That’s a quasi-reasonable performance standard for physics or any science with a high fact-to-theory ratio.  But history and epidemiology move too slowly and unpredictably for that.  Recording history is at least as important as drawing conclusions from it.  Similarly, the results from a single study will soon be less notable than the results of the meta-analysis that will ensue when merit is given to the initial conclusions.  Therefore, unless the association is very strong, analyzing the results from a single cohort is hardly worth doing.  Yet, dumping the data into the statistical significance hopper is considered to be an essential part of a career in epidemiology.

If a study is well designed in the first place, the most valuable result is the data.  Collecting what is often very similar data just so a new analysis can be conducted is silly.  In fact, collecting data and analyzing it could easily be dissociated into separate efforts that could be funded separately.  A grant given to an investigator to collect data and “publish” it in a public repository is money well spent.  A grant to collect data that is hidden or thrown away after a single analysis is not.  Yes, data collection should be hypothesis driven, but it should not be presumed that the hypothesis a particular investigator has in mind is the only one that the data may be useful for.

The analysis of pooled data from multiple studies by Lanphear et al (2005) represents a very rare good example of what is possible when data is shared.  However, the investigators from the different studies only agreed to share the data among themselves.  This is especially unfortunate because the analysis primarily relied on a loglinear dose-response model.  When a group of World Health Organization toxicologists (2011) relied on this analysis, they were obliged to produce a dose-response model from adjusted data, rather than actual data.
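One general property of a loglinear model is worth keeping in mind when judging whether it is a plausible choice: it predicts the same change in effect for every tenfold change in dose, so the change per unit of dose is steepest at the lowest doses, and the model is undefined at zero dose.  The Python sketch below shows this with a hypothetical slope of 1; it illustrates the mathematical form only and does not reproduce the Lanphear et al model or its coefficients.

```python
import math


def loglinear_effect(dose, slope=1.0):
    """Predicted effect under effect = slope * ln(dose).

    The slope is a hypothetical placeholder, not a fitted coefficient.
    """
    if dose <= 0:
        raise ValueError("a loglinear model is undefined at zero dose")
    return slope * math.log(dose)


# The predicted change is the same for each tenfold increase in dose...
low_step = loglinear_effect(10) - loglinear_effect(1)
high_step = loglinear_effect(100) - loglinear_effect(10)

# ...so the change per unit of dose is ten times larger over the lower range.
low_per_unit = low_step / (10 - 1)
high_per_unit = high_step / (100 - 10)

print(low_step, high_step)
print(low_per_unit / high_per_unit)
```

Anyone with access to the underlying data could compare this form against alternatives; with only adjusted data available, that comparison is much harder to make.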

Disease Models

Many of the health outcomes that are the subject of environmental epidemiology are common and are known to be influenced by many variables.  Instead of a new multivariate regression analysis for every study, adjusting for “confounding variables” could be done much more reliably if the adjustments were based on literature collected over many decades and many cohorts.  That would reduce the possibility of a non-causal cohort-specific association, and would encourage more careful consideration of the quantitative cause-effect relationship of all the variables.  The rule of thumb is that a relative risk of at least two is needed to exclude a strong possibility of an association that is not causally related.  But with pooled data and reliable quantitative models for major disease factors, the possibility of quantitative characterization of lesser causal influences would be greatly enhanced.
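For readers unfamiliar with the term, relative risk is just the ratio of the disease rate in the exposed group to the rate in the unexposed group.  A minimal Python sketch follows; the cohort counts are hypothetical and the function names are chosen for illustration.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the disease rate among the exposed to the rate among the unexposed."""
    exposed_rate = exposed_cases / exposed_total
    unexposed_rate = unexposed_cases / unexposed_total
    return exposed_rate / unexposed_rate


def passes_rule_of_thumb(rr, threshold=2.0):
    """Apply the rule of thumb that a relative risk of at least two is needed."""
    return rr >= threshold


# HYPOTHETICAL cohort: 30 cases among 1,000 exposed, 20 among 1,000 unexposed.
rr = relative_risk(30, 1000, 20, 1000)
print(round(rr, 2), passes_rule_of_thumb(rr))
```

A relative risk of 1.5, as in this made-up cohort, falls short of the rule of thumb.  That is exactly the kind of lesser influence that pooled data and reliable models for the major disease factors would make it possible to characterize.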

For example, let’s suppose a study is being developed to characterize the relationship between arsenic and lung cancer.  The main influence on the development of lung cancer is smoking, and therefore, detailed knowledge about the quantitative relationship between smoking and lung cancer would greatly aid the determination of what additional effect arsenic has, and how it does or does not interact with smoking.  But without a long history of recorded observation, that is a pipe dream.

References

Lanphear BP, et al (2005).  Low-level environmental lead exposure and children's intellectual function: an international pooled analysis.  Environ Health Perspect. 113:894-9.

World Health Organisation (2011).  Lead (addendum). Safety evaluation of certain food additives and contaminants.  WHO Food Additive Series 44, pp 381-497.

Official Post Soundtrack

Cranes (2004).  Particles & Waves.  In:  Particles & Waves, Track 5.


Post Notes

Thesis #32.  One more on epidemiology will finish the thread, I think.