Bad Stuff in Food: Risk Analysis and Political Commentary: April 2016

Wednesday, April 27, 2016

Individual Choice

Public Health Value Judgments

When you can’t have it all, which is pretty much all the time, it is necessary to set priorities. Other people (e.g. friends, relations, employers, and the government) are often there to help you set your priorities – whether you want them to or not. But still, everybody does have to make their own choices on occasion. For example, unless your mom, spouse, or religion chime in, you can choose what fish you eat and how much all by yourself. You probably already know whether you like fish or not, and you also probably know how much it costs. If there are other factors that go into the decision, then you will need to know what they are.

Unfortunately, the trend in public health these days is to give consumers food consumption advice without exactly saying why. There is no good reason for this that I know of, but I am aware of two of the bad ones. First, not getting into the gritty details avoids political controversy stemming from scientific uncertainties. That doesn’t mean the advice is necessarily bad, but then again maybe it is. Second, doling out public advice can be a career all by itself, and career advisers often care more about protecting their jobs than about whether or not the advice is sensible. So, at best, food consumption advice is an expression of the social values of the people giving it, which may or may not correspond to your values. At worst, the advice doesn’t reflect anyone’s valuation at all. As a result, distrusting public health advice is generally a pretty good idea, especially when the advice isn’t accompanied by some intelligible reasons, which would also permit you to decide for yourself whether those reasons are good enough for you.

Grading on the Curve

While psychology studies are sometimes grounded in physiology with physical measurements (e.g. nerve conduction velocity), most epidemiological studies concerned with neurobehavioral development largely employ batteries of tests that reflect the social science interface of psychology. Since the value of these tests is subjective, they are standardized by determining how subjects “normally” perform. Since variation in performance on tests is normal, test results are typically given numerical values that reflect how far above or below average a score is relative to how much scores normally vary. There are two major problems with this. First, how much a test score varies isn’t necessarily a good indicator of how much performance on the test really matters. Second, defining what a “normal” population is can be rather arbitrary. For example, the Denver Developmental Screening Test was originally standardized in Denver, while the Boston Naming Test was developed in Boston. Yet what is normal in Denver may be somewhat different from what is normal in Boston. For a book-length discussion of the problems with standardized testing, see Gould (1981). Nonetheless, standardized psychological test batteries are more objective and reproducible than a doctor’s or a teacher’s opinion, and they are widely used for that reason.

The most basic normalized scale used for psychological testing is the Z-score, where the difference between the test score and the average is divided by the standard deviation. As a result, a value of -1 signifies a test result that is one standard deviation below the average, while a value of +1 signifies a test result that is one standard deviation above average. Other standardized tests are often scaled with modified Z-scores. In particular, the Intelligence Quotient (IQ) is scaled so that the mean is 100 and the standard deviation is 15, while Scholastic Aptitude Tests (SAT) have a mean of 500 and a standard deviation of 100. The following table compares how test results are scaled with each method:

Scale	2 SD below	1 SD below	Average	1 SD above	2 SD above
Z-Score	-2	-1	0	1	2
IQ	70	85	100	115	130
SAT	300	400	500	600	700
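A minimal Python sketch of the scaling described above, assuming only the means and standard deviations given in the text (IQ: 100 and 15; SAT: 500 and 100). The function names are illustrative, and the two norming samples at the end are hypothetical numbers, included to show how the same raw score can land above or below “normal” depending on who was used to define it.

```python
# Z-scores and the rescaled versions used by IQ and SAT tests.
# IQ: mean 100, SD 15. SAT: mean 500, SD 100 (as described in the text).

IQ_SCALE = (100.0, 15.0)   # (mean, standard deviation)
SAT_SCALE = (500.0, 100.0)


def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """How far a raw score lies from the norming mean, in SD units."""
    return (raw - norm_mean) / norm_sd


def rescale(z: float, mean: float, sd: float) -> float:
    """Re-express a Z-score on a scale with the given mean and SD."""
    return mean + sd * z


if __name__ == "__main__":
    # Reproduce the table: each row maps the same Z-score to IQ and SAT units.
    for z in (-2, -1, 0, 1, 2):
        print(f"Z={z:+d}  IQ={rescale(z, *IQ_SCALE):.0f}  SAT={rescale(z, *SAT_SCALE):.0f}")

    # One raw score judged against two HYPOTHETICAL norming samples
    # (think of a test normed in one city and then used in another).
    raw = 52.0
    print(z_score(raw, norm_mean=50.0, norm_sd=4.0))   # 0.5
    print(z_score(raw, norm_mean=54.0, norm_sd=4.0))   # -0.5
```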

Individual Neurobehavioral Risks and Benefits Arising From Fish Consumption

So, let’s talk about fish. The point of the preceding discussion is that there is evidence that the consumption of fish during pregnancy may have both bad (from methylmercury) and good (from omega-3 fatty acids or perhaps something else) effects on the child’s future neurobehavioral performance – and the effects aren’t exactly the same. Plus, I’m not going to tell you what you should do. I am leaving that up to you, and since it really isn’t a simple decision, I have no idea what you will decide. However, I will do my best to supply some reliable information.

My main vehicle for information delivery is an Excel-based program, which is a slightly modified version of a risk-benefit assessment model that I developed while I was at the FDA (2014). Although there are some other minor modifications as well, the main difference is that this version of the model is intended to estimate risks for a specific individual. However, if you don’t have Excel, or it is more trouble than it is worth, here are some sample results that give a feel for what the program does:

Consuming a high-mercury fish such as swordfish twice a week during pregnancy will delay the age at which the toddler learns to walk by about a week. The uncertainty associated with this estimate ranges from 0 to about three weeks.
Consuming one can of albacore tuna and one can of albacore tuna once a week during pregnancy will result in an increase of about 2.5 IQ points in the child. However, there may be a decrement in IQ of about 0.3 points, or the increase may be as much as 3.5 points.
Consuming salmon once a week will result in a projected increase of about 25 points on the Verbal SAT. However, given the many uncertainties associated with the estimate, the increase may be as little as 0 or as much as 35 points.
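The sample results are reported on different scales (weeks, IQ points, Verbal SAT points). Because IQ and SAT scores are both rescaled Z-scores, a score difference can be moved from one scale to the other through SD units alone. The sketch below uses a hypothetical helper, not part of the spreadsheet, and does only that arithmetic; it does not imply that an IQ effect and a Verbal SAT effect measure the same thing.

```python
def convert_effect(delta: float, from_sd: float, to_sd: float) -> float:
    """Re-express a score *difference* on another standardized scale.

    A difference is converted through SD units only; the scale means cancel,
    which is why neither 100 nor 500 appears here.
    """
    return delta / from_sd * to_sd


# 2.5 IQ points (SD 15) expressed in SAT-style points (SD 100)
print(round(convert_effect(2.5, from_sd=15, to_sd=100), 1))  # 16.7
# 25 SAT points (SD 100) expressed in IQ-style points (SD 15)
print(round(convert_effect(25, from_sd=100, to_sd=15), 2))   # 3.75
```

On a common SD footing, 2.5 IQ points is about 0.17 SD and 25 SAT points is 0.25 SD, so the two sample estimates are not the same size even though both are described as increases.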

Software

This Excel macro, Personal_Seafood_Net_Effect_Estimator.xls, integrates four components presented earlier:

Personal Fish Consumption Estimator

Methylmercury Toxicokinetic Conversions

Methylmercury Dose-Response

Fish Benefit Dose-Response
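The spreadsheet itself is not reproduced here. As a rough illustration of how four such components can be chained, and of how an uncertainty range like those in the sample results can come out of a Monte Carlo simulation, here is a hypothetical Python sketch. Every functional form, parameter name, and number in it is an assumption chosen only so the code runs; none of it is taken from Personal_Seafood_Net_Effect_Estimator.xls or FDA (2014), and the output is in arbitrary score units.

```python
import random
import statistics

# HYPOTHETICAL sketch: every function form and number below is a placeholder
# chosen only so the example runs. None of it comes from the spreadsheet or
# the FDA (2014) model, and the output is in arbitrary score units.


def fish_intake_g_per_day(servings_per_week: float, serving_g: float) -> float:
    """Component 1 (consumption): average daily fish intake in grams."""
    return servings_per_week * serving_g / 7.0


def mehg_biomarker(intake_g: float, hg_ug_per_g: float, conv: float) -> float:
    """Component 2 (toxicokinetics): MeHg intake (ug/day) scaled to a biomarker."""
    return intake_g * hg_ug_per_g * conv


def mehg_effect(biomarker: float, slope: float) -> float:
    """Component 3 (MeHg dose-response): a decrement, hence the minus sign."""
    return -slope * biomarker


def benefit_effect(intake_g: float, slope: float, cap_g: float) -> float:
    """Component 4 (fish benefit dose-response): a gain that plateaus at cap_g."""
    return slope * min(intake_g, cap_g)


def net_effect(servings_per_week, serving_g, hg_ug_per_g, p):
    """Chain the four components for one set of parameter values p."""
    intake = fish_intake_g_per_day(servings_per_week, serving_g)
    marker = mehg_biomarker(intake, hg_ug_per_g, p["conv"])
    return (benefit_effect(intake, p["benefit_slope"], p["cap_g"])
            + mehg_effect(marker, p["hg_slope"]))


def simulate(servings_per_week, serving_g, hg_ug_per_g, n=10_000, seed=1):
    """Monte Carlo over uncertain parameters; returns (median, 5th, 95th pct)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        p = {
            "conv": rng.lognormvariate(0.0, 0.3),     # placeholder uncertainty
            "hg_slope": rng.uniform(0.0, 0.5),        # placeholder uncertainty
            "benefit_slope": rng.uniform(0.0, 0.2),   # placeholder uncertainty
            "cap_g": 30.0,                            # placeholder plateau
        }
        draws.append(net_effect(servings_per_week, serving_g, hg_ug_per_g, p))
    cuts = statistics.quantiles(draws, n=20)  # cut points at 5%, 10%, ..., 95%
    return statistics.median(draws), cuts[0], cuts[-1]


if __name__ == "__main__":
    median, low, high = simulate(servings_per_week=2, serving_g=170, hg_ug_per_g=1.0)
    print(f"net change {median:.2f} (90% interval {low:.2f} to {high:.2f})")
```

Reporting the median together with the 5th and 95th percentiles mirrors the “central estimate plus range” style of the sample results, but the actual program may summarize its uncertainty differently.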

References

Gould, SJ (1981). The Mismeasure of Man. W. W. Norton & Company.

U.S. Food and Drug Administration (2014). Quantitative Assessment of the Net Effects on Fetal Neurodevelopment from Eating Commercial Fish (As Measured by IQ and also by Early Age Verbal Development in Children).

Official Post Soundtrack

Ponty, J-L (1983). Individual Choice. In: Individual Choice, Track 5.

Post Notes

Thesis Post #63, and the fifth of a series of five.

Sunday, April 17, 2016

Dear Journal Editors

Rejected

Thank you very much for reviewing my manuscript entitled “Plausible In, Plausible Out: A Bootstrap Methodology to Characterize the Uncertainty Associated With Dose-Response Modeling”. I am disappointed in the result, of course. However, as it met the same fate as every other paper on model uncertainty that I have sent to the various and sundry editors that the Journal has had over the last 25 years, I am not terribly surprised.

I think I will not attempt to rewrite the paper to make it more acceptable. The paper says what I want it to say and I don’t think any of the major suggestions made in the review will make it any better. A similar example to that discussed in the manuscript is also in the FDA (2016) assessment on arsenic in rice released two weeks ago (see section 9.4), and I suppose my purposes will be better served by working the rest of the text into my other writing projects. Nonetheless, for the benefit of my colleagues who are interested in model uncertainty in general and this paper in particular, I would like to address some of the comments made in the course of the review.

A Methodology Paper

The paper I submitted is a discussion of two methodological developments used in USFDA assessments for arsenic in apple juice (Carrington et al, 2013) and rice (USFDA, 2016). The first method involves the use of a parametric bootstrap simulation to propagate the uncertainty associated with dose estimation into the characterization of the dose-response relationship. Putting error bars on the doses is a novel technique, and since just about everyone I know thinks it is a pretty good idea, it was the impetus for writing the paper in the first place. Although the reviewers didn’t seem to think this technique was remarkable, at least they didn’t object. I guess it really is a pretty obvious thing to do once you’ve thought of it. The second reviewer did suggest that additional details be provided about the input distributions for the dose estimates. I decided not to do that when I wrote the paper because I thought getting into specifics would be a distraction from the more general idea of allowing dosimetric uncertainties to be represented in a dose-response analysis. If I get around to reworking the analysis for some other purpose, I will heed those suggestions to the extent that I can; many of the issues raised by the second reviewer resulted from the necessity of working with published summary data rather than observations from individual subjects. However, for a methodological presentation where no importance is attached to the actual results at all, I don’t think any of that matters.
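A minimal sketch of the first method, assuming the simplest possible setup: hypothetical group-level summary data, lognormal dose uncertainty described by a median and a geometric standard deviation, and a no-intercept linear dose-response model fit by least squares. The published assessments fit other models; only the idea of redrawing the doses on every replicate and refitting is illustrated here.

```python
import math
import random
import statistics

# HYPOTHETICAL summary data for three exposure groups. Each reports a median
# dose estimate, a geometric standard deviation (GSD) expressing how uncertain
# that dose is, and an observed excess response. The numbers are made up.
GROUPS = [
    # (median_dose, gsd, excess_response)
    (10.0, 1.5, 0.01),
    (50.0, 1.8, 0.04),
    (200.0, 2.0, 0.15),
]


def fit_slope(doses, responses):
    """Least-squares slope of a no-intercept linear model: response = b * dose."""
    return sum(d * r for d, r in zip(doses, responses)) / sum(d * d for d in doses)


def bootstrap_slopes(groups, n=5000, seed=42):
    """Parametric bootstrap over dose uncertainty only.

    On each replicate, every group's dose is redrawn from a lognormal
    distribution with the stated median and GSD, the model is refit, and
    the fitted slope is kept. Responses stay at their reported values.
    """
    rng = random.Random(seed)
    responses = [r for _, _, r in groups]
    slopes = []
    for _ in range(n):
        doses = [rng.lognormvariate(math.log(med), math.log(gsd)) for med, gsd, _ in groups]
        slopes.append(fit_slope(doses, responses))
    return slopes


if __name__ == "__main__":
    point = fit_slope([g[0] for g in GROUPS], [g[2] for g in GROUPS])
    slopes = bootstrap_slopes(GROUPS)
    cuts = statistics.quantiles(slopes, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    print(f"slope at median doses: {point:.2e}")
    print(f"95% bootstrap interval: {cuts[0]:.2e} to {cuts[-1]:.2e}")
```

Setting every GSD to 1 collapses the bootstrap to the single point estimate, which is a quick way to confirm that all of the spread comes from the dose uncertainty.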

Against the wishes of potential FDA coauthors, I also chose to include a discussion of model uncertainty. Since this was and is the hot-button political topic for any risk assessment involving arsenic or any other chemical hazard worthy of attention, I thought it would be a serious omission not to include it, especially since there may often be an interaction between dosimetry and the empirical weighting of alternative models. As near as I can tell, the reviewers didn’t raise any serious objections to anything I said about model uncertainty, but it is very clear that they really don’t like the way I said it. This may be partly because the reviewers didn’t take the discussion in the methodological context that was intended, but since this seems to be the reaction I always get when I try to talk about model uncertainty, I think there is more to it than that. I think I understand the nature of this editorial issue far better than I used to, so I will take this opportunity to explain.

Unfinished Science

Model uncertainty is subjective. Some people have it and others don’t. I think the two basic causative factors are as follows:

The model has to be thought of as a theory. Even if it is only approximately correct, the mathematical model has to convey some truth that is not evident from isolated observations.
There has to be more than one model-theory. If only one model is under consideration, then there is no uncertainty.

Taken together, these two criteria basically mean that if you are afflicted with model uncertainty, then you are thinking like a scientist. But here’s the thing: if you have it, then you can’t really talk or write about model uncertainty in the objective third-person writing style generally preferred by governments and journal editors. You can describe model uncertainty as a psychological phenomenon, as I just have, but that is pretty much the end of the third-person road. If you think it is just me who can’t write properly, go back and read the most widely cited paper on model uncertainty ever written (Hill, 1965): it is written almost entirely in the first or second person. For example, the problem the paper sets out to solve is stated as follows:

Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?

Part of the problem is that scientists don’t write their papers in the same manner as they converse in private. When papers are written, it is often because model uncertainties have been resolved and what were once just theories are reported to be objective realities. But risk analysts can’t do that. There are many model uncertainties that have not been resolved and they may never be, which leaves us with probability trees and subjective weight-of-the-evidence evaluations.

My Problem

The apple juice and rice risk assessments both used probability trees to depoliticize arguments over which dose-response model “should” be used to characterize the causal relationship between inorganic arsenic and cancer. I am happy to report that this strategy worked as I hoped that it would. The two assessments employed somewhat different strategies for assigning probabilities to alternative models. Because I think it is more consistent with how scientists actually think, I prefer the strategy used in the rice assessment that largely relies on expert opinion.

My only reservation is that the probabilities used for the rice assessment relied only on my opinion. As an expert, assigning probabilities to theories was implicitly part of my job description at the FDA, and I’m not complaining about having to do what I was paid to do. I have a PhD in Pharmacology and long experience in modeling dose-response relationships, so I don’t feel unqualified. However, my primary area of expertise is neurotoxicology rather than cancer biology, and I am far more familiar with the literature on lead and methylmercury than with that on arsenic. So, it would have been nice to have other experts involved, especially if the stakes are raised from just setting guidance values for apple juice and infant cereal, which have relatively little economic impact, to suggesting that consumers modify their rice intake. But in order to move from the subjective “I” to the intersubjective “We”, I think that those of us who are afflicted with model uncertainty need to be permitted to write in the same way as we converse among ourselves, and we can’t do that if we are forced to pretend to an objectivity that we really don’t have.
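If other experts had been involved, their model probabilities would have to be combined somehow. One standard technique from the expert-elicitation literature is a linear opinion pool, which simply averages the experts’ distributions, optionally with expert weights. The post does not say which pooling method would be used; the linear pool, the model names, and the probabilities below are all hypothetical.

```python
# Linear opinion pool: combine several experts' probabilities over the same
# set of alternative models by taking a (weighted) average.
# Model names, experts, and numbers are HYPOTHETICAL.


def linear_pool(expert_probs, expert_weights=None):
    """Return pooled model probabilities from a list of {model: prob} dicts."""
    for probs in expert_probs:
        if abs(sum(probs.values()) - 1.0) > 1e-9:
            raise ValueError("each expert's probabilities must sum to 1")
    k = len(expert_probs)
    weights = expert_weights or [1.0] * k   # equal weights by default
    total = sum(weights)
    models = expert_probs[0].keys()
    return {m: sum(w * probs[m] for w, probs in zip(weights, expert_probs)) / total
            for m in models}


expert_a = {"linear": 0.5, "sublinear": 0.3, "threshold": 0.2}
expert_b = {"linear": 0.2, "sublinear": 0.5, "threshold": 0.3}
pooled = linear_pool([expert_a, expert_b])
for model, prob in pooled.items():
    print(f"{model}: {prob:.2f}")
```

A linear pool keeps every model that any expert took seriously in play, which suits the goal of moving from “I” to “We”; other pooling rules (or structured elicitation protocols) would weigh disagreement differently.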

References

Carrington CD (unpublished). Plausible In, Plausible Out: A Bootstrap Methodology to Characterize the Uncertainty Associated With Dose-Response Modeling.

Carrington CD, Murray C, and Tao S (2013). A Quantitative Assessment of Inorganic Arsenic in Apple Juice.

Hill, Sir Austin Bradford (1965). The Environment and Disease: Association or Causation? Proc Royal Soc Med 58:295-300.

U.S. Food and Drug Administration (2016). Arsenic in Rice and Rice Products Risk Assessment.

Official Post Soundtrack

Green Day (1997). Reject. In: Nimrod, Track 14.

Post Notes

Thesis Post #64. Even though I will probably group it in the arsenic series, this post is really about academic politics.

Tuesday, April 5, 2016

Arsenic in Rice: Another Bloody Election

Toxic Endpoint Election

The basic idea of the risk assessment paradigm is that it has two steps (NRC, 1983). In the first step, the risk assessors produce the best information they can about a particular hazard that they have identified. In the second step, the risk managers take that information and do their best to manage the risk for the benefit of the public. But the risk assessment paradigm has an evil twin. The risk propaganda paradigm begins with identification of an issue that the managers would like to be in control of for the benefit of themselves and their friends. The initial process is followed by a risk caressment process, where the analysts produce a result that justifies a decision that has already been made. Perhaps the most well-known example of the risk propaganda paradigm is the GW Bush White House using weapons of mass destruction (WMD) as a pretext for the invasion of Iraq.

Unlike the apple juice risk assessment, the FDA (2016a) risk assessment for arsenic in rice contains a section on noncancer endpoints. Because the National Institute of Environmental Health Sciences (NIEHS) has been funding many “low-dose” epidemiology studies over the last 15 years, there are many new reports in the literature of associations between arsenic and many different toxic endpoints. However, since there has been an emphasis on studies of women’s and children’s health, that is what most of the resulting publications have been concerned with. The problem is that the modus operandi in epidemiology these days is to conduct a post-hoc analysis that yields a statistically significant result, write a paper, and then call up the university press office to report the association to the public. But it’s not association or statistical significance that matters, it’s causality. And most environmental epidemiology studies these days don’t seem to be designed or analyzed with the goal of demonstrating causality.
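The concern about post-hoc analyses can be put in numbers. Assuming k independent endpoints, none of which is truly affected, the chance that at least one reaches p < 0.05 is 1 - 0.95^k. The sketch below computes that and checks it by simulation; real endpoints are usually correlated, so this is an idealization rather than a description of any particular study.

```python
import random


def p_any_significant(k: int, alpha: float = 0.05) -> float:
    """Chance that k independent tests of true null hypotheses give at least
    one result with p < alpha."""
    return 1.0 - (1.0 - alpha) ** k


def simulate_any_significant(k: int, alpha: float = 0.05,
                             trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check: under a true null, a valid p-value is uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() < alpha for _ in range(k)) for _ in range(trials))
    return hits / trials


for k in (1, 5, 20):
    print(k, round(p_any_significant(k), 2), round(simulate_any_significant(k), 2))
```

With 20 endpoints the analytic chance is about 0.64, so a “statistically significant” association somewhere is more likely than not even when nothing is going on.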

Fortunately, the National Research Council (2013) recently reviewed the literature on arsenic and sorted the various potential endpoints into “tiers” based on the strength or weight of the evidence for a causal relationship. The noncancer endpoint that fell into the top tier was cardiovascular disease. The reason for that is pretty simple: while there are multiple studies showing a dose-response relationship between inorganic arsenic and cardiovascular disease, there is little to be found beyond statistical significance for the other endpoints. So, if you want a quantitative assessment for a noncancer endpoint, cardiovascular disease is pretty much the only choice.

But, that’s not what the FDA did. They skipped over cardiovascular disease and went for the Tier 2 and Tier 3 endpoints associated with pregnancy and childhood development. The fact that there were no data to support a quantitative assessment was found to be lamentable, but it left them undeterred. That might seem unfathomable, but once you understand that it’s the risk propaganda paradigm at work, some reasons for it aren’t that hard to come up with: a) NIEHS wants to justify funding studies that aren’t really very useful, b) the Society of Toxicology wants to create jobs for women who understand pregnancy and children so much better than men do, c) issues concerned with environmental effects on pregnant women and children are part of the Democratic party platform, and d) all of the above. I think d is the correct answer, but it's mostly a.

Risk Manglement

In its purest form, the risk propaganda paradigm isn’t used very often. That is because it often ends up with lies that are obviously not true. Propaganda works much better when it is at least partially true. The WMD ruse is a pretty good example. Iraq was invaded, but as it turned out, the WMDs just weren’t there. The noncancer risk assessment for arsenic in rice is like that too. The table of contents suggests that there is a risk assessment, but if you look inside it’s just not there. No quantitative risk assessment, no safety assessment, just an exposure assessment.

Yet the guidance issued by the FDA (2016b) relies primarily on the noncancer nonassessment anyway. Perhaps that is because the quantitative assessment indicates that, as with apple juice and as usual, setting levels is not a very effective way to reduce the risk from naturally occurring contaminants. If your management strategy isn’t going to work, then maybe it is better to use a nonassessment that doesn’t show anything at all. But the cancer assessment really is a better justification. The evidence for an effect is stronger, there is some basis for quantifying how big the effect might be, and there is substantial evidence that exposure earlier in life is more important. The FDA survey indicated that levels of arsenic in infant cereal are actually higher than in rice in general, and there is no good reason for that. A “37 percent reduction in lifetime cancer risk attributable to brown-rice infant cereal consumption” isn’t much, but it’s something.

In addition to providing a guidance level for inorganic arsenic in infant cereal, the FDA (2016c) also suggests that infants and pregnant women should modify their rice consumption. Again, the cancer risk assessment is arguably sufficient justification for not using rice cereal as the major staple for infants. But the advice to pregnant women rests solely on the dubious and unquantifiable Tier 2 and Tier 3 effects. The review quoted by the FDA as justification for that focus also lists cadmium, copper, iron, manganese, and zinc as potential concerns in addition to arsenic (Wright and Baccarelli, 2007). What about them? If simply declaring a causal relationship without any consideration of the relationship between dose and health outcome is sufficient for consumer advice, then what are pregnant women supposed to eat? Acrylamide – well, there goes the bread. Water is out too – hyponatremia can cause severe brain damage. Perhaps they can still eat cake.

But, never mind the infants and pregnant women, what about the rest of us? The guidance document says that

The FDA did not find a scientific or public health basis to recommend that the general population of consumers change its rice consumption based on the presence of arsenic.

What the hell happened to the Tier 1 effects? Lifetime exposure to arsenic is still causing lung and bladder cancer, right? What about those cardiovascular effects that the FDA skipped over? What about the fact that adult males are exposed to about twice as much arsenic from rice products (it’s beer) as adult females? How come we don’t get advice too? I’m going to drink the beer anyway, but still I’d like to feel needed.

I’d just blame the Center for Food Safety and Applied Nutrition for being so stupid, but I also know that they bear only proximal responsibility for this gross insult to the intelligence of every woman, man, and child in the United States. I figure NIEHS, the rest of HHS, the EPA, the Society of Toxicology, the Democratic party, and the bankers who supplied campaign funds for a voting bloc also helped set the propaganda campaign in motion.

References

National Research Council (1983). Risk Assessment in the Federal Government: Managing the Process. National Academy Press, Washington, DC.

National Research Council (2013). Critical aspects of EPA’s IRIS assessment of inorganic arsenic: interim report.

U.S. Food and Drug Administration (2016a). Arsenic in Rice and Rice Products Risk Assessment.

U.S. Food and Drug Administration (2016b). Supporting Document for Draft Guidance for Industry on Inorganic Arsenic in Rice Cereals for Infants: Action Level.

U.S. Food and Drug Administration (2016c). For Consumers: Seven Things Pregnant Women and Parents Need to Know About Arsenic in Rice and Rice Cereal.

Wright RO and Baccarelli A (2007). Metals and neurotoxicology. The Journal of Nutrition 137: 2809–2813.

Official Post Soundtrack

Killing Joke (1996). Another Bloody Election. In: Democracy, Track 10.

Post Notes

Thesis Post #61 and part two of two on the FDA Arsenic in Rice Risk Assessment. You may notice that in the previous post I refer to my former agency as "we", but as "them" in this one. That's because the agency I used to belong to doesn't exist anymore.

Monday, April 4, 2016

Arsenic in Rice: Just One Victory

Breaking the Mold

When arsenic in apple juice became a public issue in 2011, the FDA needed a risk assessment. It was generally presumed by agency management that the toxicologists who worked for the Center for Food Safety and Applied Nutrition should produce a proper risk assessment, meaning an assessment that conforms to standardized EPA guidelines. But there was a problem: the most recent EPA (2005) cancer risk assessment guidelines don’t even begin to work for a naturally occurring toxicant like arsenic. If they did, the EPA wouldn’t still have a cancer slope factor that hasn’t been updated since 1988. If they did, the EPA wouldn’t have needed to contract out the cost-benefit analysis for the 2001 Arsenic Drinking Water Rule. If they did, the FDA wouldn’t have been in the position it was in.

So, for arsenic in apple juice we did something else instead (Carrington et al, 2013). A similar analysis was released last week on the topic of arsenic in rice (FDA, 2016). While the dose-response analysis in those assessments doesn’t conform to the 2005 EPA guidelines, it is consistent with the general notion of separate risk assessment and risk management processes (NRC, 1983), and it is also consistent with the less prescriptive 1986 EPA cancer risk assessment guidelines. The most significant departure of the FDA analyses from those guidelines is that they use a probability tree (another old idea) to characterize the uncertainty associated with the extrapolation from high to low doses. The most obvious result of using this technique is that it does a more comprehensive job of characterizing the uncertainty associated with generating low-dose estimates from high-dose observations. But the more important advantage is that it turns the choice of which dose-response model “should” be used for making a regulatory decision into a different issue from its historical counterparts:

NRC 1983 and 1986 EPA guidelines. A default model was justified by policy unless a scientific argument could be presented for deviating from it. The problem with this was that toxicologists could never make a scientific argument that was certain enough for the policy default to be overturned.
2005 EPA Guidelines. The policy default became mandatory. Scientific arguments were prohibited by the Point of Departure.
Arsenic in Rice Assessment. While the use of a probability tree means absolute certainty is not required, scientific arguments are mandatory. Although using a probability tree doesn’t eliminate the possibility of political bias, at least it doesn’t require it. Furthermore, probability trees at least make it possible to frown upon self-interested biases as they occur.
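A minimal sketch of how a probability tree turns the choice of model into a distribution of risk estimates. The tree structure (model form, then cohort), the branch probabilities, and the leaf risks are hypothetical placeholders, not the weights or estimates in the rice assessment. Joint probabilities are products along each path, and the leaves together give a weighted distribution from which a mean or percentiles can be read.

```python
from itertools import product

# HYPOTHETICAL two-level probability tree. Level 1: which dose-response model
# form is correct. Level 2: which cohort the model should be fit to. The branch
# probabilities and leaf risks are made-up placeholders, not FDA values.
MODEL_BRANCHES = {"linear": 0.5, "sublinear": 0.3, "threshold": 0.2}
COHORT_BRANCHES = {"cohort_A": 0.6, "cohort_B": 0.4}
LEAF_RISK = {  # lifetime extra risk at one exposure level, per leaf
    ("linear", "cohort_A"): 2e-5, ("linear", "cohort_B"): 4e-5,
    ("sublinear", "cohort_A"): 5e-6, ("sublinear", "cohort_B"): 1e-5,
    ("threshold", "cohort_A"): 0.0, ("threshold", "cohort_B"): 0.0,
}


def leaves():
    """Yield (path, joint probability, risk) for every leaf of the tree."""
    for (model, pm), (cohort, pc) in product(MODEL_BRANCHES.items(), COHORT_BRANCHES.items()):
        yield (model, cohort), pm * pc, LEAF_RISK[(model, cohort)]


def expected_risk() -> float:
    """Probability-weighted mean risk across all leaves."""
    return sum(p * r for _, p, r in leaves())


def risk_quantile(q: float) -> float:
    """Smallest leaf risk whose cumulative probability reaches q."""
    cumulative = 0.0
    ranked = sorted((r, p) for _, p, r in leaves())
    for risk, prob in ranked:
        cumulative += prob
        if cumulative >= q - 1e-12:  # tolerance for floating-point sums
            return risk
    return ranked[-1][0]


print(f"expected risk: {expected_risk():.2e}")
print(f"median risk:   {risk_quantile(0.5):.2e}")
print(f"95th pct risk: {risk_quantile(0.95):.2e}")
```

Because each branch probability is written down explicitly, anyone who disagrees can change a specific weight and see the consequence, which is the sense in which a tree lets biases be frowned upon as they occur.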

The subjective weights and probabilities in the rice risk assessment are my own. While no one frowned upon my potential political biases, no one who reviewed the risk assessment expressed an opinion about how the alternative models were weighted either, nor did they suggest alternative models that might be used instead. That’s a shame that I attribute to the fact that the 2005 guidelines essentially shut down the market for theoretical reasoning in toxicology. Instead, the current fashion seems to be that a statistical analysis will resolve all the uncertainties with a purely empirical approach – which means Toxicology has largely been supplanted by Epidemiology. I beg to differ.

Another, more novel, technique introduced in the apple juice and rice assessments is parametric bootstrapping (akin to a Monte Carlo simulation), which both analyses use to represent the uncertainties in the dose estimates used to characterize the dose-response relationship. As with the probability tree, there are two advantages. First, there is better characterization of the uncertainty associated with the dose-response relationship. Second (and again, more importantly), it is no longer necessary to decide when the dose estimates for human epidemiology studies are “good enough” to proceed with a characterization of a dose-response relationship.

The Devil’s in the Details

Using a probability tree to raise the lid on Toxicology reveals a wide array of wriggling quantitative issues. One of the reasons the EPA guidelines have prescribed relatively simple dose-response models is that choosing a model for political reasons (e.g. to be precautionary) is only possible when there is a clear relationship between a scientific assumption and its regulatory implications. With a simple model, that choice can often be made irrespective of any other scientific issues. With more complicated models, that may not be true. For example, whether or not a nonlinear dose-response model is “more conservative” (i.e. yields a higher risk estimate) may depend on what the dose is.
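The point about conservatism depending on dose can be shown with two hypothetical models that agree exactly at one observed point (dose 100, extra risk 0.10). The parameters are made up; the two functions stand in for any pair of dose-response curves that cross.

```python
# Two HYPOTHETICAL dose-response models that agree at one observed point
# (dose 100, extra risk 0.10) but disagree everywhere else.


def linear(dose: float, slope: float = 1e-3) -> float:
    """Extra risk proportional to dose."""
    return slope * dose


def power(dose: float, a: float = 1e-2, k: float = 0.5) -> float:
    """Extra risk = a * dose**k; with k < 1 it rises steeply at low doses."""
    return a * dose ** k


def more_conservative(dose: float) -> str:
    """Name the model that gives the higher risk estimate at this dose."""
    lin, pw = linear(dose), power(dose)
    if abs(lin - pw) < 1e-12:
        return "tie"
    return "power" if pw > lin else "linear"


for dose in (1, 10, 100, 400):
    print(dose, more_conservative(dose))
```

Below the observed dose the power model gives the higher risk estimate; above it the linear model does. Which one counts as “precautionary” therefore depends on the exposure being evaluated, not on the model alone.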

The impetus for precautionary assumptions has led to the notion that in toxicology, science and policy are inextricably linked. For example, the FDA (2014) fish risk-benefit analysis contains several tables of ‘assumptions vs implications’ that were introduced because the Office of Management and Budget thought they were necessary to explicate potential biases. But sometimes the assumptions were essentially indisputable or had no clear political implications associated with them – so we tried to make some implications up. In fact, the linkage between science and policy isn’t real at all – it’s done by design. The biology underlying cancer and other diseases is very complicated, and the dose-response models typically used to generate estimates are approximations at best. Trying to manipulate them to achieve a predetermined result tends to be obvious to anyone who isn’t doing the same.

But make no mistake: the biology is complicated. While the dose-response function used to characterize the risk from arsenic in rice serves as a nice exemplar of what cancer guidelines could be, the assessment itself is far from perfect. There are many scientific issues yet to be addressed, some of which are unique to arsenic. For example:

The models used in the rice assessment are standard models that are in EPA Benchmark Dose modeling software. Those are perfectly adequate for characterizing the range of what the risks associated with exposure to arsenic might plausibly be, but I would hesitate to say that they are up to the task of characterizing what the risk is most likely to be – no matter how the alternative models are weighted. That problem might be addressed by adding one or more complex biological (i.e. toxicokinetic/toxicodynamic) models to the probability tree, and perhaps eliminating the models there now.
The dose-response models in the apple juice and rice risk assessments are based on the analysis of a single cohort. While the single studies chosen were the best available, a more concerted effort should be made to develop a dose-response model that is reasonably consistent with all reported observations.
There is evidence that exposures earlier in life are more important. The rice dose-response model dealt with that issue by just focusing on exposures under the age of 50, but a model that parameterizes the temporal component of the cause-effect relationship would be far superior (e.g. a time-to-tumor analysis of some sort). At least theoretically, it should be possible to produce a model that is consistent with both the data from Taiwan (which are better for characterizing the influence of dose) and the results from Chile (which are more useful for characterizing the influence of age at time of exposure). However, individual-subject data would be necessary to do that.
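A toy contrast between the two ways of handling timing. The first function sums exposure before age 50, in the spirit of the rice model; the second uses a multistage-Weibull time-to-tumor form, P = 1 - exp(-(q0 + q1·d)(t - t0)^c), in which time enters through its own parameters. The exposure history and all parameter values are hypothetical, and plugging a single cumulative exposure into the Weibull form is a simplification; a real analysis would need the individual-subject data mentioned above.

```python
import math

# HYPOTHETICAL lifetime exposure history: (age in years, dose that year).
HISTORY = [(age, 1.0 if age < 20 else 0.5) for age in range(80)]


def exposure_before(history, max_age=50):
    """Cumulative dose from exposures before max_age, in the spirit of the
    rice model's focus on exposures under age 50."""
    return sum(dose for age, dose in history if age < max_age)


def weibull_time_to_tumor(exposure, age, q0, q1, c, t0=0.0):
    """Probability of a tumor by `age` under a multistage-Weibull form:
    P = 1 - exp(-(q0 + q1 * exposure) * (age - t0) ** c) for age > t0."""
    if age <= t0:
        return 0.0
    return 1.0 - math.exp(-(q0 + q1 * exposure) * (age - t0) ** c)


if __name__ == "__main__":
    e = exposure_before(HISTORY)
    # Placeholder parameters; c > 1 makes the hazard rise with age.
    for age in (40, 60, 80):
        p_background = weibull_time_to_tumor(0.0, age, q0=1e-9, q1=1e-10, c=4.0)
        p_exposed = weibull_time_to_tumor(e, age, q0=1e-9, q1=1e-10, c=4.0)
        print(age, f"background {p_background:.4f}", f"exposed {p_exposed:.4f}")
```

In the windowed approach the age cut-off is fixed by judgment, whereas in a time-to-tumor model the influence of time is estimated from data through c and t0, which is why the latter needs richer data than published summaries usually provide.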

There are a host of other nagging scientific details that could be dealt with, but they only become important when the goal is to produce estimates that are scientifically defensible, as opposed to simply conforming to a default procedure justified solely by agency policy. The EPA has far more resources to devote to dose-response analyses than the FDA did. If they can awaken from the dogmatic slumber induced by their own guidelines, perhaps they can do much better.

References

Carrington CD, Murray C, and Tao S (2013). A Quantitative Assessment of Inorganic Arsenic in Apple Juice.

U.S. Environmental Protection Agency (1986). Guidelines for Carcinogen Risk Assessment. EPA/630/R-00/004

U.S. Environmental Protection Agency (2005). Guidelines for Carcinogen Risk Assessment. EPA/630/P-03/001F.

U.S. Food and Drug Administration (2016). Arsenic in Rice and Rice Products: Risk Assessment Report

Official Post Soundtrack

Rundgren, Todd (1973). Just One Victory. In: A Wizard, A True Star, Track 19.

Post Notes

Thesis Post #60. It is also the first of a two-part commentary on the Arsenic in Rice and Rice Products: Risk Assessment Report released by the FDA last week. In addition, it is an executive summary of many other blog topics, which is why there are many links to other essays.