Bad Stuff in Food: Risk Analysis and Political Commentary: 2016

Sunday, August 14, 2016

Individual Fish Risk Benefit Model

This page is set up as an adjunct to the discussion in The Science-Policy Shell Game concerning fish consumption advice. I may replace the Excel macro linked here now with something prettier and/or easier to use at some point in the future.

Software

This Excel macro, Personal_Seafood_Net_Effect_Estimator.xls, integrates four components presented earlier:

Personal Fish Consumption Estimator

Methylmercury Toxicokinetic Conversions

Methylmercury Dose-Response

Fish Benefit Dose-Response

Sunday, July 10, 2016

SPSG #13: Ending the Game

This chapter outlines strategies for overcoming the difficulties noted in earlier chapters. First, adopting some common legal strategies for separating Matters of Fact from Matters of Law could do wonders. The creation of job positions for Science Judges whose sole responsibility is to disentangle science matters from policy matters could facilitate that. Second, while eliminating the science-policy shell game entirely is probably not possible, there is no reason why it should be condoned or institutionalized. Therefore, the EPA assessment guidelines for cancer and noncancer endpoints both need to be rewritten. As the face of the Safety Assessment Paradigm, the Reference Dose background document should be rewritten to make it clear that the product of the assessment is a regulatory policy rather than a statement of scientific fact. As for the cancer risk assessment guidelines, instead of enshrining the default option with the Point of Departure, the guidelines should use probability trees to make the default option go away entirely. Furthermore, there is no reason to restrict the use of quantitative risk assessment to just cancer. However, solving that problem will create another problem: At least in public health, a legacy of the institutionalized use of the science-policy shell game has virtually eliminated risk management as a federal job position. So, doing a risk assessment is of very little use if no one in the federal government has the responsibility for managing issues. That can happen, but position descriptions will need to be rewritten. Third, research should be funded to support science, instead of supporting technocratic shell games. In particular, the enterprise of environmental epidemiology needs to be redesigned. Statistical significance testing should be eliminated as the primary means of drawing conclusions from data, and studies should be designed to increase or reduce evidentiary weight accorded to causal theories instead.
Perhaps most importantly, observational data needs to be shared. When it comes to analyzing data, regardless of what their source of funding is, investigators cannot be given complete deference in conveying what the data imply. As an academic recommendation, teaching Statistics and Probability as separate subjects would clear up more than a few nagging philosophical problems. Finally, the facade of impersonal scientific objectivity needs to be abandoned. Scientists should be both free to speculate and humble enough to admit their theories may be wrong.

SPSG #12: Personal Technique

This chapter is written largely in the first person, mainly for the purpose of disparaging the concept of scientific objectivity. It starts out by describing how several reorganizations dramatically affected the branch at the USFDA that I worked in for 25 years. In the end, it was swallowed up by the shell game. It then goes on to discuss the importance of recognizing the subjective nature of science, particularly when the science is unsettled and uncertain. The objectivity facade is partly attributable to a scientific writing style that takes the author out of discussions of factual issues, which hides the fact that personal opinions are being expressed. To demonstrate what science is really like, I walk through the personal choices I made in developing the dose-response model for arsenic and lung cancer that was used for the apple juice and rice risk assessments discussed in Chapter 11. Some of those choices were made by committee and some were not, but either way they all involved subjective scientific judgements made in a fog of uncertainty. In one case, I made a different choice than I had previously because new information influenced my subjective judgment about how to go about estimating lifetime risks from a prospective epidemiological study, which underscores the notion that “objective” reality evolves with scientific inquiry. The resulting dose-response model is then used to provide risk estimates for someone with a high-end (for the United States) arsenic intake. In addition to providing the lifetime risk estimates that were also given in the FDA reports on apple juice and rice, estimated changes in average life expectancy are also provided. For the purpose of making an individual choice, the latter measure is far more meaningful.
The chapter then suggests that the inability of EPA to provide a dose-response characterization for arsenic may stem from a wrongheaded demand for objectivity that dictates the use of the wrong probability and the wrong personnel for a job that needs statistical theory instead of statistical probability.

Saturday, July 9, 2016

SPSG #11: The Technocracide

Since it kills the Safety Assessment Paradigm every time, it has always been very clear that arsenic would never make it as a food additive. Although arsenic commonly occurs in food as a contaminant, the concern for arsenic in food was always mitigated by the fact that the largest exposures have generally been from drinking water. That equation changed in 2001, when the EPA passed a regulation for arsenic in drinking water. Because the drinking water rule required a cost-benefit analysis, the decision process was supported by a risk assessment that produced risk estimates; in spite of the guidelines, it was consistent with the risk assessment paradigm. However, the Office of Water did find it necessary to hire outside consultants to accomplish that goal. In any case, arsenic exposure from water was reduced and arsenic in food became a relatively bigger issue as a result. Following public attention in 2011, the FDA issued guidance "action" levels for arsenic in apple juice in 2013 and rice in infant foods in 2016. From a risk management standpoint, both efforts were abject failures. Although risk assessments were produced, they didn't really support the guidance in either case. At least part of the reason is that setting levels usually isn't an effective way to manage risks from contaminants. Preventing something from getting into the food in the first place can be far easier, but that isn't always possible. Nonetheless, the FDA went ahead with action levels anyway. In the case of apple juice, it isn't too hard to figure out why; the FDA commissioner publicly promised an action level before the risk assessment was done. The reasoning that went into the rice guidance is more mysterious. Even though there is nothing in the risk assessment to indicate that they are uniquely susceptible to arsenic, the FDA advised both infants and pregnant women to reduce their rice intake, but gave no advice for anyone else.
In fact, the exposure assessment indicated that the greatest exposures to arsenic from rice are in adult males. On a more positive note, the FDA cancer risk assessments for both apple juice and rice solved the default option problem by using probability trees to represent the theoretical probability associated with the dose-response relationship for arsenic and both lung and bladder cancer. The main lesson to be learned from those exercises is that regardless of how well an assessment represents current science, if the message takes precedence over the result, there will be no reason to expect public health to improve.

SPSG #10: The Paradigm War

Methylmercury in fish has been a major issue for both the EPA and the FDA since several epidemics occurred in Japan and Iraq in the 60s and 70s. At about the same time (early 90s) as the FDA started quantifying risks for methylmercury in fish and issuing consumer advice for commercial seafood, the EPA started giving recreational fish consumption advice based on the EPA Reference Dose (RfD). In 1999 Congress asked the National Academy of Sciences (NAS) to evaluate the RfD for methylmercury, and a report was issued in 2000. The fact that Congress even asked the question of the NAS cemented in many people's minds that the RfD was and is a statement of a scientific fact. That meant that if it was true for EPA then it had to be true for FDA too, and as a result any attempt to quantify the risks and provide information about what the risks are came to be viewed as a political attempt to undercut the science. By 2004, the FDA and EPA had agreed to give joint advice for fish consumption, but there was no agreement about what the basis or the rationale for the fish advisory was. While the EPA thought the RfD was paramount, the FDA chose to pursue a quantitative strategy that balanced benefits and risks; so the fish risk-benefit assessment was an FDA-only affair. But perhaps the most important difference was about what information, if any, would be given to consumers. The RfD treats consumers in the same manner as it treats agency managers; it decides for them, and as a result there is no basis for providing the information to consumers that will let them decide for themselves. As an alternative, a few representative risk estimates are provided for the consumption of fish during pregnancy using a version of the risk assessment model developed for the FDA that is designed to estimate risks for individual consumers.

Tuesday, July 5, 2016

SPSG #9: A Practical Guide to Theoretical Probability

This Risk Analysis methodology chapter is the applied version of the philosophical discussion of probability presented in Chapter 2. It also fixes the flaws in the Redbook paradigm discussed in Chapter 4, resulting in the Guillotine paradigm. It begins with a discussion of characterizing uncertainty when there are both statistical and theoretical probabilities involved. While a theoretical probability does not need to be quantified when it is the only probability involved or when there is no decision at stake, giving it the same epistemic standing as a statistical one is unavoidable when both matter. However, that does not mean a theoretical probability can be used as if it were a statistical probability. A theoretical probability is perhaps true always or perhaps false always; it is not true sometimes and false at other times. The discussion then turns to the problem of assigning probabilities to alternative theories. Declaring that all sum to one is a simple matter, but deciding the probability of each theory is not. Since theoretical probabilities are subjective, depending in some way on the opinions of those who have them (that usually means experts) is inevitable. However, instead of asking experts to assign probabilities to theories directly, there are advantages to garnering opinion in the form of evidential weights, where each alternative theory is evaluated more or less independently. Although formal weight-of-the-evidence (WoE) schemes have been developed for many regulatory purposes, they are not usually thought of as quantitative exercises. However, it has been done and could be done better. It is also argued that WoE analysis and dose-response modeling need to be more tightly integrated, especially when the judgment that there is a causal relationship becomes more likely than not. First, the shape of the dose-response relationship may influence the judgment that there is a causal relationship.
Second, the last vestiges of causal uncertainty may not matter if the estimated risks are too low to matter or high enough to be a concern even if they are only probable.

Monday, July 4, 2016

SPSG #8: The Wrong Probability

This chapter is about the problems associated with using statistical probability as the only probability, especially in epidemiology. While different scientific disciplines typically rely on somewhat different collections of convincing arguments, the conduct of epidemiology can aptly be compared to a trial for murder. Since the issue is causality, theoretical probability is front and center. Yet, at least when designing and publishing studies, environmental epidemiologists often rely on statistical significance testing for drawing conclusions. Austin Bradford Hill disparaged this practice over 50 years ago, and he is still quite right; statisticians are using the wrong probability. But that isn't the only problem. Epidemiologists (or their statisticians) often treat measures designed to quantify strength of association for the purpose of arguing causality as if they were measures of effect, thereby completely missing the point of having them in the first place. Next, epidemiologists are often reluctant to share raw data. While there are many possible explanations for this practice, the fact that other analysts would be able to use the data to explore and support theories not utilized in the published report is chief among them. The data sharing problem becomes especially evident when the theories used in published analyses are obviously wrong, either when they are first published, or perhaps later. This more or less forces the court of scientific opinion to rely on hearsay evidence. As an example of that, the use of log transformed measures of dose in multivariate regression analyses is discussed. Since it is an established analytical procedure, there is a tendency to think of regression analysis as a "theory-free" analysis that provides conclusions that are largely empirical. But that isn't true at all. Linear regression analysis presumes that the quantitative dose-response relationship is linear.
Similarly, doing a linear regression analysis with the log of dose presumes that the quantitative dose-response relationship is loglinear. But that results in a supralinear function where not only do the effects get bigger as the dose gets smaller, the effect approaches infinity as the dose approaches zero. Even though that's quite impossible, the practice continues, and that is probably because testing a theory that is definitely wrong is a reliable way of producing scary statistically significant low dose effects.
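The arithmetic behind that point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients A and B and the doses below are made-up numbers, not values from any study. If a fitted model has the form outcome = A + B * ln(dose), then the change in outcome per unit of dose is B / dose, which grows without bound as the dose shrinks.

```python
import math

# Hypothetical coefficients for a fitted model: outcome = A + B * ln(dose).
# These numbers are illustrative assumptions and do not come from any study.
A = 100.0
B = -3.0


def predicted_outcome(dose):
    """Outcome predicted by the log-linear model at a given dose (dose > 0)."""
    return A + B * math.log(dose)


def slope_per_unit_dose(dose):
    """Derivative of A + B*ln(dose) with respect to dose, i.e. B / dose."""
    return B / dose


# The per-unit-dose slope gets steeper as the dose gets smaller,
# and the predicted outcome has no finite limit as dose approaches zero.
for dose in (10.0, 1.0, 0.1, 0.01, 0.001):
    print(dose, round(predicted_outcome(dose), 2), round(slope_per_unit_dose(dose), 2))
```

Each tenfold reduction in dose multiplies the per-unit slope by ten, which is the supralinear behavior described above.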

SPSG #7: The Sociology of Technocracy

This chapter is like a sociology of science essay, except that it is really about politicians in lab coats, with most of it being concerned with toxicology. Many of the roots of the SPSG can be found in academia. First, there is a discussion of the Information Quality Act of 2002 that sought to make the information used by the federal government more objective. Yet the Office of Management and Budget interpreted "objectivity" as meaning "peer reviewed". While that could potentially subject scientific claims to cross-examination by outside experts, without separation of science and policy, peer review can also be used to prevent cross-examination altogether. While toxicology initially was primarily associated with the drug industry, it has become increasingly concerned with environmental regulation. The growth of environmental toxicology programs that are almost entirely concerned with government as a career path is a prime example. As a result, environmental toxicologists can potentially complete their careers talking only among themselves. Although the Society of Toxicology did not have a Code of Ethics when it was formed in 1961, it does now. Many of the recommendations seem to be political statements that on closer examination aren't necessarily ethical at all. In particular, members are required to be "advocates of public health" and "Abstain from professional judgments influenced by undisclosed conflict of interest". Both of these statements favor the interests of public sector members (i.e. technocrats) over those with private interests. Nutritionists can be technocrats too, and when nutrients are also toxic, that can create a clash of technocratic cultures. While nutritionists don't use safety factors, when it comes to considering dose-response relationships, their tradition is quite limited, perhaps by design.
In the “Risk Analysis Paradigm”, Risk Communication is often recognized as a third component of the regulatory decision process, along with Risk Assessment and Risk Management. However, the roots of the discipline lie in the study of consumer responses, which makes it well suited for selling a decision that has already been made. Quantification can be part of the SPSG too. Because of their ability to seemingly automate a decision process, often by ignoring or assiduously hiding theoretical probability, statisticians can play the shell game too. Since they often equate the behavior of scientists with science, sociologists sometimes seem to backhandedly endorse the shell game.

Note

Although it wasn't my original intent, this chapter does read like a populist manifesto.

Sunday, July 3, 2016

SPSG #6: Two Charades

While the earlier chapters are largely historical, this is the chapter that begins to speak of current practice, and it also gives the book its title. Quite simply, the Science-Policy Shell-Game (SPSG) is a technocratic game played by treating science and policy as if they were interchangeable in both directions. It consists of two components. On the one hand, a statement is purported to be science in front of a political audience. On the other hand, the same statement is purported to be policy in front of a scientific audience. As the end result, a regulatory decision is shielded from both scientific and political scrutiny. Although there were many examples of it before, the SPSG was deliberately institutionalized by a committee of EPA technocrats in 1986. The object of their creation was the EPA Reference Dose (RfD). As an example of the Safety Assessment Paradigm, the RfD was no different from the ADI, except for one thing: It was claimed to be a scientific fact. As a further technocratic assault, the 2005 EPA Cancer Assessment Guidelines replaced plausible worst-case estimates with what amounted to a ban on theoretical reasoning. That was accomplished by interposing a "Point of Departure" between toxicological theory and regulatory decision making. That made no sense then, and it still doesn't. Why the SPSG is played is debatable; it may be some combination of agency managers hiding decisions they would rather not defend, scientists who don't want to cede regulatory control to agency managers or elected officials, statistical decision theorists who claim to make decision processes objective, or it may just be career maintenance and research dollars. But whatever the reasons are, the SPSG is antiscientific and antidemocratic. It cuts off scientific discussion from policy making with technocratic short cuts, and it leads scientists and the public to believe that the decisions faced by the government and themselves are far simpler than they really are.
The “Fifth Branch” is introduced as a term to describe the technocrats inside and outside of the federal government who play the SPSG.

SPSG #5: Dose-Response Theory

This chapter is a compendium of pharmacology and toxicology theory, and since it doesn’t build on any of the previous chapters, it is essentially a third introductory chapter. Although it does get a bit technical, it is designed to give a sense of what the quantitative issues are without delving into mathematics. Although it isn’t necessary for most of the later chapters, it is provided as background material for some of the discussions involving theoretical probability, especially in chapters eight through eleven. The chapter commences with a survey of basic concepts including biochemical mechanisms underlying the interaction of a toxic chemical with a biological molecule, and toxicokinetic theory that describes what happens before it gets there. There is also discussion of statistical theories like probit analysis that treat the causal issue as a problem of describing how much the dose required to produce a given effect varies in a population. There are also hybrid or two-dimensional models that describe both the magnitude of individual effect and population variation as well. It has long been recognized that there is a temporal component to dose-response relationships that can vary between both chemicals and effects; yet the temporal component is often ignored. All said, biology is complex and dose-response theory is imperfect; there is plenty of room for improvement. Nonetheless, one thing seems clear; the dose makes the poison. Effects tend to get bigger as the dose gets bigger, but not necessarily in proportion.

Saturday, July 2, 2016

SPSG #4: The Risk Assessment Paradigm

This chapter is about the Risk Assessment Paradigm (RAP) that was originally introduced to chemical safety in food and the environment as a mechanism for dealing with carcinogens in food. The advantages of the RAP were touted by a 1983 report from the National Academy of Sciences (NAS) that is often referred to as the Redbook. In particular, the RAP was widely heralded as a democratic alternative to the technocratic SAP. This was accomplished by distinguishing a risk assessment process that characterizes what the risks are and a risk management process that uses the information from the risk assessment to make a decision. However, the Redbook version of the RAP had some significant flaws that kept it from really accomplishing the goal of separating science from policy. Therefore, this book also uses a more basic definition of RAP called "the Guillotine Paradigm", which is predicated on the separation of "Is" from "Ought": The scientific discussion of what is known is a separate endeavor from the policy decision of what ought to be done. The problems with the Redbook were identified in subsequent NRC reports. First, the Redbook paradigm begins with a Hazard identification step. Since this is to be done by scientific experts, it gave them control over what questions are to be answered. That problem can be addressed by viewing the RAP as an iterative process, where the policy deliberation identifies the questions, and the risk assessment provides the answers. Viewing the RAP as a dialogue instead of a monologue opens the decision process up to democratic participation. Secondly, the Redbook embraced the notion of the “default option”, where theoretical probabilities were to be resolved by giving one theoretical alternative preference as a matter of regulatory policy. That problem can be solved with a probability tree that acknowledges the theoretical alternatives.
Thirdly, the Redbook paradigm was derived from procedures used at the FDA for dealing with cancer and the Delaney Clause, and it was therefore sometimes interpreted as being applicable only to cancer. Although the RAP is named after the analytical component, the fact that policymaking (aka Risk Management) is recognized as a distinct process is at least as important. The RAP is especially indispensable when regulatory decision making requires a rational process where the risk is balanced against something else, such as a benefit, another risk, or the cost of avoiding it.

SPSG #3: The Safety Assessment Paradigm

This chapter is about the procedure originally developed for the premarket approval of pesticides and food additives, which is referred to throughout the book as the Safety Assessment Paradigm (SAP). The SAP gave birth to the Acceptable Daily Intake (ADI), which was the daily exposure to a chemical that would be considered acceptable by the FDA. The ADI was originally calculated by dividing the highest dose found to not have an observable (statistically significant) effect in a laboratory study by a safety factor of 100. There are three key features of Safety Assessment that are especially worth noting. First, since the SAP delegates the regulatory decision to experts, it is thoroughly technocratic. Second, premarket approval and the SAP were designed to be precautionary; a chemical couldn't be used until it was shown to be safe. Third, it is presumed that the way to limit exposure to a chemical is to set a level that the government will consider to be acceptable. Although the SAP has evolved from its 1954 introduction, it still retains its unmistakable premarket approval origins.
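The original ADI arithmetic described above is simple enough to write down directly. The following is a minimal sketch, not an agency procedure: the function name, the units, and the example dose of 50 mg/kg/day are all assumptions made for illustration.

```python
def acceptable_daily_intake(no_effect_dose, safety_factor=100.0):
    """Divide the highest dose with no observable effect by a safety factor.

    no_effect_dose: highest dose with no observable effect in a laboratory
        study (units are an assumption here, e.g. mg per kg body weight per day).
    safety_factor: the divisor; 100 is the original value described in the text.
    """
    if no_effect_dose < 0 or safety_factor <= 0:
        raise ValueError("dose must be non-negative and safety factor positive")
    return no_effect_dose / safety_factor


# Hypothetical example: a no-effect dose of 50 mg/kg/day.
adi = acceptable_daily_intake(50.0)
print(adi)
```

Note that nothing in the calculation says what the risk is at the ADI or above it; the output is a single level deemed acceptable.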

Friday, July 1, 2016

SPSG #2: Two Probabilities and Frequency Too

This is a history of philosophy of science presentation, and it sets up the terminology used throughout the rest of the book: If the first chapter is about "safety", this one is about "probability". There are two very different concepts of probability, both of which are very old and generally familiar. From an etymological standpoint, the legal concept of probability dates back to Roman law. In common parlance, the evidential form of probability is being used whenever a proposition is said to be "probably true". Even though it wasn't called probability until Pascal and friends gave it that name in the 17th century, the concept of chance and its relationship to frequency of occurrence has been around since Aristotle. In common parlance, the probability of chance is being used whenever it is said that an event will "probably happen". In scientific and technical literature, it is the second "statistical" meaning of probability that is used almost exclusively, and that is almost certainly because it is more objective in an empirical sense. However, the other subjective form of probability, which is called "theoretical" or "evidential" probability in the book to emphasize its role in science, can still be found in scientific discussion all the time, but it usually appears in the words rather than the numbers.

There are two other important concepts introduced in this chapter as well. First, as a result of the statistical definition of probability, uncertainty and frequency of occurrence are often treated as identical concepts. That is incorrect both from a grammatical standpoint, and because even without the appendage of "probability", statistical theories concerning frequency of occurrence are important in many scientific disciplines. For example, in public health, regulatory issues often revolve around evidential probabilities of statistical theories. Second, abstract representations of probability are also sometimes referred to as probability themselves without reference to usage, which leads to yet another opportunity for semantic confusion. As they are taught in introductory courses concerned with probability and statistics, mathematical probability distributions that can be used to represent either frequency of occurrence or the probability of chance are well known. However, even though theoretical probability is clearly a matter of degree, it is much harder to quantify. A probability tree can be used to represent theoretical probability and provide a quantitative interpretation as well. The basic concept is very simple: The probability of all alternative theories or hypotheses under consideration sums to one. Scientific evidence may then be weighed in order to determine which hypotheses, and to what degree, are more probable than the others. Causal relationships are the most common issue in which scientific issues involving theoretical probability arise. For example, whether or not a particular chemical causes cancer, and if so how, involves theoretical probability. Although they have evolved somewhat, guidelines for establishing causal relationships in science have been around for centuries. They have been around in law for even longer than that.
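The probability tree idea can be illustrated with a short Python sketch. Everything specific in it is an assumption for the sake of the example: the three theory names, the evidential weights, and the risk figures are hypothetical and are not taken from the book or from any assessment. The only rule it relies on is the one stated above, that the probabilities of the alternatives under consideration sum to one.

```python
def probability_tree(weights):
    """Convert non-negative evidential weights into branch probabilities.

    weights: dict mapping a theory name to a relative weight of evidence.
    Returns a dict of probabilities that sum to one.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {theory: w / total for theory, w in weights.items()}


# Hypothetical alternative theories and weights (illustrative only).
weights = {"no causal effect": 1.0, "threshold": 1.0, "linear": 2.0}
# Hypothetical risk estimate under each theory (illustrative only).
risk_if_true = {"no causal effect": 0.0, "threshold": 2e-5, "linear": 1e-4}

branches = probability_tree(weights)
for theory, p in branches.items():
    # Each branch keeps its own risk estimate alongside its probability.
    print(f"{theory}: probability {p:.2f}, risk if true {risk_if_true[theory]:.0e}")
```

The sketch reports each branch separately rather than collapsing the tree into one number, so the theoretical alternatives stay visible to whoever has to make the decision.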

Notes

Perhaps this should be the first chapter, but in the interest of not straining credulity at the outset I have put it second.

After having been at the FDA for several years, I ran into my first model uncertainty problem. I traipsed over to the Office of Mathematics to inquire about the proper methodology for calculating the probability of a model being true. I asked a statistician who had been at the FDA for thirty years, and the answer I got astounded me:

You are not allowed to ask that question

I obtained a second opinion from a younger statistician, just trying to find out about how to go about identifying the best model among several and got an answer that I found to be no more satisfactory:

Find a biologist and beat it out of them

As a biologist who did not want to be beaten, I soon embarked upon a philosophy of science reading binge that lasted several years in the mid-90s. The main thing I learned is that there is another probability that is quite different from the one the statisticians were using. However, in the end I was unable to learn very much about it from the philosophers. This chapter is a compilation of the few gems that I managed to gather from my survey. Perhaps not surprisingly, some of the most important insights came from practicing scientists and risk analysts. In any case, I managed to answer my own question, at least to my own satisfaction. Although beating the probabilities out of the biologists can be a fair characterization of the method, the beating can be avoided with a willing confession. The description of that solution begins in this chapter and finishes in chapter 9.

However, I ran into other difficulties subsequently. It seems that not everyone wanted the problem to be solved at all. The rest of the book is about that.

SPSG #1: Food Law and Chemical Safety

This is an historical review of food law concerned with chemical safety. It commences with a discussion of the 1906 Pure Food and Drug Act and ends with the Dietary Supplement amendments of 1994. All told, there are about a half dozen statutes that pertain to the regulation of chemicals in food by the U.S. Food and Drug Administration (FDA) that differ in many ways. While virtually all of the laws governing the safety of chemicals in food require scientific interpretation, the way in which scientific expertise is to be utilized to create a legal definition of what is considered safe or not by the agency necessarily differs among different statutes. The statutes differ in the definition of harm, the burden of proof, and in their evidentiary standards. For the purposes of the rest of the book, the most important distinction is between additives and contaminants. Food additives are deliberately added to food, have an intended use, must be approved by the FDA before they can be used, and the burden of proof lies on the manufacturer. Because the approval process is structured and planned, the way in which scientific expertise is utilized also occurs in a somewhat predetermined manner. The evidentiary standard for arguing that an additive has not been shown to be safe is also very low; it must only be argued that there is substantial evidence that harm is possible. Contaminants are, by definition, present in food unintentionally and many occur naturally. Therefore, they don't have an intended use, they don't need to be approved, and the burden of proof is on the government to show that the contaminant is harmful enough to be worthy of regulation, and the judicial branch often makes the final determination of what will be considered "safe". Other classes of chemicals fall somewhere between those two extremes. In particular, food additives in use prior to the Federal Food, Drug, and Cosmetic Act amendments of 1958 were exempted from the approval process.
This created a class of chemicals that are “Generally Recognized as Safe” where FDA approval is obtained by demonstrating history of use rather than going through a rigorous testing regimen. As the end result, there is no consistent definition in either law or science about what the word "safe" really means.

Thursday, June 30, 2016

The Book

The Science-Policy Shell Game

Most of the previous discussion on this blog has been assembled into an ebook that is better organized and written than my more casual blog essays. I have sprinkled a few new ideas into it as well.

Unfortunately, it isn't free. Maybe that's because I think I can trick people into thinking it is worth reading by making them pay for it. However, you can read the preface and summary for free, which should enable you to make an informed choice about whether or not it is worth five bucks to read the rest of it:

US Link, Search the title on Amazon elsewhere.

Chapter Links

I keep hoping that this blog will turn out to be a place where I can discuss the issues near and dear to my heart. Towards that end, the following links are provided for comments and discussion of individual chapters:

Chapter 1: Food Law and Chemical Safety

Chapter 2: Two Probabilities and Frequency Too

Chapter 3: The Safety Assessment Paradigm

Chapter 4: The Risk Assessment Paradigm

Chapter 5: Dose-Response Theory

Chapter 6: Two Charades

Chapter 7: The Sociology of Technocracy

Chapter 8: The Wrong Probability

Chapter 9: A Practical Guide to Theoretical Probability

Chapter 10: The Paradigm War

Chapter 11: The Technocracide

Chapter 12: Personal Technique

Chapter 13: Ending the Game

Wednesday, May 11, 2016

Biological Problems

The Perils of Multivariate Linear Regression

As is often the case in the epidemiological literature on environmental influences on neurobehavioral development, Bowers and Beck (2006) noted that a paper by Lanphear et al (2005) “has suggested the existence of a supra-linear dose–response relationship between environmental measures such as blood lead concentrations and IQ”. They then produced an analysis that indicates that the apparent supralinearity is an artifact resulting from the way the data were analyzed. They stated their conclusion as follows:

Results of the analyses show that a supra-linear slope is a required outcome of correlations between data distributions where one is lognormally distributed and the other is normally distributed.

While their mathematical analysis was indubitably correct, the way Bowers and Beck reported the results left something to be desired. How the data are distributed is not really the issue at all. Instead, the mathematical artifact they found results from conducting linear regression analyses with log transformed data. If data from a normal distribution, or any other distribution, were log transformed prior to the regression analysis, then the same result would be obtained. Furthermore, as demonstrated by Jusko et al (2006), a linear regression without log transformation with data drawn from a lognormal distribution does not result in a supralinear dose-response relationship.

To their credit, Hornung et al (2006) also understood that the real issue is the shape of the dose-response relationship rather than the distributions that either the dependent or independent variables follow. They therefore protested that Lanphear et al (2005) had considered the likely shape of the curve before conducting the regression analysis:

The shape of the exposure–response relationship was determined to be nonlinear insofar as the quadratic and cubic terms for concurrent blood lead were statistically significant (p < 0.001 and p = 0.003, respectively). Because the restrictive cubic spline indicated that a log-linear model provided a good fit to the data, we used the log of concurrent blood lead in all subsequent analyses of the pooled data.

But there are many problems with this justification. First, it is not at all clear how a spline analysis specifically supports a log-linear model, as opposed to other potential nonlinear models (e.g. a Hill function). Second, there was no consideration of biological plausibility. Like Bowers and Beck (2006), Lanphear et al (2005) seem to think establishing a causal relationship is a mathematical problem rather than a biological one. Third, they did not consider the possibility that other covariates might explain the apparent nonlinearity. Yet, off they went, and a dose-response model that predicts infinitely large effects as the dose approaches zero was the inevitable result. For all practical purposes, Bowers and Beck (2006) were entirely correct.
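To see why a log-linear model misbehaves near zero, here is a minimal sketch. The intercept and slope are made-up illustration values, not the coefficients reported by Lanphear et al (2005). Because the model loses a fixed number of IQ points for every doubling of blood lead, the loss per unit of lead keeps growing as the dose range shrinks toward zero.

```python
import math

# Hypothetical log-linear model: IQ = INTERCEPT - SLOPE * ln(blood lead).
# Both coefficients are made-up illustration values, not fitted estimates.
INTERCEPT = 100.0
SLOPE = 3.0  # IQ points lost per unit increase in ln(blood lead, ug/dL)


def predicted_iq(blood_lead):
    """Predicted IQ under the log-linear model (blood_lead must be > 0)."""
    return INTERCEPT - SLOPE * math.log(blood_lead)


def loss_per_ug(low, high):
    """Average IQ loss per ug/dL when blood lead rises from low to high."""
    return (predicted_iq(low) - predicted_iq(high)) / (high - low)


# Each interval is a doubling, so the total loss is identical every time,
# but the loss per ug/dL grows tenfold whenever the doses shrink tenfold.
for low, high in [(10.0, 20.0), (1.0, 2.0), (0.1, 0.2), (0.01, 0.02)]:
    rate = loss_per_ug(low, high)
    print(f"{low:>5} -> {high:<5} ug/dL: {rate:8.2f} IQ points per ug/dL")
```

Nothing in the arithmetic stops the per-unit loss from growing without limit, which is the implausible prediction described above.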

Besides the fact that a loglinear dose-response model is a very poor theory, there is a more general lesson to be learned: A multivariate regression with assumed quantitative relationships between the variables being modeled is highly prone to error. While a loglinear relationship is obviously wrong, a linear relationship isn’t necessarily right either. Correlations between variables may result in attribution of mismodeled causal effects to a variable that has no causal effect at all. For example, if the relationship between socioeconomic status (i.e. the HOME score) and IQ is nonlinear, with bigger impacts at low scores, and socioeconomic status is negatively correlated with exposure to an environmental chemical, then some of the socioeconomic effect will erroneously appear to be a low dose effect attributable to the environmental chemical. There are many other possible explanations as well, all of which are more probable than a dose-response model that predicts incremental effects to get bigger as the dose gets smaller.

Process vs. Theory

Biological complexity often makes the pronunciation of definitive truths for Medicine and Public Health practically impossible. While relying on expert opinion is a common solution to that problem, that solution does not work well when opinion is divided. As a means of coping with that problem, institutional decision making processes often employ structured evaluation systems to sort through what can often be a voluminous set of scientific literature. The Safety Assessment methodology that is typically used for premarket approval evaluations is an example. This description of Evidence-Based Medicine conveys the general ethos of such efforts:

Whether applied to medical education, decisions about individuals, guidelines and policies applied to populations, or administration of health services in general, evidence-based medicine advocates that to the greatest extent possible, decisions and policies should be based on evidence, not just the beliefs of practitioners, experts, or administrators. It thus tries to assure that a clinician's opinion, which may be limited by knowledge gaps or biases, is supplemented with all available knowledge from the scientific literature so that best practice can be determined and applied. It promotes the use of formal, explicit methods to analyze evidence and makes it available to decision makers.

There are two key concepts at work here. First, the “beliefs of practitioners, experts, or administrators” are getting kicked to the curb in favor of “evidence”. If you thought the beliefs of experts were based on scientific evidence, then you were misinformed, apparently. Secondly, there is an emphasis on the “use of formal, explicit methods”, which also serve to limit subjective influences on the evaluation process.

Experts are not always trustworthy, so the desire for a transparent process is entirely understandable. But getting a trustworthy process to replace the experts is easier said than done. The process has to be designed by somebody, and that usually means experts. There is also apt to be a negotiation process involved in getting the process to be accepted, so subjectivity isn’t really completely avoided. But perhaps the bigger problem is that trying to deal with complex biological issues with a formula may often be rather stupid. If all the studies show the same result, then it really isn’t going to matter whether the decision making process is expert-based or evidence-based. If the results are different, then the systematic review may succeed at identifying the higher quality studies and grading the general result. But it won’t explain why the results are different. It won’t figure out why a treatment may work sometimes, but not others. That will take biological theories, and like the biases the evidence-based systems strive to avoid, those are subjective. There are likely to be different theories, of course, and then the experts will inevitably get into a debate over which are more likely. But, guess what, that’s the way science works: Trying to eliminate all potential bias with formulae will also eliminate scientific progress.

By all means, more transparency is needed. In particular, let’s not trust authors to have the last word on how the data they have collected are analyzed and published. Medical researchers and epidemiologists are notorious for not sharing original data involving human subjects, even when they are legally required to do so (Panhuis et al, 2014; Longo and Drazen, 2016). Wider access to the original data will allow better theories to flourish, and poor theories to flounder.

References

Bowers TS and Beck BD (2006). What is the meaning of non-linear dose-response relationships between blood lead concentrations and IQ? Neurotoxicology 27:520-4.

Hornung R, Lanphear B, Dietrich K. (2006). Response to: “What is the meaning of non-linear dose–response relationships between blood lead concentrations and IQ?”. Neurotoxicology 27:635

Jusko TA, Lockhart DW, Sampson PD, Henderson CR Jr., and Canfield RL (2006). Response to: “What is the meaning of non-linear dose–response relationships between blood lead concentrations and IQ?”. Neurotoxicology 27:1123–1125.

Lanphear BP, Hornung R, Khoury J, Yolton K, Baghurst P, Bellinger DC, Canfield RL, Dietrich KN, Bornschein R, Greene T, Rothenberg SJ, Needleman HL, Schnaas L, Wasserman G, Graziano J, and Roberts R. (2005). Low-Level Environmental Lead Exposure and Children’s Intellectual Function: An International Pooled Analysis. Environ Health Perspect. 113: 894–899.

Longo DL and Drazen JM (2016). Data Sharing. N Engl J Med 374:276-277.

Panhuis WG van, Paul P, Emerson C, Grefenstette J, Wilder R, Herbst AJ, Heymann D, and Burke DS (2014). A systematic review of barriers to data sharing in public health. BMC Public Health 14:1144.

Official Post Soundtrack

Jackson, J (1980). Biology. In: Beat Crazy, Track 9.

Post Notes

Thesis Post #65. This covers some of the same ground as Toxicology Meets Epidemiology, but with a more philosophical overview.

Thursday, May 5, 2016

Mixed Probability Calculations

Probability is the Guide of Life

For personal decisions, theoretical uncertainty is the far more familiar form of probability. If two different sources of information lead to different courses of action, then you have to either decide who and what to trust or hedge your bets. However, the probability of chance that is amenable to a mathematical treatment and is the main form found in academic discourse can be important too. The relative importance of the two probabilities can vary with the problem. Sometimes one or the other will dominate, while in other instances both are important. Recreational betting games serve as an example:

Roulette. Betting on a roulette wheel is purely a game of chance. The odds and a long term expected return can be calculated very accurately. Well, unless the game is fixed.
Horse Racing. In theory, some horses are faster than others – chance has very little to do with who wins. Sure, historical records are important, but that’s mainly because they indicate which horses are fast and which ones are not.
Poker. The odds that a certain card or cards will turn up can be calculated, and the game of poker can be simply played as a game of chance. But good poker players also take the mannerisms of their opponents into account when they bet, which turns poker into a mixed probability game.

It’s the last category of problems that makes risk analysis interesting.

Betting on the Single Instance

If you are betting on a single instance (i.e. what to do now), then boiling down theoretical probability and statistical probability into a single judgment or number is essential. A simple equation will suffice to represent this notion:

pTotal = pTheory * pChance

If the roulette wheel is fair, then pTheory = 1, and therefore pTotal is dominated by calculating the odds. If the fastest horse wins, then pChance = 1, and pTotal is dominated by pTheory. When gamblers bet on a horse, converting horse theory to a numerical value is exactly what they do. Poker players have a tougher calculation – not only do they have to know the odds of a card turning up, they also have to assign a probability to the notion that bluffing will work, or that their opponent is bluffing.
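The three betting games can be run through the single-instance equation in a few lines. This is only a sketch: the horse and poker numbers are invented for illustration, and the function name p_total is an assumption of mine rather than anything taken from the spreadsheet.

```python
def p_total(p_theory, p_chance):
    """Single-instance probability: pTotal = pTheory * pChance."""
    for p in (p_theory, p_chance):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return p_theory * p_chance


# Roulette: the wheel is assumed fair (pTheory = 1), so chance dominates.
# On an American wheel, 18 of the 38 pockets are red.
roulette_red = p_total(1.0, 18 / 38)

# Horse racing: the fastest horse is assumed to win (pChance = 1), so the
# bet rests on the subjective judgment that this horse is the fastest.
# The 0.30 is a made-up number.
horse = p_total(0.30, 1.0)

# Poker: both terms matter. Made-up numbers: 0.6 that the read on the
# opponent is right, 0.35 that the needed card turns up.
poker = p_total(0.6, 0.35)

print(f"roulette {roulette_red:.3f}, horse {horse:.3f}, poker {poker:.3f}")
```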

Betting on the Series

But once the bet becomes about the long run, or about public health instead of an individual, then the calculation is quite different. It’s a two dimensional problem where the primary goal is to predict the frequency of a result or different results, and there will also be uncertainty about estimated frequencies. The probability calculation isn’t the same any more. The probability of chance is often a statistical frequency instead. In fact, it may or may not be a theoretical frequency. For example, statistical estimates can range from purely empirical to purely theoretical. An historical record with a large number of observations may justify a frequency estimate with no theoretical uncertainty. On the other hand, a smaller number of observations may serve to support a statistical theory instead, which begets theoretical uncertainty. The frequency calculation is now a function instead of a single number, so the relationship between theoretical probability and the frequency of occurrence is now something far more complicated:

p(Frequency) = pTheory(pChance)

Empirical observations may also be used to disprove a theory. For example, a large number of observations may show a particular die to be unfair. Then again, there may only be enough data to favor one theory over another without being able to conclusively decide that one is indubitably correct. That means you are going to need a probability tree.
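A two-branch probability tree for the die example might look like the sketch below. Everything numerical here is an assumption made for illustration: the "loaded" theory, the even starting weights, and the 14 sixes in 60 rolls. The weights are derived from a binomial likelihood, which is only one way to get them; a panel of experts could just as well assign them directly.

```python
from math import comb


def binomial_likelihood(k, n, p):
    """Probability of k sixes in n rolls if each roll shows a six with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def weigh_theories(theories, starting_weights, k, n):
    """Return the weight on each theory after observing k sixes in n rolls."""
    unnormalized = {
        name: starting_weights[name] * binomial_likelihood(k, n, p)
        for name, p in theories.items()
    }
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}


# Each theory predicts its own long-run frequency of sixes (hypothetical values).
theories = {"fair": 1 / 6, "loaded": 1 / 4}
starting_weights = {"fair": 0.5, "loaded": 0.5}

# Made-up record: 14 sixes in 60 rolls favors "loaded" without ruling out "fair".
tree = weigh_theories(theories, starting_weights, k=14, n=60)

# The tree keeps one frequency per branch instead of averaging them together.
for name, weight in tree.items():
    print(f"{name:>6}: weight {weight:.2f}, frequency of sixes {theories[name]:.3f}")
```

The output is a function in the sense described above: a set of frequencies, each with a theoretical probability attached, rather than a single number.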

Quantifying Theoretical Probabilities

Frequentist probability schemes tend to acknowledge theoretical uncertainty (e.g. as “systematic error”), but then go on to ignore it. On the other hand, Bayesian probability schemes typically treat theoretical and statistical probabilities interchangeably. If you are betting on a single instance, that works reasonably well. Updating a theoretical prior with data can gradually transform the probability into one of chance – the more data there are, the less the theory matters. But it isn’t really very scientific. If they were used to discriminate among alternative theories, the data might be put to better use. That problem is even more critical for the estimation of long run frequencies. Updating the parameter estimates for a model that has been proven to be wrong doesn’t make much sense.

Since it really is more consistent with how scientific knowledge is developed, explicitly assigning probabilities to theories is a better strategy for long-term issues where knowledge may be expected to progress. Since theoretical probabilities are inherently subjective, it is hard to improve upon convening a panel of experts to weigh the scientific evidence. Even if the experts don’t get it quite right, or they aren’t the right sort of experts, the process of assigning probabilities to competing theories creates an occasion for scientific discussion. As long as no one thinks that probabilities assigned to theories are the gospel truth, it’s all good in my book.

As a recent example, Trasande et al (2015) provided an overview of the efforts to characterize the theoretical probabilities for causal theories involving potential health effects of Endocrine Disrupting Chemicals (EDCs):

We now describe the general methods used to attribute disease and disability to EDCs, to weigh the probability of causation based upon the available evidence, and to translate attributable disease burden into costs. During a 2-day workshop in April 2014, five expert panels identified conditions where the evidence is strongest for causation and developed ranges for fractions of disease burden that can be attributed to EDCs.

I have more than a few quibbles with exactly what they did, ranging from how the problems were characterized in the first place (i.e. by presuming independent attributable risks), to the use of implausible dose-response models, the lack of serious consideration of other (i.e. non-EDC) causal factors, and the assumption that the relationship between association and causation is all-or-none. Also, because the probability assignments are subjective, a two-day workshop of experts with similar interests is not really sufficient for a decision involving the economic impacts that are alleged, so I don’t recommend taking these estimates as the last word. Even so, praise for the process is well deserved. As it pertains to the present topic of discussion, however, there is one error in how the theoretical probability was employed after it was arrived at that must not go unnoticed:

Finally, recognizing that attributable cost estimates were accompanied by a probability, we performed a series of Monte Carlo simulations to produce ranges of probable costs across all the exposure-outcome relationships, assuming independence of each probabilistic event. Separate random number generation events were used to assign 1) causation or not causation, and 2) cost given causation, using the base case estimate as well as the range of sensitivity analytic inputs produced by the expert panel. To illustrate with an example, for an exposure-outcome relationship with an 80% probability of causation, random values between 0 and 1 in each simulation led to the first step, which either assigned no costs (random value ≤ 0.2) and costs (random value > 0.2).

If the problem required the combination of both theoretical and statistical probabilities, the use of the probability tree in a Monte-Carlo simulation would be appropriate. However, there is a problem in implementation that arises from the fact that a causal probability is NOT a probability of chance: A theory is either true all the time or false all the time, and the entire cost estimate is dependent on the truth of the theory (so no, you can’t assume independence). So, using a causal probability to calculate the probability of an event is inappropriate. Instead, the logic should go like this: Since all of the causal probabilities are less than 95%, the lower bound cost estimate for all of the endpoints should be zero (see table four). For those endpoints with a causal probability of less than 50%, the central estimate should be zero as well.
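The logic in the preceding paragraph can be sketched as a percentile lookup across the theory dimension. The first rule, that any percentile falling within the "not causal" share of the uncertainty is zero, comes straight from the argument above. How the remaining percentiles are spread across the causal branch is my assumption, as are the function names and the cost range, which is not taken from Trasande et al (2015).

```python
def cost_percentile(p_causal, percentile, cost_if_causal):
    """Cost at a given percentile of the uncertainty about the whole estimate.

    The causal theory is treated as true or false for every case at once, so
    the (1 - p_causal) share of the uncertainty carries a cost of zero.
    cost_if_causal(q) returns the cost at quantile q (0 to 1) given causation.
    """
    if not 0.0 <= p_causal <= 1.0:
        raise ValueError("p_causal must lie between 0 and 1")
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must lie between 0 and 100")
    p_not_causal = 1.0 - p_causal
    q = percentile / 100.0
    if q <= p_not_causal:
        return 0.0
    # Assumption: the rest of the percentile range maps onto the causal branch.
    return cost_if_causal((q - p_not_causal) / p_causal)


def hypothetical_cost(q):
    """Made-up cost range given causation: uniform from 10 to 30 (arbitrary units)."""
    return 10.0 + q * 20.0


# 80% causal probability: the 5th percentile is zero, the median is not.
print(cost_percentile(0.8, 5, hypothetical_cost))
print(cost_percentile(0.8, 50, hypothetical_cost))

# 40% causal probability: the central estimate is zero as well.
print(cost_percentile(0.4, 50, hypothetical_cost))
```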

Reference

Trasande L, Zoeller RT, Hass U, Kortenkamp A, Grandjean P, Myers JP, DiGangi J, Bellanger M, Hauser R, Legler J, Skakkebaek NE, and Heindel JJ (2015). Estimating Burden and Disease Costs of Exposure to Endocrine-Disrupting Chemicals in the European Union. J Clin Endocrinol Metab 100: 1245–1255.

Official Post Soundtrack

Cars, The (1978). All Mixed Up. In: The Cars, Track 9.

Post Notes

Thesis Post #64. If someone can figure out a way to short their bet on all those IQ points, I'm all in.

Wednesday, April 27, 2016

Individual Choice

Public Health Value Judgments

When you can’t have it all, which is pretty much all the time, it is necessary to set priorities. Other people (e.g. friends, relations, employers, and the government) are often there to help you set your priorities – whether you want them to or not. But still, everybody does have to make their own choices on occasion. For example, unless your mom, spouse, or religion chime in, you can choose what fish you eat and how much all by yourself. You probably already know whether you like fish or not, and you also probably know how much it costs. If there are other factors that go into the decision, then you will need to know what they are.

Unfortunately, the trend in public health these days is to give consumers food consumption advice without exactly saying why. There is no good reason for this that I know of, but I am aware of two of the bad ones. First, not getting into the gritty details avoids political controversy stemming from scientific uncertainties. That doesn’t mean the advice is necessarily bad, but then again maybe it is. Second, doling out public advice can be a career all by itself, and career advisers often care more about protecting their jobs than whether or not the advice is sensible. So, at best, food consumption advice is an expression of the social values of the people giving advice - which may or may not correspond to your values. At worst, the advice doesn’t reflect anyone’s valuation at all. As a result, distrusting public health advice is generally a pretty good idea, especially when the advice isn’t accompanied by some intelligible reasons for it, which will also permit you to decide for yourself if those reasons are good enough for you.

Grading on the Curve

While psychology studies are sometimes grounded in physiology with physical measurements (e.g. nerve conduction velocity), most epidemiological studies concerned with neurobehavioral development largely employ batteries of tests that reflect the social science interface of psychology. Since the value of these tests is subjective, they are standardized by determining how subjects “normally” perform. Since variation in performance on tests is normal, the tests are typically given numerical values that reflect how far above or below average a score is relative to how much it normally varies. There are two major problems with this. First, how much a test score varies isn’t necessarily a good indicator of how much performance on the test really matters. Second, defining what a “normal” population is can be rather arbitrary. For example, the Denver Developmental Screening Test was originally standardized in Denver, while the Boston Naming Test was developed in Boston. Yet, what is normal in Denver may be somewhat different from what is normal in Boston. For a book length discussion of the problems with standardized testing, see Gould (1981). Nonetheless, standardized psychological test batteries are more objective and reproducible than a doctor’s or a teacher’s opinion, and they are widely used for that reason.

The most basic normalized scale used for psychological testing is the Z-score, where the difference between the test score and the average is divided by the standard deviation. As a result, a value of -1 signifies a test result that is one standard deviation below the average, while a value of +1 signifies a test result that is one standard deviation above average. Other standardized tests are often scaled with modified Z-scores. In particular, the Intelligence Quotient (IQ) is scaled so that the mean is defined to be 100 and the standard deviation is 15, while Scholastic Aptitude Tests (SAT) have a mean of 500 and a standard deviation of 100. The following table compares how test results are scaled with each method:

	2 SD below	1 SD below	Average	1 SD above	2 SD above
Z-Score	-2	-1	0	1	2
IQ	70	85	100	115	130
SAT	300	400	500	600	700
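The table can be reproduced with a couple of one-line conversions. This is a minimal sketch using the means and standard deviations given above; the function names are mine.

```python
# (mean, standard deviation) for each scale, as given in the text.
SCALES = {"Z-Score": (0.0, 1.0), "IQ": (100.0, 15.0), "SAT": (500.0, 100.0)}


def to_z(score, scale):
    """Convert a score on the named scale to a Z-score."""
    mean, sd = SCALES[scale]
    return (score - mean) / sd


def from_z(z, scale):
    """Convert a Z-score to the named scale."""
    mean, sd = SCALES[scale]
    return mean + z * sd


def convert(score, source, target):
    """Re-express a score from one scale on another."""
    return from_z(to_z(score, source), target)


# Rebuild the rows of the table from 2 SD below average to 2 SD above.
for scale in SCALES:
    row = [from_z(z, scale) for z in (-2, -1, 0, 1, 2)]
    print(scale, row)
```

The same functions convert between scales, so an IQ of 115 corresponds to convert(115, "IQ", "SAT"), or 600 on the SAT scale.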

Individual Neurobehavioral Risks and Benefits Arising From Fish Consumption

So, let’s talk about fish. The point of the preceding discussion is that there is evidence that the consumption of fish during pregnancy may have both bad (from methylmercury) and good (from omega-3 fatty acids or perhaps something else) effects on future neurobehavioral performance of the child – and the effects aren’t exactly the same. Plus, I’m not going to tell you what you should do. I am leaving that to be your problem, and since it really isn’t a simple decision I have no idea what you will decide. However, I will do my best to supply some reliable information.

My main vehicle for information delivery is an Excel-based program, which is a slightly modified version of a risk-benefit assessment model that I developed while I was at the FDA (2014). Although there are some other minor modifications as well, the main difference is that this version of the model is intended to estimate risks for a specific individual. However, if you don’t have Excel, or it is more trouble than it is worth, here are some sample results that give a feel for what the program does:

Consuming a high-mercury fish such as swordfish twice a week during pregnancy will result in a developmental delay of the age at which a toddler learns to walk of about a week. The uncertainty associated with this estimate ranges from 0 to about three weeks.
Consuming one can of albacore tuna and one can of albacore tuna once a week during pregnancy will result in an increase of about 2.5 IQ points in the child. However, there may be a decrement in IQ of about 0.3 points, or the increase may be as much as 3.5 points.
Consuming salmon once a week will result in a projected increase in performance on the Verbal SAT of about 25 points. However, given the many uncertainties associated with the estimate, the increase may be as little as 0 or as much as 35 points.

Software

This Excel macro, Personal_Seafood_Net_Effect_Estimator.xls, integrates four components presented earlier:

Personal Fish Consumption Estimator

Methylmercury Toxicokinetic Conversions

Methylmercury Dose-Response

Fish Benefit Dose-Response

References

Gould, SJ (1981). The Mismeasure of Man. W. W. Norton & Company.

U.S. Food and Drug Administration (2014). Quantitative Assessment of the Net Effects on Fetal Neurodevelopment from Eating Commercial Fish (As Measured by IQ and also by Early Age Verbal Development in Children).

Official Post Soundtrack

Ponty, J-L (1983). Individual Choice. In: Individual Choice, Track 5.

Post Notes

Thesis Post #63, and the fifth of a series of five.

Sunday, April 17, 2016

Dear Journal Editors

Rejected

Thank you very much for reviewing my manuscript entitled “Plausible In, Plausible Out: A Bootstrap Methodology to Characterize the Uncertainty Associated With Dose-Response Modeling”. I am disappointed in the result, of course. However, as it met the same fate as every other paper on model uncertainty that I have sent to the various and sundry editors that the Journal has had over the last 25 years, I am not terribly surprised.

I think I will not attempt to rewrite the paper to make it more acceptable. The paper says what I want it to say and I don’t think any of the major suggestions made in the review will make it any better. A similar example to that discussed in the manuscript is also in the FDA (2016) assessment on arsenic in rice released two weeks ago (see section 9.4), and I suppose my purposes will be better served by working the rest of the text into my other writing projects. Nonetheless, for the benefit of my colleagues who are interested in model uncertainty in general and this paper in particular, I would like to address some of the comments made in the course of the review.

A Methodology Paper

The paper I submitted is a discussion of two methodological developments used in USFDA assessments for arsenic in apple juice (Carrington et al, 2013) and rice (USFDA, 2016). The first method involves the use of a parametric bootstrap simulation to propagate the uncertainty associated with dose estimation into the characterization of the dose-response relationship. Putting error bars on the doses is a novel technique and as just about everyone I know thinks it is a pretty good idea, this was the impetus for writing the paper in the first place. Although the reviewers didn’t seem to think this technique was remarkable, at least they didn’t object. I guess it really is a pretty obvious thing to do once you’ve thought of it. The second reviewer did suggest that additional details be provided about the input distributions for the dose estimates. I decided not to do that when I wrote the paper because I thought getting into specifics would be a distraction from the more general idea of allowing dosimetric uncertainties to be represented in a dose-response analysis. If I get around to reworking the analysis for some other purpose, I will heed those suggestions to the extent that I can; many of the issues raised by the second reviewer resulted from the necessity of working with published summary data rather than observations from individual subjects. However, for a methodological presentation where no importance is attached to the actual results at all, I don't think any of that matters.
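For readers who have not seen the idea before, here is a minimal sketch of putting error bars on the doses with a parametric bootstrap. It is not the method used in the FDA assessments: the summary data are invented, the doses are drawn from a normal distribution truncated just above zero, the dose-response model is a straight line fit by least squares, and only the dose uncertainty is propagated.

```python
import random
import statistics

# Hypothetical summary data: (mean dose, standard error of dose, observed response).
GROUPS = [(1.0, 0.2, 0.02), (5.0, 1.0, 0.05), (10.0, 2.5, 0.11), (20.0, 6.0, 0.19)]


def fit_slope(doses, responses):
    """Ordinary least-squares slope of response on dose."""
    mean_d = statistics.fmean(doses)
    mean_r = statistics.fmean(responses)
    sxx = sum((d - mean_d) ** 2 for d in doses)
    sxy = sum((d - mean_d) * (r - mean_r) for d, r in zip(doses, responses))
    return sxy / sxx


def bootstrap_slopes(groups, iterations=2000, seed=1):
    """Refit the model after redrawing every group's dose on each iteration."""
    rng = random.Random(seed)
    responses = [response for _, _, response in groups]
    slopes = []
    for _ in range(iterations):
        # Draw one plausible dose per group, kept above zero.
        doses = [max(rng.gauss(mean, se), 1e-6) for mean, se, _ in groups]
        slopes.append(fit_slope(doses, responses))
    return slopes


slopes = sorted(bootstrap_slopes(GROUPS))
lower = slopes[int(0.05 * len(slopes))]
median = slopes[len(slopes) // 2]
upper = slopes[int(0.95 * len(slopes))]
print(f"slope: {median:.4f} (5th to 95th percentile: {lower:.4f} to {upper:.4f})")
```

The spread of the fitted slopes reflects how uncertain the doses are. Alternative dose-response models could be fit inside the same loop.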

Against the wishes of potential FDA coauthors, I also chose to include a discussion of model uncertainty. Since this was and is the hot button political topic for any risk assessment involving arsenic and any other chemical hazard worthy of attention, I thought it would be a serious omission not to include it, especially since there may often be an interaction between dosimetry and empirical weighting of alternative models. As near as I can tell, the reviewers haven’t raised any serious objections to anything I said about model uncertainty, but it is very clear that they really don’t like the way I said it. This may be partly due to the fact that the reviewers didn't take the discussion in the methodological context that was intended, but since this seems to be the reaction I always get when I try to talk about model uncertainty, I think there is more to it than that. I think I understand the nature of this editorial issue far better than I used to, so I will take this opportunity to explain.

Unfinished Science

Model uncertainty is subjective. Some people have it and others don’t. I think the two basic causative factors are as follows:

The model has to be thought of as a theory. Even if it is only approximately correct, the mathematical model has to convey some truth that is not evident from isolated observations.
There has to be more than one model-theory. If only one model is under consideration, then there is no uncertainty.

Taken together, these two criteria basically mean that if you are afflicted with model uncertainty then you are thinking like a scientist. But here’s the thing; if you have it then you can’t really talk or write about model uncertainty in the objective third person writing style generally preferred by governments and journal editors. You can describe a model uncertainty as a psychological phenomenon as I just have, but that is pretty much the end of the third person road. If you think it is just me who can’t write properly, go back and read the most widely cited paper on model uncertainty ever written (Hill, 1965): It is written almost entirely in the first or second person. For example, the problem the paper sets out to solve is stated as follows:

Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?

Part of the problem is that scientists don’t write their papers in the same manner as they converse in private. When papers are written, it is often because model uncertainties have been resolved and what were once just theories are reported to be objective realities. But risk analysts can’t do that. There are many model uncertainties that have not been resolved and they may never be, which leaves us with probability trees and subjective weight-of-the-evidence evaluations.

My Problem

The apple juice and rice risk assessments both used probability trees to depoliticize arguments over which dose-response model “should” be used to characterize the causal relationship between inorganic arsenic and cancer. I am happy to report that this strategy worked as I hoped that it would. The two assessments employed somewhat different strategies for assigning probabilities to alternative models. Because I think it is more consistent with how scientists actually think, I prefer the strategy used in the rice assessment that largely relies on expert opinion.

My only reservation is that the probabilities used for the rice assessment relied only on my opinion. As an expert, assigning probabilities to theories was implicitly part of my job description at the FDA, and I’m not complaining about having to do what I was paid to do. I have a PhD in Pharmacology and long experience in modeling dose-response relationships, so I don’t feel unqualified. However, my primary area of expertise is neurotoxicology rather than cancer biology, and I am far more familiar with the literature on lead and methylmercury than arsenic. So, it would have been nice to have other experts involved, especially if the stakes are raised from just setting guidance values for apple juice and infant cereal that have relatively little economic impact to suggesting that consumers modify their rice intake. But in order to move from the subjective “I” to the intersubjective “We”, I think that those of us who are afflicted with model uncertainty need to be permitted to write in the same way as we converse among ourselves, and we can’t do that if we are forced to pretend to objectivity that we really don’t have.

References

Carrington CD (unpublished). Plausible In, Plausible Out: A Bootstrap Methodology to Characterize the Uncertainty Associated With Dose-Response Modeling.

Carrington CD, Murray C, and Tao, S. (2013). A Quantitative Assessment of Inorganic Arsenic in Apple Juice.

Hill, Sir Arthur Bradford (1965). The Environment and Disease: Association or Causation? Proc Royal Soc Med 58:295-300.

U.S. Food and Drug Administration (2016). Arsenic in Rice and Rice Products Risk Assessment.

Official Post Soundtrack

Green Day (1997). Reject. In: Nimrod, Track 14.

Post Notes

Thesis Post #64. Even though I will probably group it in the arsenic series, this post is really about academic politics.