Saturday, December 19, 2015

Why Should I?

Giving Advice

Governments, parents, and other authorities often give the subjects under their care advice about what to do, ostensibly for the betterment of their own welfare and that of others around them. Two examples from the realm of traffic safety will serve to illustrate this phenomenon. First, pedestrians are often told to look both ways before crossing the street. Second, speed limits are usually in place on any given road that restrict the speed at which vehicles may be driven. The obvious difference between these two examples is that while compliance with the directive to pedestrians is voluntary (at least as far as the government is concerned), compliance with the speed limit is compulsory. If you go above the speed limit, then the government is legally entitled to collect money from you, and perhaps take your license away. There are no such consequences for pedestrians who only look one way instead of two.

Similar instances of government advice are to be found in food safety. On the compulsory side of things, the government may legally limit which chemicals are added to food, and how much can be added when they are approved for use at all. Food manufacturers who fail to comply with that government advice can have their products found to be “adulterated” and seized. On the other hand, the government has far less authority to regulate chemicals that aren’t deliberately added. The practical reason for that is that it often isn’t possible to separate the chemical from the food. The main topic of this discussion is a case in point: all fish contain varying amounts of methylmercury, which is known to be neurotoxic. But, fish can also be an important source of many nutrients. Consequently, many state public health departments and the federal government have issued advisories that direct expectant mothers to restrict how much and which fish they consume. Compliance on the part of consumers is entirely voluntary.

Considering the Science

Generally speaking, authorities don’t tell people what to do without a reason. For example, transportation departments keep statistics on accident rates, so they have a pretty good idea of what the frequency and severity of accidents will be with different speed limits. They also set higher speed limits on roads used for commuting and intercity travel, and lower limits on streets that are likely to have pedestrians. That doesn’t mean, of course, that everyone agrees with the advice. People may disagree with the facts, or with the decision given the facts. But, if the advice is legally enforceable, it doesn’t really matter whether you agree or not; the penalty for failing to comply is the same either way. On the other hand, if the advice is just a suggestion or guideline then you are free to disagree as you like. For example, if you think looking in just one direction on a one-way street is sufficient, then you may do so.

On the topic at hand, a group of 30 senators recently addressed a letter to the Food and Drug Administration concerning the advice to be given to pregnant women regarding fish consumption. The concluding paragraph is as follows:

One of the FDA’s core responsibilities is ensuring that consumers have access to accurate, actionable information about the agency’s scientific findings. Prior to issuing final advice, we strongly encourage you to consider the science underpinning the advice, and also the manner in which information is relayed to the consumer. While we are eager for the advice to be finalized, it is critical that the final advice reflect the latest science and be presented to consumers clearly so they can make the best possible decisions about the nutritional value of seafood during pregnancy.

It seems that the senators are giving the FDA advice about what advice to give. This advice may not be compulsory, but it is “strongly encouraged”. So, what is it the FDA is being advised to do? The most prominent directive is that this group of senators expects the agency to put consumers in a position to make their own decision. That will obviously require that, in addition to (or perhaps instead of) telling pregnant women how much and which fish to eat, the agency inform consumers of what is likely to happen if they eat less or more than the suggested amount. This is especially important since the advice is voluntary; some women may prefer to eat less fish while others may prefer to eat more.

The other directive is that the FDA should “consider the science”. This may sound trite, but it really isn’t. For one thing, it means that in addition to giving the reasons underlying the advice, the agency can’t just make those reasons up. It also means that the agency needs to take a public position on what the risks and benefits of eating fish are, and be willing to defend those assertions before the scientific community. For example, the risk-benefit analysis mentioned in the letter qualifies in that regard (disclaimer: I was a primary author of that report). If the agency is currently unwilling to stake its reputation on that report (disclaimer: I don’t work there anymore) then it should be able to present the alternative that it is currently willing to defend.

Because We Say So

Regardless of what scientific position the agency takes, it is clear that there are also value judgments that go into deciding how much fish to eat. For example, the risk-benefit analysis on the FDA web site describes neurobehavioral risks and benefits associated with the consumption of various species of fish, and estimates an optimal amount of fish consumption for each. But, translating those results into how much fish should be consumed isn’t exactly straightforward. There are judgments to be made about how important it is to be optimal. Does 1 IQ point really matter? How about one tenth of an IQ point? How will the many uncertainties associated with estimated effects that are largely too small to measure accurately be resolved?

The agency can, of course, itself place a value on small uncertain changes in IQ and other measures of behavioral performance. In fact, making value judgments on behalf of the public is what regulatory agencies are generally in the habit of doing. Furthermore, that is exactly what many consumers want. But, for fish consumption advice, that tactic almost certainly isn’t going to suit everyone, especially those people who are otherwise disposed to eat more or less fish than the prescribed amount. In particular, wrapping all the science and value judgments into a single arbitrary number (e.g. the EPA Reference Dose) that defines “safety” without conveying any information about what the anticipated health consequences are won’t put consumers, or anyone else for that matter, in a position to make their own decision.

References

Seafood Advice Letter to the FDA, Dec. 17, 2015

USFDA and USEPA (2014). Fish: What Pregnant Women and Parents Should Know. Draft Updated Advice by FDA and EPA / June 2014.

USFDA (2014). Quantitative Assessment of the Net Effects on Fetal Neurodevelopment from Eating Commercial Fish (As Measured by IQ and also by Early Age Verbal Development in Children).

Official Post Soundtrack

Smiths, The (1984). What Difference Does It Make? In: The Smiths, Track 8

Post Notes

Thesis Post #52. Part of the fish advice saga, and the first that is based on current events. Unlike the more historical posts on the same topic, this one is basically an editorial.

Saturday, November 28, 2015

Plausible In Plausible Out

Computers are very logical and very obedient. They will calculate the conclusion for whatever premises and assumptions they are given. The one shortcoming, of course, is that the conclusions are never any more reliable than the premises. Although using a computer to provide risk estimates often does hide shaky premises inside the proverbial black box, that doesn’t have to happen. But, opening up the black box requires some effort on the part of both the programmers and the users. In particular, there are three main things that need to happen:

Defining the Right Answer

A risk assessment answers questions, and a good risk assessment will answer those questions well. But, what does “well” mean here?

True. The obvious response to that question is the “true” answer. But, if the scientifically true answer is, in fact, not known then that definition of “good” just isn’t practical. A different answer is therefore often substituted: Garbage In, Garbage Out.
Politically Correct. If the true answer is not to be had, then risk assessment can be viewed as a set of standardized procedures and/or expectations. Even though the answer isn’t really true, it is relatively simple to obtain, and that may on some occasions be sufficient to justify the use of default assumptions and the like. Furthermore, as long as “politically correct” is not conflated with “true” then there is little or no harm done. But, it seems to be inevitable that just that very thing happens: Acting upon a particular assumption leads people to believe it. Gospel In, Gospel Out. Reality then apparently becomes a product of political negotiation. But people aren’t entitled to their own facts, not even in Washington.
Honest. Which leads to a third definition of “good”, which is “honest”. While that sounds trite, it really isn’t. “Honest” in this case doesn’t necessarily entail admitting to a moral failure. Instead, it involves fessing up to uncertainty: An estimate with a narrower confidence interval isn’t really better unless that confidence is justified. In fact, if moral judgment is an issue at all, it involves giving up the PC objective, where “good” really means “good enough”. Perhaps the main difficulty with the honest uncertain answer is technical. It is easy to say “I don’t know”, but that doesn’t answer the question and it may not be entirely honest either when something, but not everything, is known about the problem. Therefore, the goal becomes to provide all plausible answers, rather than the single true answer. Plausible In, Plausible Out can get complicated.

The Other Probability

In the realm of words, different species of probability have been recognized and discussed for centuries. Hume (1739) differentiated the “probability of chance” from the “probability of causes”. Even though Cohen (1977) called the other probability “nonpascalian” since Blaise Pascal is often credited with developing the mathematics associated with chance and statistics, Hacking (1975) credited Pascal with devising the probability tree, using it to give the theoretical probability for the existence of a Catholic god the same “epistemic standing” as an aleatory probability. Yet, when risks become matters of degree, the other probability often disappears, and probability seemingly becomes synonymous with statistics. The probability that inevitably gets left out when this happens is the theoretical sort.

However, there are many analysts who do find it useful to distinguish concepts of probability. For example, Kaplan (1997) gave a keynote address at a Society for Risk Analysis meeting that outlined three concepts of probability. One can be uncertain about variability, and one can entertain different theories that may account for a statistical reality. Furthermore, the importance of model uncertainty in making many risk estimates is widely recognized.

The simpler part of solving the model uncertainty problem is to employ a different formal representation, namely the probability tree, for a theoretical probability than the continuous probability distribution that is typically used for statistical uncertainties. But, the other thing that needs to happen is that the responsibility map for the division of professional labor needs to be redrawn. When uncertainties arise, it is generally thought that the solution is to send in a statistician. In fact, many statisticians do indeed themselves consider the Other Probability to be their responsibility, and as a result, have devised various and sundry Bayesian schemes intended to stuff the recalcitrant prior probabilities into an aleatory mold. But, that’s not really what is needed. Probability trees are the domain of multi-handed scientists (David, 1975), so if you want to assign a probability to a theory, ask them what to do. Even if they can’t identify the correct theory, they should be able to say which are more likely and why: Maybe that’s a good way to find out who the better scientists really are.
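
As a rough illustration of the two representations working together, here is a minimal Python sketch of a probability tree whose branches are theories, with an ordinary statistical distribution inside each branch. The theory names, branch probabilities, and risk figures are all hypothetical placeholders chosen for illustration; none of them come from an actual assessment.

```python
import random

# Hypothetical probability tree. Each branch is a theory with a
# judgment-based probability (the "other" probability), plus a mean and
# standard deviation describing statistical uncertainty within that theory.
TREE = [
    # (theory name, branch probability, mean risk, standard deviation)
    ("linear no-threshold", 0.5, 1e-4, 2e-5),
    ("sublinear",           0.3, 1e-6, 5e-7),
    ("threshold",           0.2, 0.0,  0.0),
]

def branch_probabilities_ok(tree, tol=1e-9):
    """The branch probabilities of a tree must sum to one."""
    return abs(sum(p for _, p, _, _ in tree) - 1.0) < tol

def sample_risk(tree, rng):
    """Pick a theory by its branch probability, then sample within it."""
    weights = [p for _, p, _, _ in tree]
    name, _, mean, sd = rng.choices(tree, weights=weights, k=1)[0]
    # Statistical uncertainty within the chosen theory, truncated at zero.
    return name, max(0.0, rng.gauss(mean, sd))

def simulate(tree, n=10000, seed=0):
    """Return how often each theory was drawn and the sorted risk samples."""
    rng = random.Random(seed)
    counts = {name: 0 for name, _, _, _ in tree}
    risks = []
    for _ in range(n):
        name, risk = sample_risk(tree, rng)
        counts[name] += 1
        risks.append(risk)
    return counts, sorted(risks)
```

The point of keeping the two layers separate is that the branch probabilities can be argued over by scientists, while the within-branch distributions can be left to the statisticians.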

Examining Assumptions

When assumptions are justified by tradition or regulatory policy, then the process of providing estimates does not provide an occasion for questioning the validity of the premises. Not very scientific: Policy In, Policy Out. Maybe the PC answer is within the realm of plausible interpretation, but who would know? On the other hand, if an estimate is purported to be valid, then the premises are bound to be questioned, especially when there is health or money at stake. That will call for a supporting argument.

If one premise can be proven to the exclusion of all others, then Truth In Truth Out. In some cases, a sensitivity analysis may demonstrate that the alternative assumptions yield about the same answer, thereby making the issue moot. But, with low-dose risk estimation, that doesn’t happen; the choice of model used to make the risk estimate matters a lot (NRC, 1994). So, you have a probability tree. Yet, which alternative theories make up the tree still matters, and scientific arguments still need to be made for either including them or not. An uncertainty analysis may make it easier to come up with a risk assessment that is scientifically credible, but that can only happen if the set of alternative assumptions can be defended as being at least plausible, and the full set encompasses the entire range of assumptions that are distinctly possible: Probable In Probable Out.
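
To make the sensitivity point concrete, the following Python sketch compares three hypothetical dose-response models that are calibrated to agree at a high "observed" dose and then extrapolated downward. The functional forms, the observed dose and risk, and the threshold cutoff are all assumptions made up for illustration.

```python
# Hypothetical calibration point: every model below is forced to agree here.
OBSERVED_DOSE = 100.0
OBSERVED_RISK = 0.10

def linear(dose):
    """Risk proportional to dose."""
    return OBSERVED_RISK * dose / OBSERVED_DOSE

def quadratic(dose):
    """Risk proportional to the square of dose."""
    return OBSERVED_RISK * (dose / OBSERVED_DOSE) ** 2

def threshold(dose, cutoff=10.0):
    """No risk at or below the (assumed) cutoff, linear above it."""
    if dose <= cutoff:
        return 0.0
    return OBSERVED_RISK * (dose - cutoff) / (OBSERVED_DOSE - cutoff)

MODELS = {"linear": linear, "quadratic": quadratic, "threshold": threshold}

def sensitivity(dose):
    """Risk estimate from each alternative model at the given dose."""
    return {name: model(dose) for name, model in MODELS.items()}
```

Under these assumptions, calling `sensitivity(100.0)` gives essentially the same answer for every model, while `sensitivity(1.0)` gives estimates that differ by orders of magnitude, which is why the issue cannot be declared moot at low doses.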

Bayesian solutions to model uncertainty often allow probabilistic judgments regarding theories. However, the general inclination is to relegate such judgments to “priors” that are subsequently modified based on the available data. There are a couple of problems with that. First, judging theories does not happen in a data vacuum; data has a lot to do with it. Second, and perhaps more importantly, new data may lead to revision of the scientific judgments of alternative theories, or maybe even result in the introduction of a new theory entirely. If that happens, Bayes’ theorem will have nothing to do with it.
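
A small Python sketch may help show why. Bayes' theorem reweights the theories that are already on the list; it has no mechanism for adding one. The function and the theory labels below are hypothetical and only illustrate the arithmetic.

```python
def bayes_update(priors, likelihoods):
    """Posterior probability of each listed theory given new data.

    priors      -- dict mapping theory name to prior probability
    likelihoods -- dict mapping theory name to the probability of the
                   new data under that theory
    """
    unnormalized = {t: priors[t] * likelihoods[t] for t in priors}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the data are impossible under every listed theory")
    return {t: u / total for t, u in unnormalized.items()}
```

The posterior has exactly the same keys as the prior, and a theory given zero prior probability stays at zero no matter what the data say. Revising the list of theories itself is a scientific judgment made outside the formula.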

References

Cohen, L.J. (1977). The Probable and the Provable. Oxford: Clarendon Press.

David, E.E. (1975). One-Armed Scientists? Science 176: 679.

Hacking, I (1975). The Great Decision. In: The Emergence of Probability. Cambridge University Press.

Hume, D. (1739). A Treatise of Human Nature, EC Mossner (ed.), London: Penguin Books.

Kaplan, S (1997). The Words of Risk Analysis. Risk Anal 17:407-411.

Official Post Soundtrack

Clan of Xymox (1989). Imagination. In: Twist of Shadows, Track 8.

Post Notes

Thesis Post #51, Part of the solutions thread.

Thursday, November 26, 2015

Dietary Supplements

No Approval Necessary

From a legal standpoint, there is a third class of food chemicals in the United States that are neither food additives nor contaminants. Like food additives, these chemicals have an intentional use; when consumers buy them, they expect or hope that the products will have some desirable consequence. Yet, unlike food additives, they do not require FDA approval before they can be sold. This is pretty much true by definition: If a manufacturer wants to get FDA approval, they can. But if they decide that is too difficult or not possible, they can call it a Dietary Supplement and sell it anyway.

There are some restrictions. If manufacturers are going to sell a new supplement, they are supposed to notify the FDA and send in whatever information they have about the supplement. The FDA may decide to object to the sale, or not. In particular, dietary supplements are subject to the same legal standard as unintentional contaminants, so if the product is hazardous enough to meet the “may render injurious” standard, then the FDA can prevent sale of the product.

Yet outside the commonality of “no FDA Approval”, the range of products that are sold as dietary supplements is tremendous, as are the reasons for NOT seeking FDA approval. Some may be sold as supplements because the process for getting approval is too expensive for small companies, which is probably the most legitimate reason. Some may be sold as supplements because they couldn’t meet the rather strict safety standards required for food additives. The supplement route can also be used to avoid the approval process required for drugs. And finally, products that really have no use at all can rather easily be sold as dietary supplements.

Nutritional Supplements

If a product is sold as a dietary supplement, nutrition is what most people have in mind. There are nutrients like calcium, iron, and vitamins that some people don’t get enough of in their diet, and therefore they need supplements to meet their nutritional requirements. So, supplement manufacturers put nutrients in pills, and sell them to consumers who may need them. Furthermore, the amount of a particular nutrient in a supplement generally follows recommendations made by the Federal government. Since nutrients can also be toxic, dietary guidelines consider both how much of the nutrient is necessary and how much may be excessive (Institute of Medicine, 2013). That all sounds good, and it is.

However, that doesn’t mean supplement purveyors won’t encourage consumers to buy supplements they don’t really need. Free enterprise, yay. For example, you can buy a manganese supplement. Yes, manganese is an essential nutrient – that has been demonstrated in animal studies with controlled artificial diets. However, there is no evidence that anyone has ever suffered from manganese deficiency. On the other hand, there is evidence from human studies that indicates that manganese is neurotoxic, with symptoms that closely resemble Parkinson’s disease.

Drugs

Many of the substances sold as dietary supplements are really drugs. While nutrients are essential to normal structure and function, drugs are not. Before the Dietary Supplement Health and Education Act of 1994, selling a drug as a supplement was illegal, but now it is not. Like the drugs that are regulated by the FDA as drugs, the drugs sold as dietary supplements have a wide range of effects. Many of them can be used to treat various diseases. There are also many psychoactive substances sold as supplements that alter mood or behavioral performance. The best known example is caffeine, which is also used as a food additive.

Perhaps the best example of the drug subcategory of dietary supplements is ephedra, a plant that contains ephedrine. Ephedrine is a sympathomimetic compound that has many of the same pharmacological properties as adrenaline and other substances that occur naturally in the human body. It is also very similar to many of the drugs (e.g. pseudoephedrine or Sudafed) found in over-the-counter cold medicines that can be obtained in the U.S. and elsewhere without a prescription. In China, the herb is known as Ma Huang, which like many other products sold as supplements, has a long history of use in Chinese medicine. It can indeed be a useful treatment for colds. Ephedra has also been marketed as a stimulant (i.e. in “energy” supplements) and for weight loss. It works for that too. But, like many other drugs, it can have side effects, and an overdose can be lethal. In fact, many people have died from ephedra overdoses. As a result, and because the FDA can regulate supplements when there is demonstrable harm, ephedra has been banned. However, you can still grow ephedra yourself, or order synthetic ephedrine from Canada and have it delivered to the United States.

The drugs sold as dietary supplements may or may not be better than drugs approved by the FDA as drugs. Leaving that issue aside, there is another problem: Products that are sold as natural remedies in other countries are sold as dietary supplements in the United States, which one might expect to lead some people to think of those products as foods, rather than medicine. Since people don’t expect to OD on food, they are more likely to do so with a dietary supplement like ephedra. That’s a problem.

Placebos

The third category of dietary supplements is comprised of substances that pretty much don’t do a damn thing. You can put dirt in a capsule and sell it as a dietary supplement, and guess what, people do. Giving people the hope of a cure can, in fact, be a cure; so maybe they are worth something. A good exemplar here is Laetrile, otherwise known as amygdalin, which is found in bitter almonds, apricot pits, and cassava. It is called a cyanogenic glycoside because it releases cyanide when ingested. Even though there is no scientific evidence to support claims that laetrile or amygdalin can treat cancer or any other illness, it has been marketed and used as an anticancer agent. It was especially prominent in the US during the 1970’s, when it came to be called “Vitamin B17” even though it isn’t a vitamin at all. Although the evidence of harm at recommended levels of use isn’t especially strong, the cyanogenic properties of amygdalin led to an FDA ban in 1971.

The biggest downside of placebos is that they may be used instead of a treatment that would work. Nonetheless, the main thing the FDA does with regard to placebos is to regulate health claims; a supplement manufacturer cannot claim on the label or packaging that a supplement will produce a desirable health effect without getting FDA approval first. However, supplement manufacturers can put whatever claim they want on the internet, and they do. For example, they will tell you that Laetrile works: Google it for yourself.

Reference

Institute of Medicine (2013). Dietary Reference Intakes Tables and Application. National Academy of Science, Washington DC.

Official Post Soundtrack

Pere Ubu (1977). Final Solution. In: Terminal Tower, Track 3.

Post Note

Thesis Post #50. Part of the Regulatory Toxicology series.

Thursday, September 17, 2015

Monologues and Dialogues

Iteration

To iterate or not to iterate, that is the question. Whether ’tis nobler to damn the torpedoes or to consider other options and opinions. Like personal decisions, regulatory policy decisions are always plagued by this dilemma to some degree: The risk assessment is never really done until the decision has been made. Formal decision processes that are molded by institutionalized strictures are, of course, usually not flexible enough to go either way. Therefore, they advocate either staying or going. Since the objective is usually to make policy rather than not, usually it’s the latter. For example, consider the 1983 NRC risk assessment paradigm. It’s a linear process that starts with research, which progresses to risk assessment, which then gives way to risk management:

That’s the monologue version of public decision making. The scientists give the risk assessors the data, the risk assessors analyze the data and give the risk managers the risk estimates, and the risk managers decide what to do. That characterization sounds so good, it gets repeated quite often. But, for better or for worse, that’s often nothing like what really happens. For example, a different NRC committee depicted the decision process like this:

Yeah, it’s a mess – but it really is a better characterization of what actually happens sometimes. In any case, the main feature of the 1996 version that is worth noting is that there are “feedback loops” that make the process somewhat circular instead of linear. Or in other words, instead of a monologue, the risk assessment process is comprised of one or more dialogues. In order to simplify the formal representation a little bit, here is a more basic adaptation of the 1983 paradigm with dialogue included:

With this simpler representation, a couple of basic points can be made about risk assessment when it is viewed as an iterative dialogue rather than a linear monologue:

The Risk Assessment process does NOT begin with scientific research. In fact, since every formal risk assessment is preceded by a subjective one, it may not be possible to assign a starting point. At best, formal risk assessments serve to refine decisions rather than to create them from nothing. At worst, they sell bad decisions.
The actual risk assessment part of the process sits at the juncture of two very different sorts of dialogues. The policy dialogue is concerned with what actions are to be taken as a matter of public policy. The scientific dialogue is concerned with the development and establishment of scientific knowledge. At best, risk assessment serves as a science-policy interface. At worst, risk assessment is where scientific illiteracy meets political naiveté and everything comes to a grinding halt.

Scientific Role Playing

While there certainly are some minor differences between personal and public decision making, they are both comprised of something like the policy loop and the science loop. For example, a philosopher would call the science loop “epistemology” and the policy loop “ethics”. Yet the 1983 monologue version of the decision paradigm presupposes that Risk Assessment starts with unbiased ethics-free scientific research. But, unless it is totally useless, science is never conducted without a purpose. Well, that’s what taxpayers are told anyway.

More to the point, every research program has an ethical underpinning of some sort. Those value judgments may come from the scientists themselves or they may come from whoever is paying for the research. The monologue ignores all that. If research is designed to determine an exposure with an acceptable risk, then the ethical issues must be resolved beforehand. Who does that? Someone who wears a lab coat probably had a lot to do with it. Scientific research often comes with a political viewpoint attached.

While scientists are not devoid of ethical judgment, with some effort science can be extracted from the scientist – it just takes a little cross-examination, scientific dialogue, or peer review. In fact, that is what good scientists do; they argue and they criticize one another incessantly. A good scientific dialogue will expose personal bias. But, as a formal representation of the decision process, the monologue doesn’t recognize that. Therefore, the monologue engenders and empowers advocates in lab coats who are ready to play the science card at the drop of a hat. They may call for peer review, but only when a majority of the peers have the right moral principles. But, that’s not good science. It’s probably not good policy either.

The Political Narrative

The fact of the matter is that scientists don’t really need formal decision paradigms, but politicians do. For better or for worse, one of the tasks associated with running a government is convincing the citizens that the government is doing what it should be doing. As a result, decision paradigms often function as political rhetoric. The rhetoric is often not the least bit subtle. For example, appendix N of the 1994 NAS treatment of risk assessment pits a monologue version of risk assessment that enforces the use of conservative assumptions vs a dialogue version that embraces the acknowledgement of scientific uncertainties. If that isn’t politics, what is?

The monologue version of risk assessment is essentially autocratic. It embraces the creation and use of a process that can be directed and controlled. As a corollary, it tends to create an Us-vs-Them mentality: Either you support the process or you don’t. For example, if the government is going to ensure the safety of the food supply, then the government needs to be in control of the risk assessment process, right? Perhaps not.

The dialogue version of risk assessment is the democratic ideal: Free speech, public participation, two heads are better than one, yada yada. But, as with the 1996 figure above, it’s a mess; hardly a process at all. Then there is the iteration problem: Once you start talking, when do you stop? In its purest form, the dialogue basically always results in paralysis by analysis. As the 1996 NAS report noted, the big trick with dialogues is coming to closure. Deadlines can help a lot in that regard.

References

National Research Council (1983). Risk Assessment in the Federal Government: Managing the Process. National Academy Press, Washington, DC.

National Research Council (1994). Science and Judgment in Risk Assessment. National Academy Press, Washington, DC.

National Research Council (1996). Understanding Risk: Informing Decisions in a Democratic Society. National Academy Press, Washington, DC.

Official Post Soundtrack

Depeche Mode (1990). Policy of Truth. In: Violator, Track 7.

Post Notes

Thesis Post #49. A sequel to "The Risk Assessment Paradigm", but also related to the last post on objectivity.

Friday, August 21, 2015

Risky Objectivity

A Dictionary of Objectivity

To be objective, one must not be subjective – that much is clear. What is objective is visible while what is subjective is hidden. But, what is hidden? It’s often hard to say, really. After all, it’s subjective. However, with the aid of a little psychoanalysis, at least two major notions of “not subjective” may be distinguished:

Empirical Objectivity. The essence of empirical objectivity is the recorded observation. Ostensibly, a scientific datum conveys the state of the world, or at least a small part of it, to any and all interested parties: It’s what is out there where anybody can see it, if they care to look. There are more than a few caveats though. Data can be falsified, misrepresented or misunderstood. For example, in computer engineering data is anything that can be stored in a computer’s memory - whether it is true, false, or meaningless. Since scientific data are often kept in computers, it is easy to conflate one with the other.

Formal Objectivity. Unlike empirical objectivity, the subject of formal objectivity is an action or a decision process. It is therefore more of a legal concept than a scientific one; ‘transparency’ is the common synonym for formal objectivity. However, because the ostensible goal of formal objectivity is to limit subjective judgment, formal objectivity shares with empirical objectivity the goal of attaining impersonality. If an entirely mechanistic process is used to determine what actions are to be taken, then personal judgment can be banished altogether. The regimented application of safety factors to determine a level of exposure that will be considered safe is one example of formal objectivity. That is not necessarily a good idea, but it is possible. Still, allowing the opportunity for criticism can be counted as a benefit of formal objectivity.
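
The safety factor example is mechanical enough to write down as a few lines of Python. The sketch below assumes the conventional form, in which a no-observed-adverse-effect level (NOAEL) is divided by a product of factors; the function name, the factor labels, and the numbers used are hypothetical.

```python
def safe_exposure(noael, safety_factors):
    """Divide a NOAEL by the product of the supplied safety factors.

    noael          -- no-observed-adverse-effect level, e.g. in mg/kg/day
    safety_factors -- dict mapping a label to a factor of at least 1
    """
    if noael <= 0:
        raise ValueError("NOAEL must be positive")
    divisor = 1.0
    for label, factor in safety_factors.items():
        if factor < 1:
            raise ValueError(f"safety factor {label!r} must be at least 1")
        divisor *= factor
    return noael / divisor

# Illustrative use: two factors of 10 applied to a made-up NOAEL.
example = safe_exposure(5.0, {"animal to human": 10, "human variability": 10})
```

Anyone can see exactly where the resulting number came from, which is the formal objectivity. Whether a factor of 10 is the right one is a separate question that the procedure does not answer.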

Mathematical Models and the Problem of Causation

A mathematical or conceptual model that is used to describe what has been observed and to make estimates to guide future endeavors can never be empirical in quite the same sense that data is. But, it can come close. If a mathematical model closely mimics a large set of observations, then calling it an empirical model is an apt description. Furthermore, a mathematical model is always objective in a formal sense: If an estimate is produced by a mathematical model, then you can see where it came from. Therefore, a model can be objective in both an empirical and formal sense. However, an objective model isn’t necessarily entirely correct. A flat map can be very useful, but the world is round.

If one accepts Hume’s account of causation (Hume, 1739), then models that represent causal (e.g. dose-response) relationships can never really be considered to be empirical. Yes, there may be an empirical association, but making a model of an empirically objective association objective in a formal sense may be worthy of derision. In fact, in epidemiology and evidence-based medicine, a causal model that is justified solely by “observational” data is considered to be weaker than one that is justified by experimental trials (Santos Silva, 1999) or that also includes some theoretical justification such as the Hill criteria (Hill, 1966).

A causal model can still be objective in a formal sense. In fact, the quest for objectivity may lead to formalizing the processes for the creation of mathematical models for data. For example, for the creation of a cancer risk assessment model, using a maximum-likelihood estimate and a linear model is often considered de rigueur. But the desire for formal objectivity may run counter to a need for theoretical justification where a formal process is difficult to impossible to come by.

The Intersection of Objectivity and Probability

So, we have two different notions of objectivity and three to five different notions of probability. Is probability objective or subjective? It depends; that is the short answer. As a longer answer, each notion of probability must be taken one by one:

Frequency. Although variability might not be truly considered to be probability per se, stone cold statistical facts can be objective in an empirical sense. For example, a tabular representation of historical population data may be considered to be empirical. However, frequencies can also be theoretical and represented by a mathematical model of a statistical distribution, which can lead to an invocation of formal objectivity. In particular, there must generally be a conceptual model that posits that the conditions of the estimate belong to the same class of events as the historical frequencies that the model is based on.
Abstract Probability. This one is easy. Abstract probability is never objective in an empirical sense, but it is always objective in a formal sense.
The Probability of Chance. Since this form of probability arises when an infinite theoretical frequency is used to estimate the chance of a single instance or finite series, it can be just about as objective as the frequency theory that is used to make it.
The Probability of Causes. It takes theory to kill a theory, and yes, one can and should use empirical data to make that argument; different theories may have varying degrees of empirical support. Yet, when competing theories exhibit just about the same degree of concordance with historical data, the notion of empirical objectivity is pretty much superfluous and formal objectivity becomes the only option. For example, the Hill criteria can be used to structure a scientific argument for a causal model, and/or a quantitative goodness-of-fit measure may be used to gauge empirical support. But that doesn’t really confer empirical objectivity; at best a formal method for assigning probabilities to theories is intersubjective instead of personally subjective.
Mixed Probability. For many decision problems, the whole enchilada of probability must combine probabilistic notions of both chance and theory. That brings a new layer of potential subjectivity to the front. Will a Bayesian scheme be used to combine “prior” probabilities and empirical observation, or not (e.g. a probability tree)? And who will decide – the quants with statistical training or the scientists with a background that is primarily not quantitative?

References

Hill, Sir Arthur Bradford (1965). The Environment and Disease: Association or Causation? Proc Royal Soc Med 58:295-300.

Hume, D. (1739). A Treatise of Human Nature, EC Mossner (ed.), London: Penguin Books.

dos Santos Silva, I (1999). Cancer epidemiology: principles and methods. Chapter 5: Overview of study designs. International Agency for Research on Cancer, Lyon.

Official Post Soundtrack

Cure, The (1979). Object. In: Three Imaginary Boys, Track 5.

Post Notes

Thesis Post #48. Ideally read after A Dictionary of Probability. Three Imaginary Boys is the UK title of the first album by The Cure. I'm going with that because I'm posting from Oxford, the title fits the essay better, plus the U.S. CD version titled Boys Don't Cry left "Object" off. Bummer.

Thursday, July 23, 2015

The Meaning of the Mean

The Average as a Surrogate for the Total

In statistics and probability, the arithmetic mean or the average value is often considered to be especially important. There are some good reasons for this – sometimes. Other times, not so much. For starters, there really is no such thing as an average person. So, knowing the average value for a population may not give you very much information about yourself or any other specific individual. But, for the purpose of providing a quantitative description of a population, the average often works rather well. The reason is simple; the average is proportional to the total:

Average = Population Total / number of persons

Therefore, as long as the utility function is also proportional to the quantitative value, the average serves as a utilitarian measure of value. Even though that proposition is dubitable to the point of being obviously wrong under some circumstances (e.g. for risk assessments where the risk is driven by extreme values), the “average person” often serves as a useful stand-in for "everyone".

Then, there is the "Expected Value". Mathematical probability was originally devised to calculate the frequency of occurrence of specific results from games of chance played many times. This, in turn, allowed the rate of return over a long (theoretically infinite) period of time to be estimated. Once again, the value of interest corresponds to the arithmetic mean. For example, a gambler seeking to profit from a series of bets can evaluate the bet as follows:

Expected Value = Total Net Return / number of bets

The use of mathematical probability in finance and insurance often uses the same underlying logic: Given the fact that there are sure to be some bad loans and bad insurance risks, the key to having a profitable business is to have the average return be positive. At least, that is what investors expect.
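The gambler's calculation can be sketched in a few lines of Python. The example bet (one unit staked on a single number in European roulette, which pays 35 to 1 across 37 pockets) and the function names are assumptions chosen for illustration. The first function computes the expected value from the probabilities; the second estimates the same quantity as total net return divided by the number of bets.

```python
import random
from fractions import Fraction

def expected_value(outcomes):
    """Sum of payoff times probability over all outcomes of one bet."""
    return sum(Fraction(payoff) * prob for payoff, prob in outcomes)

def average_return(outcomes, n_bets, seed=0):
    """Total net return divided by the number of bets, by simulation."""
    rng = random.Random(seed)
    payoffs = [payoff for payoff, _ in outcomes]
    weights = [float(prob) for _, prob in outcomes]
    total = sum(rng.choices(payoffs, weights=weights, k=n_bets))
    return total / n_bets

# Assumed example: one unit staked on a single number in European roulette.
single_number_bet = [(35, Fraction(1, 37)), (-1, Fraction(36, 37))]

ev = expected_value(single_number_bet)
print(ev)                                          # exact value as a fraction
print(average_return(single_number_bet, 200_000))  # long-run average per bet
```

The exact expected value is -1/37 of the stake per bet. The simulated average only approaches it over a long series, which is the sense in which the expected value is a long-run quantity and not a prediction for any single bet.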

Measurement Error

Using the Standard Error of the Mean to characterize the uncertainty associated with scientific measurements has a long history. Writing in 1755, Thomas Simpson adapted the Bernoulli theorem (aka the law of large numbers) to make the following observation (quote from Stigler, 1986):

Upon the whole …. It appears, that the taking of the Mean of a number of observations, greatly diminishes the chances for all the smaller errors, and cuts off almost all possibility of any great ones: which last consideration, alone, seems sufficient to recommend the use of the method, not only to astronomers, but to all others concerned in making experiments of any kind (to which the above reasoning is equally applicable). And the more observations or experiments that are made, the less will the conclusion be liable to err, provided they admit of being repeated under similar circumstances.

However, Simpson’s claim was met with immediate criticism from Thomas Bayes, who noted (also in Stigler, 1986):

As I see no mistakes in Mr. Simpson’s calculations, I will venture to say that there is one in the Hypothesis on which he proceeds. And I think it is manifestly this, when we observe with imperfect instruments or organs; he supposes that the chances for the same error in excess or defect are exactly the same, and upon this hypothesis only has he shown the incredible advantage, which he would prove arises from taking the mean of a great many observations.

In other words, the standard error of the mean accurately characterizes the uncertainty of a measurement only when, as Simpson assumed, the true value corresponds to the arithmetic mean. If it doesn’t, then even though the theorem is true, the result is irrelevant. For example, if the errors are lognormally distributed, then the true value will correspond to the geometric mean rather than the arithmetic mean. If the underlying distribution of the measurement errors is unknown, then so is the relationship of the true value to the distribution of the measurements. Calling the mean value the expected value doesn’t help at all.
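The lognormal case can be checked with a small simulation. The sketch below assumes a true value of 10 and multiplicative lognormal errors with a log-scale standard deviation of 1; both numbers and all function names are made up for illustration.

```python
import math
import random

def simulate_measurements(true_value, sigma, n, seed=0):
    """Measurements whose errors are multiplicative and lognormal."""
    rng = random.Random(seed)
    # lognormvariate(0, sigma) has median 1, so the true value is the
    # median (and geometric mean) of the measurements, not their mean.
    return [true_value * rng.lognormvariate(0.0, sigma) for _ in range(n)]

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def standard_error(xs):
    """Sample standard deviation divided by the square root of n."""
    m = arithmetic_mean(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return math.sqrt(var / len(xs))

data = simulate_measurements(true_value=10.0, sigma=1.0, n=100_000)
print(arithmetic_mean(data), standard_error(data), geometric_mean(data))
```

Under these assumptions the arithmetic mean settles near 10 × exp(0.5), or roughly 16.5, with a very small standard error, while the geometric mean settles near the true value of 10. More observations make the mean more precise without making it any closer to the truth, which is Bayes's objection in miniature.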

Averaging the Truth

In the realm of the probability of chance, the mean value is almost certainly given far more credence than it deserves. But still, under most circumstances the arithmetic mean is close enough to the actual value of interest to be considered approximately true. On the other hand, with the probability of causes, or any other notion of probability arising from a notion of competing theoretical propositions, there is no basis for using a mean value at all. For example, consider the probability that the earth is round as opposed to flat. As a decision problem, under no circumstances would it make any sense to average the flat earth theory with the round one. Yet, that is essentially what Bayesian Model Averaging does.

The admirable trait of Bayesian Model Averaging (BMA) is that it acknowledges that different plausible models may yield estimates that may be quite different (Hoeting et al, 1999). Like a probability tree treatment of model uncertainty (e.g. Evans et al, 1994; Carrington et al, 2013), BMA requires identification of a set of alternative plausible models and establishing a model probability that will surely require some degree of subjective judgement. But, with BMA the subjective probability is just the prior probability rather than the finished product. Bayesian updating and averaging is the next step.
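The two steps that follow the prior, updating and averaging, can be sketched as below. This is a bare-bones illustration and not the full procedure described by Hoeting et al: the model names, prior probabilities, likelihoods, and estimates are all hypothetical numbers.

```python
def posterior_model_probabilities(priors, likelihoods):
    """Bayes' rule over a discrete set of candidate models."""
    joint = {m: priors[m] * likelihoods[m] for m in priors}
    total = sum(joint.values())
    return {m: joint[m] / total for m in joint}

def model_average(probabilities, estimates):
    """Probability-weighted arithmetic mean of the model estimates."""
    return sum(probabilities[m] * estimates[m] for m in probabilities)

# Hypothetical inputs: two competing models with equal prior weight,
# the likelihood of the observed data under each model, and each
# model's estimate of the quantity of interest (arbitrary units).
priors = {"linear": 0.5, "threshold": 0.5}
likelihoods = {"linear": 0.2, "threshold": 0.6}
estimates = {"linear": 100.0, "threshold": 0.0}

posterior = posterior_model_probabilities(priors, likelihoods)
print(posterior)
print(model_average(posterior, estimates))
```

With these made-up inputs the posterior weights are 0.25 and 0.75, and the averaged estimate is 25, a value that neither model actually predicts.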

The differences between BMA and an unvarnished probability tree are all attributable to different notions of probability. Like Bayesian schemes in general, BMA is intended to give the probability of causes a mathematical treatment that resembles that used for the probability of chance; and the fixation on the arithmetic mean comes with that package. A probability tree approach that embellishes a weight-of-the-evidence evaluation is apt to use something like the Bradford-Hill criteria (Hill, 1965) to establish model probabilities, none of which assign any importance to the arithmetic mean. Given the fact that assuming the mean is what led Thomas Bayes to criticize Simpson, it seems that the real Bayes would never have approved of BMA.

Along with a range or outer bounds, the mean is perhaps a useful central estimate even when uncertainty arises from competing plausible propositions. But, since it corresponds to a common legal standard of proof (“preponderance of the evidence”), the median is better for many purposes. But there may be room for both. The real problem with BMA is that it proffers the arithmetic mean as the value of interest. It isn’t really; the value at stake is the truth. If current science is unable to divulge it, then we really don’t know what to expect.
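To make the contrast between the mean and the median concrete, here is a small sketch of a probability tree summarized both ways. The three branches, their probabilities, and their estimates are invented for illustration, and the weighted median is taken to be the smallest estimate at which the cumulative probability reaches one half.

```python
def weighted_mean(branches):
    """Probability-weighted arithmetic mean of the branch estimates."""
    return sum(p * x for p, x in branches)

def weighted_median(branches):
    """Smallest estimate whose cumulative probability reaches one half."""
    cumulative = 0.0
    for p, x in sorted(branches, key=lambda b: b[1]):
        cumulative += p
        if cumulative >= 0.5:
            return x
    raise ValueError("probabilities must sum to at least 0.5")

# Hypothetical tree: (model probability, risk estimate) for each branch.
tree = [(0.6, 0.0), (0.3, 10.0), (0.1, 1000.0)]

low = min(x for _, x in tree)
high = max(x for _, x in tree)
print(low, weighted_median(tree), weighted_mean(tree), high)
```

In this example the median is 0, because the preponderance of the probability sits on the branch that predicts no risk, while the mean is 103 and is driven almost entirely by a branch that carries only a 10% weight. Reporting the range alongside either summary keeps the disagreement among the models visible.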

References

Carrington CD, Murray C, and Tao, S. (2013). A Quantitative Assessment of Inorganic Arsenic in Apple Juice.

Evans, J.S., Graham, J.D., Gray, G.M., and Sielken, R.L., Jr. (1994). A distributional approach to characterizing low-dose cancer risk. Risk Anal 14:25-34.

Hill, Sir Arthur Bradford (1965). The Environment and Disease: Association or Causation? Proc Royal Soc Med 58:295-300.

Hoeting JA, Madigan D, Raftery AE, and Volinsky CT (1999). Bayesian Model Averaging: A Tutorial. Statistical Science 14:382–417.

Stigler SM (1986). Probabilities and the Measurement of Uncertainty. In: The History of Statistics: The Measurement of Uncertainty before 1900. Belknap Press, Cambridge MA, pp. 62-98.

Official Post Soundtrack

Supertramp (1975). The Meaning. In: Crisis, What Crisis?, Track 9.

Post Notes

Thesis Post #47. Best read in conjunction with "A Dictionary of Probability" and "Quantifiers".

Thursday, July 9, 2015

Middle Ground

The “Low Dose” Problem

Evidence of potential harm from arsenic and other contaminants usually comes from epidemiology studies of exposures that are much higher than those that occur in the diet. Because the effects at these exposures are statistically significant and there is strong evidence that the association is causally related to exposure to the contaminant, this “high-dose” region is the only part of the curve where the data are good enough to empirically characterize the shape of a dose-response curve. Potential effects at lower doses necessarily involve extrapolation from high doses to the “low-dose” region where exposures from the U.S. diet actually occur. There is also often a substantial “intermediate-dose” part of the curve that lies between the high- and low-dose regions.

The inevitable question underlying most toxicological assessments is this: What effects in the low-dose region can be inferred from the demonstrable effects in the high-dose region? Since effects in the low-dose region are, by definition, not within the limits of detection, any claim of an effect or lack thereof must be theoretical. The scientific debate typically revolves around whether the shape of the dose-response curve is “linear” or “nonlinear”. If it is “linear”, then it is supposed that the risk at low doses is proportional to the risk at high doses. If it is nonlinear, then it is supposed that the risk at low doses is negligible, and therefore, no quantification of the risk is necessary. But, there are many other plausible alternatives. In particular, the risk at low doses may be linear without being proportional to the effect at high doses. As a result, a risk assessment isn’t just about what happens at high doses and low doses; it is about what happens in the middle as well.

Some Theoretical Alternatives

A comparison of some of the mathematical models used for benchmark dose modeling is illustrative. The behavior of four of these models when used to describe the relationship between exposure to inorganic arsenic and lung cancer in NE Taiwan (Chen et al, 2010; Carrington et al, 2013) is illustrated in the following three figures, which show the four models in three different dose ranges.

The High-Dose Region

The Intermediate-Dose Region

The Low-Dose Region

At high doses, all four of the models are nonlinear. Even the Weibull model, which appears to be linear in Figure 1, becomes nonlinear at doses that result in incidence rates that exceed 50%. However, near the transition point between the high and intermediate dose ranges there is a large discrepancy among the models. While the Weibull model is almost completely linear, the Probit model is somewhat nonlinear, while the Logprobit and Quantal Hill models are highly nonlinear. As a result of their nonlinearity in the intermediate dose range, the latter two models are nearly flat at low doses, which is indicative of an incremental risk that is very close to zero. Although the increase is very small relative to background, the other two models exhibit a noticeable slope in the range of dietary exposure.
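The qualitative pattern described above can be reproduced with a rough sketch. The functional forms below are common textbook parameterizations and are assumptions here, as are the dose units; the parameter values are invented for illustration and are not fitted to the Taiwan data. Extra risk is computed as the increase over background divided by one minus background.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical parameters throughout; dose is in arbitrary units.
def weibull(d, g=0.01, b=0.001, a=1.0):
    return g + (1 - g) * (1 - math.exp(-b * d ** a))

def probit(d, a=-2.33, b=0.002):
    return phi(a + b * d)

def logprobit(d, g=0.01, a=-8.0, b=1.2):
    if d <= 0:
        return g  # background only; log(0) is undefined
    return g + (1 - g) * phi(a + b * math.log(d))

def hill(d, g=0.01, k=500.0, n=3.0):
    return g + (1 - g) * d ** n / (k ** n + d ** n)

def extra_risk(model, d):
    """Increase over background, scaled by the fraction not yet affected."""
    p0 = model(0.0)
    return (model(d) - p0) / (1 - p0)

models = {"Weibull": weibull, "Probit": probit,
          "Logprobit": logprobit, "Hill": hill}

for name, model in models.items():
    # One low dose and one high dose, a thousandfold apart.
    print(name, extra_risk(model, 1.0), extra_risk(model, 1000.0))
```

With these made-up parameters all four curves show a substantial extra risk at the high dose, but at the low dose the Weibull and Probit forms retain a small positive slope while the Logprobit and Hill forms are flat for practical purposes. The divergence comes entirely from how each curve bends in the intermediate range.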

No Dichotomy

Given the complexity of biological reality, none of these simple models are likely to be entirely correct: They are approximations at best. Nonetheless, they serve to demonstrate that the shape of the curve really does matter. Just about all plausible curves are nonlinear at some point, yet are still approximately linear at very low doses. Nonlinearity does not imply that there is a threshold. Linearity does not imply that the risk is of any significance. It all depends on how and where the nonlinearities occur, and in the intermediate region theoretical justification is the only game in town.

References

Carrington CD, Murray C, and Tao, S. (2013). A Quantitative Assessment of Inorganic Arsenic in Apple Juice.

Chen CL, Chiou HY, Hsu LI, Hsueh YM, Wu MM, and Chen CJ (2010). Ingested arsenic, characteristics of well water consumption and risk of different histological types of lung cancer in northeastern Taiwan. Environ Res. 110:455-62.

Official Post Soundtrack

Fixx, The (2003). Straight 'Round the Bend. In: Want That Life, Track 7.

Post Notes

Thesis Post #46. First post in almost a month. That's mostly because my manifesto is pretty much manifested - I've already covered most of the main ideas I wanted to cover when I set out.