The Subjective Statistician

Weirich, Paul

Consider a coin. Assume that it is fair. Call this hypothesis H. Toss the coin. Use H to calculate the ensuing probabilities. Designate the result a significance test of H.

Assign due credit to Ronald Fisher.

Bayesian statistics is quite different. The axis of inferences goes the other way. It is the probability of H that is up for grabs.

Objective Bayesians ground probability assignments in physical symmetries:

Both sides are the same.

Empirical Bayesians use relative frequencies:

Fifty-fifty is fifty-fifty.

Subjective Bayesians appeal to personal judgments:

I am so into mauve.

Each method is unavoidably subjective, but, like unhappy families, each method is subjective in its own way.

Suppose that E expresses evidence that is new, or newly known. The conditional probability of H given E is P(H|E). The new probability of H, once E is learned, is P′(H). Whereupon P′(H) = P(H|E).

The old conditional probability of H given E has become the new probability of H.

Thomas Bayes’s theorem offers a means of calculating P(H|E):

P(H|E) = P(H)P(E|H)/P(E).

Notice that

P(H|E) = P(H)P(E|H)/[P(H)P(E|H) + P(~H)P(E|~H)].

Consider the hypothesis H that a coin is fair. Remember that the coin has been tossed twenty times. Remember that on four occasions, it came up heads. Call that heads-up E.

Suppose that P(H) is 0.5.

Here P(E|H) is 0.0046, and P(E|~H) is taken to be 0.0911. P(E) is thus [(0.5 × 0.0046) + (0.5 × 0.0911)].

Now look at this:

P(H|E) = [0.5 × 0.0046]/[(0.5 × 0.0046) + (0.5 × 0.0911)] = 0.0481.

Up the coin has gone, but down has come the probability that the coin is fair.
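The arithmetic is easy to check. What follows is a minimal sketch in Python, not anyone's official method. The function names are illustrative, and the value 0.0911 for P(E|~H) is simply taken from the text as given, since the alternative to fairness behind it is not spelled out.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent tosses with heads-probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def posterior(prior_h, like_h, like_not_h):
    """Bayes's theorem for a hypothesis H and its negation."""
    joint_h = prior_h * like_h
    joint_not_h = (1 - prior_h) * like_not_h
    return joint_h / (joint_h + joint_not_h)

# P(E|H): four heads in twenty tosses of a fair coin, rounded as in the text.
like_h = round(binomial_pmf(4, 20, 0.5), 4)
# P(E|~H): taken from the text as given (an assumption of this sketch).
like_not_h = 0.0911

post_h = posterior(0.5, like_h, like_not_h)
print(like_h)            # 0.0046
print(round(post_h, 4))  # 0.0481
```

Carrying the unrounded binomial value through the calculation shifts the last digit or two of the posterior, but not the moral.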

In Fisher’s method of significance testing there is no appeal to the prior probabilities of H and E. There is just the probability of E given H.

Significance testing is meant to lead to a decision. It has an unavoidably active aspect. Either H should be accepted or not, but, in either case, something must be done.

The imperative of action is to reject H, if, and only if, the test results are improbable given the hypothesis.

In coin tossing, toss-ups are covered by a probability distribution curve. The coin has been tossed twenty times. At one end of the curve, there are four or fewer heads. At the other, four or fewer tails. Such are the critical regions of improbable results. Given H, the probability of a test result falling within these regions, as the real result does, is 0.012. This is less than a significance level of 0.05.

Give it up. The coin is not fair.

Well, neither is life.
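The figure of 0.012 can be recovered by adding up both tails of the binomial distribution for a fair coin. This is a small sketch under that reading of the critical regions; the function name is hypothetical.

```python
from math import comb

def two_sided_tail(n, k):
    """P(at most k heads, or at most k tails) in n tosses of a fair coin.

    Assumes k < n / 2, so that the two critical regions do not overlap.
    """
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return 2 * lower_tail  # the fair-coin distribution is symmetric

p_value = two_sided_tail(20, 4)
reject_h = p_value < 0.05  # significance level used in the text

print(round(p_value, 3))  # 0.012
print(reject_h)           # True
```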

But notice this: Fisher’s test shows only that either H is false or that the test results are improbable.

It does not show that H is false. The test is unavoidably disjunctive, and often contingent on other assumptions. In coin tossing, what counts is what comes up, but what also counts is the assumption that what comes down has nothing to do with what went up. Things need not be independent.

That test is now trijunctive.

Both Fisher and Bayes leave something unresolved. A significance test uses likelihoods to decide whether to reject a hypothesis; it leaves open the probability of the hypothesis after the test. Bayesian methods use prior probabilities as well as likelihoods to assign the hypothesis a probability after the test; they do not settle whether the hypothesis’s posterior probability warrants its rejection. Both involve subjective judgments. The prior probabilities of the hypothesis and the evidence, which are essential for Bayesian methods, depend on judgments that are variable from one individual to the next.

Subjective judgments are also in play when setting up a significance test. They are involved in selecting a population, choosing its parameters, establishing a null hypothesis, and determining a critical region—and its significance level.¹

Twenty-first-century statisticians use both significance testing and Bayesian methods.² One method may be more useful than the other; but Bayesian methods and significance testing can be unified, a common foundation found.

Bayesian methods accommodate significance tests more easily than significance tests accommodate Bayesian methods. Significance tests can be reformulated to supply the probabilities that Bayes’s theorem requires, but the tests themselves leave no place for Bayes’s theorem. Their framework is as sparse as Nevada in midwinter.

Inference & Decision

Inference and decision are, according to Fisher, separate processes.³ In a significance test, some principles must ground the decision to accept or reject a hypothesis.

But a significance test does not lead directly to a decision. A test lacks forward momentum.

A new fertilizer fails to increase the average yield of corn. Should Farmer Fiona use the stuff? A test by itself offers her no advice. All that it yields, inasmuch as it yields anything at all, is the conclusion that results similar to those obtained are not improbable.

At most, Farmer Fiona should suspend her judgment.

An agent who has rejected a hypothesis, on the other hand, has reached a state of epistemic poise. Having failed to believe the hypothesis, he has made the decision to reject it.

But an agent who steps back from frank rejection has reached another epistemic position. Having declined to accept H he is not compelled to accept its negation, and vice versa.

If a physicist does not believe in string theory, it does not necessarily follow that he believes that string theory is false.

Trustworthy decision principles use probabilities and the utility of possible outcomes. It is essential to have the probability of the hypothesis in order to calculate the expected utility. Significance tests do not provide this, but Bayesian methods do.

Consider a case where the two methods disagree. An inspection of a coin prior to testing shows that the coin is symmetrical. There is a high pre-test probability that the coin is fair and even taken together with skewed test results—the coin keeps coming up tails—this yields a high post-test probability that the coin is fair.
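A rough illustration of the disagreement, under assumptions that are mine rather than the text's: the inspection is taken to justify a prior of 0.99 that the coin is fair, and the likelihoods are borrowed from the earlier example of four heads in twenty tosses (0.0046 given fairness, 0.0911 otherwise). A significance test at the 0.05 level rejects fairness on those results; the Bayesian calculation does not.

```python
def posterior(prior_h, like_h, like_not_h):
    """Bayes's theorem for a hypothesis H and its negation."""
    joint_h = prior_h * like_h
    joint_not_h = (1 - prior_h) * like_not_h
    return joint_h / (joint_h + joint_not_h)

# Assumed for illustration: the inspection leaves the agent 99% sure the coin is fair.
prior_fair = 0.99
# Likelihoods of the skewed result, borrowed from the earlier example.
like_fair, like_not_fair = 0.0046, 0.0911

post_fair = posterior(prior_fair, like_fair, like_not_fair)
print(round(post_fair, 2))  # 0.83
```

The skewed result is about twenty times more likely if the coin is not fair, yet the strong prior leaves fairness the better bet.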

The statistical reasoning that grounds a decision should use all the evidence bearing on H and, in particular, the evidence summarized by prior probabilities. A post-test Bayesian evaluation of the hypothesis does this when it assigns H a posterior probability.

Significance testing and Bayesian methods have been unified within a probabilistic framework.⁴ They are now, if not the best of friends, then good friends anyway. Suppose that the total evidence for H, one way or another, is E. Within this particular happy marriage (Fisher and Bayes), the probability of H is a rational degree of belief in H given E. A likelihood is the pre-test, rational degree of belief in E, given H.

In what follows I assume a successful unification of significance testing with Bayesian methods, presume that the unification adopts a Bayesian framework, and therefore take Fisher/Bayesian statistics, two heads on one body, to be Bayesian statistics after the absorption of significance tests—one head, one body.

The other head? Don’t even ask.⁵

On the One Hand

No matter the unification, prior probabilities are and remain subjective. Two rational agents considering the same evidence may assign different probabilities to H.⁶

Bayesian conditionalization assumes that evidence is composed of propositions of which the agent is certain. But information may not be certain. It was this that persuaded Richard Jeffrey to formulate a generalized form of conditionalization.⁷ Suppose that an agent undergoes an experience that changes the probability of a proposition B but does not change the probability of any proposition conditional on B or its negation. And suppose that the change in B is the origin of all other changes in probability engendered by that experience.

If A is any proposition, and P is the agent’s probability function before the experience, and P′ his probability function afterwards, then

P′(A) = P(A|B)P′(B) + P(A|~B)P′(~B).

This is still to assume constant conditional probabilities, and so the scheme is not completely general.
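Jeffrey's rule is short enough to write down directly. The sketch below assumes made-up numbers: A might be rain, B a glimpse of dark cloud, and the glimpse raises the probability of B from 0.3 to 0.7 without making it certain. Only the formula itself comes from the text.

```python
def jeffrey_update(p_a_given_b, p_a_given_not_b, new_p_b):
    """P'(A) = P(A|B) P'(B) + P(A|~B) P'(~B), with the conditionals held fixed."""
    return p_a_given_b * new_p_b + p_a_given_not_b * (1 - new_p_b)

# Hypothetical values, chosen only for illustration.
p_a_given_b, p_a_given_not_b = 0.8, 0.1

old_p_a = jeffrey_update(p_a_given_b, p_a_given_not_b, 0.3)  # before the experience
new_p_a = jeffrey_update(p_a_given_b, p_a_given_not_b, 0.7)  # after the experience

print(round(old_p_a, 2), round(new_p_a, 2))  # 0.31 0.59

# When the experience makes B certain, the rule reduces to ordinary conditionalization.
print(jeffrey_update(p_a_given_b, p_a_given_not_b, 1.0))  # 0.8
```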

The identification of principles that constrain probability assignments—that is another problem. Principles of direct inference use evidence about physical probabilities to impose constraints on personal probability assignments. Thus David Lewis:

Let C be any reasonable initial credence function. Let t be any time. Let x be any real number in the unit interval. Let X be the proposition that the chance, at time t, of A’s holding equals x. Let E be any proposition compatible with X that is admissible at time t. Then C(A/XE) = x.⁸

Here XE stands for the conjunction of X and E, and as Lewis explains:

Admissible propositions are the sort of information whose impact on credence about outcomes comes entirely by way of credence about the chances of those outcomes.⁹

Alles klar?

Typically, propositions providing historical information are admissible. Such principles of direct inference remain controversial, because they risk coming into conflict with standard Bayesian conditionalization.

On the Other Hand

Bayesian literature contains several responses to the problem of subjectivity. Leonard Savage showed that Bayesian reasoners with different prior probabilities, given the same new data, must converge on the same posterior probabilities. As more data arrives, initial probability assignments are less and less important. Savage introduced his theorem, taking S to be the set of possible states of the world, in this way:

[S]uppose that a person is about to observe a large number of random variables, all of which are independent given B_i for each i, where the B_i are a partition of S. It is to be expected intuitively, and will soon be shown, that under general conditions the person is very sure that after making the observation he will attach a probability of nearly 1 to whichever element of the partition actually obtains.¹⁰

The proof invokes Bayes’s theorem and the weak law of large numbers. Savage is optimistic about its implications. “With the observation of an abundance of relevant data, the person is almost certain to become highly convinced of the truth.”¹¹

Really?

Jeffrey, another great optimist, uses Bayes’s theorem to show how observations bring agents who diverge in probability judgments closer to agreement.¹² Suppose that, for a particular coin, one agent assigns 80% as the probability of heads and another, 20% as the probability of heads. After observing 20 tosses with 10 heads and 10 tails, each agent applying Bayes’s theorem to update his probability assignment moves closer to 50%, and thus closer to the other agent.
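One way to see the two agents draw together is to give each a Beta prior over the coin's bias. That modeling choice is an assumption of this sketch, as are the particular priors Beta(8, 2) and Beta(2, 8), whose means are 0.8 and 0.2; the text does not say how the agents represent their uncertainty.

```python
def posterior_mean(a, b, heads, tails):
    """Mean of the Beta(a + heads, b + tails) posterior for the probability of heads."""
    return (a + heads) / (a + b + heads + tails)

heads, tails = 10, 10  # the observed tosses from the text

# Assumed priors: Beta(8, 2) has mean 0.8, Beta(2, 8) has mean 0.2.
optimist = posterior_mean(8, 2, heads, tails)
pessimist = posterior_mean(2, 8, heads, tails)

print(round(optimist, 2), round(pessimist, 2))  # 0.6 0.4
```

The gap between the agents shrinks from 0.6 to 0.2. Stronger priors would shrink it less, which is the point about subjectivity lingering.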

In both cases, convergence diminishes the effect of subjectivity, but only on the assumption that incoming evidence concerns only certain random variables. Suppose an agent assigns 0 or 1 to H. Conditionalization can never move such an assignment, and convergence never arrives. Even when it does make its welcome appearance, convergence may require an unrealistically large amount of data.

Finally, subjective Bayesianism allows an agent at any time to abandon conditionalization and to make a fresh start with probability assignments.

The scheme works only if agents never exercise their prerogative to make a fresh start.

The objective Bayesian believes that the principle of maximum entropy serves to settle probabilities.¹³ Because entropy is a measure of uncertainty, the principle suggests as pertinent a probability distribution containing the least information. Without any relevant evidence, it yields a uniform distribution.

Suppose that a die is rigged so that, when rolled, neither a 1 nor a 6 can come up. A probability distribution, when constrained by these facts, would assign a positive probability only to 2, 3, 4, and 5. The distribution that maximizes entropy assigns a probability of ¼ to each of these four possible outcomes.
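The claim about the rigged die can be checked by brute force. The sketch below searches a coarse grid of distributions over the four live faces, in steps of 0.05, and picks the one with the greatest Shannon entropy. The grid and its step size are conveniences of the sketch, not part of the principle.

```python
from itertools import product
from math import log

def entropy(dist):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * log(p) for p in dist if p > 0)

STEPS = 20  # probabilities are multiples of 1/20

# Every way of sharing 20 units of probability among the faces 2, 3, 4, and 5.
candidates = [
    (a, b, c, STEPS - a - b - c)
    for a, b, c in product(range(STEPS + 1), repeat=3)
    if a + b + c <= STEPS
]

best = max(candidates, key=lambda counts: entropy([k / STEPS for k in counts]))
print([k / STEPS for k in best])  # [0.25, 0.25, 0.25, 0.25]
```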

So far, so good. But even this method is not entirely objective. Two rational agents may apply the principle differently. In probability theory, the space of possibilities forms an algebra. Of these, there are many. One space may lead to two choices, or to many more. But maximum entropy inferences are choice sensitive because the resulting distribution is not invariant over the choice of measure spaces.¹⁴

The principle of maximum entropy reduces, but does not resolve, the problem of subjectivity.

What about Colin Howson and Peter Urbach?¹⁵ They remain as unruffled as they are unworried. Subjectivity? Not at all. Bayesian conditionalization is an objectively correct form of reasoning, even if its prior probabilities are not objective. No one, after all, scruples at deductive inferences from thoroughly insane premises.

No doubt this is true, but what is at issue is not the validity of an inference but its rationality, and however elegantly a lunatic may derive the conclusion that he is Napoleon, there remains the fact that he is nuts.

Statistical reasoning requires objective starting points.

Suppose that a medical researcher assigns a high probability to a new drug’s efficacy, given evidence that in pre-clinical trials 60% of recipients recovered from the heartbreak of psoriasis. During clinical trials 60% of recipients do recover. Their skin glows. According to conditionalization, researchers should assign a high probability to the drug’s efficacy. But if the pre-trial conditional probability is subjectively determined, then the conclusion, too, is subjective; a point that federal prosecutors are sure to stress in pre-trial depositions.

It remains to be argued that subjectivity is ineliminable in statistical reasoning, and, look, let’s be honest, what can anyone do about it? David Hume is often assigned the credit, or the blame, for this view. Even the classical statistician cannot eliminate subjectivity from statistical reasoning; she needs a subjective judgment to design a test of H. But to be convincing, a would-be Humean must identify the ineliminable subjective elements of statistical inference.

This she has not done.

The Sources of Subjectivity

Probability assignments may be subjective because the evidence that generates them is subjective, a matter of who is seeing what and when. The subjectivity of evidence is inevitable.

Nevertheless, evidence can attain a type of objectivity. A person possessing sharable evidence may communicate it to others, as when Chicken Little infers—incorrectly, to be sure—that the sky is falling (H) because it is getting dark (E).

The type of evidence that the Bayesian statistician uses is propositional: that, for example, the red litmus paper turned blue when immersed in liquid. The evidence is not the litmus paper. The litmus paper remains what it always was, and that is a red blob. The evidence is the proposition.

A person has access to his own sensations and such access is private. You can no more share my pain than I can share your shadow. My pain constitutes evidence; I am certain of the proposition that expresses it. For others, who know the astonishing proportion of whingers in ordinary life, the proposition that I am in pain is not evidence, nor is it a source of certainty. Introspection is relative and so subjective.

Consider a worker deciding whether to stop for lunch. The worker knows that he is hungry. His evidence is subjective but it rationally grounds his decision. Bayesian theories, in applications outside science, need not forgo subjective evidence. What they need, given E, are objective probability assignments.¹⁶

Suppose that an agent knows that the objective probability of heads on a toss of a particular coin is ½ and knows nothing else. Then ½ should be his personal probability for heads. If he knows the objective probability of an event and nothing else, then his personal probability and the event’s objective probability should coincide.

In contrast, suppose that an agent’s evidence is sparse and does not support the assignment of a precise personal probability to H—or to anything else. H may be about the weather, and E may be no more than a look at the sky. Bayesian statistics can use a set of probability functions to represent the import of sparse evidence.¹⁷ E might generate a range of 0.2 to 0.6 for the probability of H.

Henry Kyburg accommodates sparse evidence by letting probabilities be interval-valued and by using available evidence about relative frequencies to settle the probability interval for a proposition.¹⁸ But a set of probability functions that represents an agent’s appraisal of the evidence may yield probability values that do not form an interval. Evidence that is not about relative frequencies, such as evidence of a coin’s symmetry, may affect the set of probability functions as well.

Whereupon there is the inevitably sunny Jeffrey:

I think that we seldom have judgmental probabilities in mind for the propositions that interest us. I take it that what we do have are attitudes that can be characterized by conditions on probabilities, or (what comes to the same thing) by the sets of probability assignments that satisfy those conditions, where the members of those sets are precise and complete: each assigns exact values to all propositions expressible in some language.¹⁹

If it is good enough for Jeffrey, it is good enough for me. I assume that a person’s response to evidence is generally a set of probability functions.

When evidence is sparse and generates a large set of probability functions, this view avoids the criticism that each probability function in the set is subjective. It does not advance any function in the set but rather the set as a whole.

If a person’s meteorological evidence objectively settles a range of probability assignments from 0.2 to 0.6, the choice of 0.2 or 0.6 remains subjective.

But the set of probabilities ranging from 0.2 to 0.6 remains objective.

Because Bayesian statistics must furnish grounds for decisions, its introduction of imprecise probabilities requires decision principles that use them. A set of probability functions calls for such a principle.

And Irving Good has formulated just such a principle:

I assert that in most practical applications we regard p as bounded by inequalities something like .1 < p < .8 …. [I]t would be reasonable to use the Bayes solution [a type of expected utility maximization] corresponding to a value of p selected arbitrarily within its allowable range.²⁰

Good’s principle counts as rational any option that maximizes expected utility relative to some value of p within its allowable range.²¹

Belief & Degrees of Belief

An attitude toward a proposition is called doxastic if it belongs to the family of attitudes containing belief. This family includes belief, disbelief, suspension of judgment, snorts of derision, and the attitudes that degrees of belief represent.

Suppose that, in response to the evidence at hand, an agent assigns to the proposition that it will rain this afternoon a probability interval of 0.4 to 0.6. This interval is an idealized representation of his response to his evidence. If his doxastic attitude toward rain is precisely represented by the interval, a rational choice, considering that he must carry a cat to work, would be to carry an umbrella as well.

This conclusion, it must be admitted, does not seem an especially inspiring example of theory in action.

One might ask which objective principles move an agent from a body of evidence to the doxastic response represented by a set of probability functions. Specifying them would require a substantial research program. But for Bayesian theory, this is not necessary. All that must be done is to acknowledge the objectivity of a doxastic response.

This response to the problem of subjectivity makes Bayesianism objective about some matters and subjective about others. It maintains that a body of evidence yields, objectively, a set of probability functions for the propositions in a probability model. The evidence does not necessarily provide a unique probability function. A decision reached using the set of probability functions is a subjective matter. It’s private.

Suppose that given the available evidence, the interval [0.4, 0.6] represents a reasonable doxastic attitude toward rain. A rational agent may decline to pay a dollar for a gamble that pays two dollars if and only if it rains this afternoon. Another rational agent may accept the gamble. The first agent assigns to rain a probability less than 0.5; the second agent, a probability greater than 0.5. Decision principles allow an agent with the doxastic attitude represented by an interval to bet as if she assigned one particular probability within the interval to the probability of rain, while another agent may choose to assign a different probability.
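The two agents' choices can be laid side by side. The sketch assumes that utility is linear in dollars, which the text does not say, and it checks only the endpoints of the interval, which suffices because the expected gain is linear in the probability of rain.

```python
def expected_net_gain(p_rain, price=1.0, payout=2.0):
    """Expected dollar gain from buying the gamble; declining it gains exactly 0."""
    return p_rain * payout - price

interval = (0.4, 0.6)  # the doxastic attitude toward rain

gain_at_low = expected_net_gain(interval[0])   # negative: declining does better
gain_at_high = expected_net_gain(interval[1])  # positive: accepting does better

# Each choice maximizes expected utility for some probability within the interval.
decline_is_permitted = gain_at_low < 0
accept_is_permitted = gain_at_high > 0

print(round(gain_at_low, 2), round(gain_at_high, 2))  # -0.2 0.2
print(decline_is_permitted, accept_is_permitted)      # True True
```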

The divergence of choices masks an underlying agreement of doxastic attitudes.

This same situation holds in the case of prior probabilities. Prior probability values for a hypothesis and a test result come from the set of probability functions that represents a reasonable doxastic response to evidence prior to hypothesis testing. The set is the same for agents sharing the same total evidence. Agents seem to have different prior probabilities only because they use different probability functions to guide their choices.

A resolved, or refined, sense of subjectivity emerges easily from subjective Bayesian theories. Given a body of evidence, such theories encourage agents, in making decisions, to use many probability functions.

Let a thousand flowers bloom.

The objectivity of the probability functions as a whole—that remains.

A thousand flowers but one stalk.

Savage has expressed his willingness to prohibit probability functions that are unreasonable given a body of evidence. He is willing to accept, although he does not advance, constraints on probabilities that go beyond the axioms of probability theory and the principle of conditionalization.²² The Bayesian statistician may prohibit probability functions that do not fit an agent’s total evidence, but accept as objective the probability functions that remain. He may concede to Fisher the subjectivity of a decision made using a set of probability functions, but maintain the objectivity of the inferential step from the evidence to the doxastic attitude that the set represents.

Epistemic and pragmatic responses to the same evidence may differ as well. An agent’s epistemic response is tuned to features of the evidence; his pragmatic response, to his own preferences, desires, or intentions.

Given the facts, rationality demands purely an epistemic response, but rationality permits diverse pragmatic responses.

Letting pragmatic and epistemic considerations both affect a probability function creates the danger of double-counting, an inconvenience. An agent in choosing a probability function may be guided by pragmatic considerations, and then be guided by pragmatic considerations all over again when using the probability function to reach a judgment.

That is why standard decision principles separate the epistemic and the pragmatic, considered for the moment as impenetrable categories, in such a way that only epistemic assignments of probabilities guide decisions. Because of its role in deliberation, the rational doxastic response to a body of evidence is purely epistemic.

According to a common view about the relation between belief and degree of belief, an agent believes a proposition when he assigns to it a sufficiently high degree of belief. For example, a driver may believe that Interstate Highway 70 goes through St. Louis because his degree of belief that it does is greater than 0.8.

The threshold above which degree of belief generates belief varies with context. For an important meeting, the threshold for belief may rise to 0.9. Or not. Thresholds vary with context. They really do.

A belief may not be a purely epistemic response to evidence because it may be influenced by non-epistemic goals. Wishful thinking, for example.

An important decision may exert a pragmatic influence on an agent’s beliefs. If the stakes are low, a gambler may believe that a bet will win; if the stakes are high, he may withhold belief in a nervous puddle of indecision.

If belief is partly pragmatic, rationality is permissive about the formation of beliefs. All agents possessing the same body of evidence should have the same purely doxastic response. Still, each agent may form different beliefs according to his pragmatic calculations.

A refined Bayesian—someone rather like myself, in fact—classifies beliefs as either purely epistemic or partly pragmatic. He claims only that decisions are pragmatic and that the rational doxastic attitudes engendered by a set of probability functions are epistemic.

A body of evidence objectively settles them.

Imprecise Probabilities

What about other principles of probability, statistical inference, and decision? Do imprecise probabilities resolve the problem of subjectivity? Start with conditionalization. How does conditionalization proceed if the response to a body of evidence is a set of probability functions? New evidence serves to update each probability function in the set. As the evidence grows, the updating narrows the set of probability functions by eliminating functions that assign positive probabilities to propositions incompatible with the new evidence.

Savage’s convergence theorem then ensures that in certain common settings, updating the set of probability functions narrows its range.
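A single update already shows the effect in miniature. The sketch below is not Savage's theorem, only one round of conditionalization applied to a handful of priors for H between 0.4 and 0.6. The likelihoods are borrowed from the coin example earlier in the essay (0.0046 given H, 0.0911 otherwise), and the sampling of five priors is a convenience.

```python
def posterior(prior_h, like_h, like_not_h):
    """Bayes's theorem for a hypothesis H and its negation."""
    joint_h = prior_h * like_h
    joint_not_h = (1 - prior_h) * like_not_h
    return joint_h / (joint_h + joint_not_h)

# A sample of the admissible priors for H (assumed spacing).
priors = [0.40, 0.45, 0.50, 0.55, 0.60]
# Likelihoods borrowed from the earlier coin example.
like_h, like_not_h = 0.0046, 0.0911

# Update every probability function in the set on the same evidence.
posteriors = [posterior(p, like_h, like_not_h) for p in priors]

prior_width = max(priors) - min(priors)
posterior_width = max(posteriors) - min(posteriors)
print(round(prior_width, 3), round(posterior_width, 3))
```

In this instance the spread of opinion about H falls from 0.2 to roughly 0.04. Evidence that discriminated less sharply between H and its negation would narrow the set less.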

Conditionalization requires diachronic coherence among sets of probability assignments. Here I am using diachronic in the technical sense of day to day. Sets have to be settled so that each admissible probability function updates an earlier admissible probability function.

Is this diachronic requirement in conflict with the synchronic requirement that a set of probability functions must fit the evidence at every given moment? Could be.

Suppose that the evidence settles on a set of probability functions S_p with respect to H. The probability of H ranges from 0.4 to 0.6. New evidence E appears; and E raises the probability of H to 1. Using E to conditionalize each probability function in S_p yields the same updated probability of H. For each function P, the updated P′(H) is derived from P′(H) = P(H|E) = P(H)P(E|H)/P(E). Assuming that P(E|H) = 1 and P(H) = P(E), it follows that P′(H) = 1. If the probability functions in S_p apply only to E, H, and derivative propositions formed by negation and disjunction, then updating S_p retail and updating S_p wholesale come to the same thing.

Requirements for single decisions do not extend to ensuring this type of coherence in multiple decisions; coherence among decisions is a requirement independent of requirements for single choices. Can this extra requirement be defended?

One argument for coherence among decisions would be that incoherence makes an agent susceptible to a sure loss. Suppose that an agent trades a cup of coffee and a dime for a cup of tea. If he then trades the cup of tea and a dime for the cup of coffee, he ends up where he started but two dimes poorer.

A rational agent may accept a sure loss as the cost of changing his mind during a sequence of decisions.

He might also simply prohibit change without sufficient reason, a kind of cognitive conservatism. But such a rule would be at odds with an agent’s freedom to use any admissible probability function to make a decision at any time.

What about arguments requiring coherence among preferences? Can they ground coherence among choices? Because the rationality of each preference in a set does not ensure the rationality of the set of preferences, coherence is an extra requirement. Preferring A to B and also preferring B to A is incoherent, but is typically permitted by the requirements for single preferences.

I am large, I contain multitudes.²³

Suppose that an agent in a sequence of decision problems involving A and B has a preference between A and B. A decision favoring A rather than B followed by another decision favoring B rather than A creates an incoherent sequence of decisions. A requirement that prohibits the sequence might be derived purely from coherence requirements for preferences and the demand that choices follow preferences.

Does the coherence of preferences require that every decision in a sequence of decisions be subordinated to the same probability and utility functions? Consider a case in which evidence settles a precise probability function, and according to it a unique option maximizes expected utility. Conflict arises if the option does not cohere with past choices.

A theory of rationality, on the other hand, prevents conflict by judging a sequence of decisions to be rational if each decision in the sequence is rational. The theory leaves no room for an independent coherence requirement for a sequence of choices.²⁴

Message after the Fact

Distinguish subjectivity in probability assignments from subjectivity in the use of probability assignments to make decisions.

The odds are that all will be well.