Gordon Belot’s “Bayesian Orgulity” (Philosophy of Science, 2013) argues that the Bayesians have a real problem here:
“their account of rationality renders a certain sort of arrogance rationally mandatory, requiring agents to be certain that they will be successful at certain tasks, even in cases where the task is so-contrived as to make failure the typical outcome.”
There is a great deal to Belot’s paper, but two things seem most important to me. One is that Belot elaborates on the “typical” in the above passage by introducing a measure-independent way of capturing the intuitive distinction between small and large sets of possibilities. A Bayesian agent can dismiss the range of possible courses of events in which the forecasting is not well calibrated as ‘negligibly small’ — meaning that it has probability measure zero. And the agent can admit, with no misgiving, that someone of a different opinion may regard the same range of possibilities as large — meaning that their subjective probability for it is large. But Belot cuts across this: he goes to topology to set those measures aside, and so is able to point out that certain sets are immensely large in a sense that everyone has to admit.
The other is that Belot analyzes at length a forecasting situation, with updating, of a sort that should be paradigmatic for the Bayesian approach. When he then demonstrates that, in the sense in question, the agent’s ‘failure set’ is immensely large, it does not seem that any amount of stonewalling could remove the orgulous Bayesian’s discomfort.
First of all, Belot introduces the topological concepts of meagre and residual to capture the intuitive, measure-independent senses of “small” and “large”. A meagre set is any union, finite or countable, of nowhere dense sets. A set is residual iff it is the complement of a meagre set.
Nowhere dense sets are intuitively small. As an example, think of a line segment with its points labeled by the real numbers in the interval [0,1]; the points labeled by the fractions 1/n, for natural numbers n, form a set that is nowhere dense in this interval. Its complement contains all those ‘in-between’ open intervals, like (1/2, 1), (1/3, 1/2), (1/4, 1/3), and so on, which take in so much of the total that, intuitively, the complement certainly contains the great majority of the interval’s points.
But we can demur a bit here, or at least point out that we can already see why a subjective probability of a meagre set of possibilities could be positive. First, a meagre set, if it is an infinite union of nowhere dense sets, need not itself be nowhere dense. And secondly, by the usual measure in geometry, Lebesgue measure (intuitively length, area, volume), a nowhere dense set can have positive measure, though the examples are quite contrived. But I doubt that this will save the Bayesian from acute discomfort in the face of Belot’s arguments.
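As an aside, that positive-measure case can be made concrete. A standard example, not spelled out above, is the so-called fat Cantor (Smith-Volterra-Cantor) set; here is a quick numerical sketch of the bookkeeping:

```python
from fractions import Fraction

# Fat Cantor (Smith-Volterra-Cantor) set: start from [0, 1]; at stage n,
# remove an open middle interval of length 1/4**n from each of the 2**(n-1)
# intervals left at that stage.  What remains is nowhere dense, yet its
# Lebesgue measure is 1 - sum_{n>=1} 2**(n-1) / 4**n = 1 - 1/2 = 1/2 > 0.
removed = Fraction(0)
for n in range(1, 40):
    removed += Fraction(2 ** (n - 1), 4 ** n)

print(float(removed))      # ~0.5: total length removed
print(float(1 - removed))  # ~0.5: measure of the nowhere dense remainder
```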
In order to see how the Bayesian agent’s self-assessment does when evaluated this way, imagine that a coin of unknown bias is going to be tossed repeatedly. The possible outcomes are precisely all the infinite sequences of 0s and 1s. It is convenient to think of this outcome space as a tree, with as its origin a point where nothing has happened yet, and with binary branching: one branch adds a 1 to the sequence so far, the other adds a 0.
The agent has a prior probability function P, defined at least on all the sets whose members are precisely the sequences that share a given finite initial segment. For example, if k is the finite sequence <k(1), … , k(n)>, then that picks out the set
B[k] = {s : s(1) = k(1), … , s(n) = k(n)}
This set is a subtree: it contains precisely the binary branches that start in that same way.
As the result of each toss is received, P is updated by conditionalizing on that result. This takes a very simple form: at any given stage there were two possibilities for the next outcome, and one of them is eliminated by the result. Thus if k is the sequence of outcomes observed previously, and the next result is 1, then P is updated by assigning zero to the entire subtree B[k,0] and rescaling the probabilities of the remaining sets accordingly.
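To make the updating concrete, here is a minimal sketch in Python. The choice of prior is mine, not anything fixed by Belot’s setup: I use the standard uniform (Laplace) prior over the coin’s unknown bias, which determines P(B[k]) for every finite sequence k. Conditionalizing on the observed initial segment then assigns zero to every incompatible subtree and rescales the rest:

```python
from fractions import Fraction
from math import comb

def prior_cylinder(k):
    """P(B[k]) under a uniform (Laplace) prior over the coin's bias:
    integrating p**ones * (1-p)**zeros over p in [0, 1] gives
    1 / ((n + 1) * C(n, ones)) for a segment k of length n with `ones` ones."""
    n, ones = len(k), sum(k)
    return Fraction(1, (n + 1) * comb(n, ones))

def posterior_cylinder(k, observed):
    """P(B[k] | B[observed]): conditionalize the prior on the observed initial
    segment.  Any cylinder incompatible with the data is assigned zero; the
    compatible ones are rescaled by 1 / P(B[observed])."""
    m = min(len(k), len(observed))
    if k[:m] != observed[:m]:
        return Fraction(0)                     # eliminated subtree, e.g. B[k,0]
    longer = k if len(k) >= len(observed) else observed
    return prior_cylinder(longer) / prior_cylinder(observed)

data = [1, 0, 1]                               # outcomes received so far
print(posterior_cylinder([1, 0, 0], data))     # 0: the subtree B[1,0,0] is ruled out
print(posterior_cylinder([1, 0, 1, 1], data))  # 3/5: Laplace's rule of succession
```

The last line reproduces Laplace’s rule of succession: after two 1s in three tosses, the updated probability that the next toss is a 1 is (2+1)/(3+2) = 3/5.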
Here is the challenge we are going to consider. Let’s say the real sequence, the one actualized in the long run, is x. The agent is not set an impossible task: what needs to be determined is only whether x is of a certain character, that is, whether it belongs to a certain class R of sequences. For example, the question might be whether x becomes constant after a certain point, that is, ends with either just 1s or just 0s. Or whether x is periodic, that is, whether a certain finite sequence of outcomes just keeps repeating.
Belot specifies that the class R of outcome sequences in question must contain at least one sequence with any given finite initial segment. That is, no finite initial segment can definitively settle whether or not x is in R.
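Put in the notation above (my own formal gloss, not a quotation from Belot), the requirement is that R meets every cylinder set:

```latex
% The class R must be dense in the space of outcome sequences:
% every cylinder set B[k] contains at least one member of R.
\[
\text{for every finite sequence } k: \quad R \cap B[k] \neq \emptyset .
\]
% For the two examples above (eventually constant, periodic), the complement of R
% has the same property, so no finite segment settles the question either way.
```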
What would be success for the agent? Belot’s criterion is deceptively weak-looking:
either x is in R, and eventually the probability of R is, and remains, greater than or equal to 1/2 for all time,
or x is not in R, and eventually the probability of R is, and remains, smaller than or equal to 1/2 for all time.
The failure set, in this problem, for a prior probability P is the set of possible outcome sequences x for which this condition does not hold.
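Spelled out in symbols (my own notation: x(1), … , x(n) is the initial segment of x after n tosses, and P(R | B[x(1), … , x(n)]) the correspondingly updated probability of R), the criterion and the failure set read:

```latex
% Success of the prior P on the actual sequence x: the updated probability of R
% eventually settles on the correct side of 1/2 and stays there.
\[
\mathrm{Success}_P(x) \;\iff\;
\bigl[\, x \in R \;\wedge\; \exists N\, \forall n \ge N:\ P(R \mid B[x(1),\dots,x(n)]) \ge \tfrac12 \,\bigr]
\;\vee\;
\bigl[\, x \notin R \;\wedge\; \exists N\, \forall n \ge N:\ P(R \mid B[x(1),\dots,x(n)]) \le \tfrac12 \,\bigr]
\]
% The failure set of P collects everything else:
\[
\mathrm{Fail}_P \;=\; \{\, x : \neg\, \mathrm{Success}_P(x) \,\}.
\]
```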
The Bayesian agent’s self-assessment, in this case, is that the failure set has probability zero (a consequence of the martingale convergence theorem: with P-probability one, the updated probability of R tends to 1 if the actual sequence is in R and to 0 if it is not).
But how large is the failure set, in measure-independent terms?
That, we would expect, depends on the character of the prior probability function P. The agent’s history, experience, or bias could, for example, make for a ‘closed-minded’ prior probability that includes certainty that the actual sequence will lie in R, or certainty that it won’t.
The interesting case is that of an open-minded prior, one that leaves the door open at any finite stage of the data sequence. That means: given any initial segment s = <k(1), … , k(n)>, there are still possible extensions of s such that updating on them would yield a posterior probability of R greater than or equal to 1/2, and other possible extensions of s such that updating on them would yield a posterior probability of R less than or equal to 1/2.
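In the same notation (again my own formal gloss), open-mindedness says that at every finite stage both verdicts remain reachable by some further course of experience:

```latex
% Open-mindedness of the prior P: after any finite data sequence k there is a
% finite extension that would put the updated probability of R at or above 1/2,
% and another that would put it at or below 1/2.
% (k' \supseteq k below means that k' extends the finite sequence k.)
\[
\forall k \;\; \exists k', k'' \supseteq k: \quad
P(R \mid B[k']) \ge \tfrac12
\quad\text{and}\quad
P(R \mid B[k'']) \le \tfrac12 .
\]
```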
In other words, the agent’s opinion about R, however high or low it gets at any point, remains genuinely hostage to the fortunes of future experience — and isn’t that how it should be?
Belot concludes his article with a truly cool argument that establishes, in six elegant steps, that the failure set of an open-minded prior is a residual set. The failure set, which in the agent’s self-assessment receives probability zero, is, in measure-independent terms, immensely large.
Remember Philip Dawid’s words:
“We have a paradox: an event can be distinguished (easily, and indeed in many ways) that is given subjective probability one and yet is not regarded as ‘morally certain’.”
A paradox? Well, some kind of paradox, surely! And if that is so, we face a challenge: to solve, dissolve, or resolve the paradox. The next blog post will attempt to do so.