What is Bayesian orgulity? (4) About Dawid

Calibration is not a proper scoring rule; that is, being well-calibrated does not suffice as an overall evaluation of a person’s opinion.

Neither is truth! If you want your beliefs to do better when evaluated as to truth value, just be less informative. In the limit you could even guarantee success in this respect a priori: just stick to “que sera, sera” and other such greatly valued tautologies. Yet truth is an indispensable factor in any evaluation of opinion.

We can make similar points about calibration. The less informative a forecast is, the easier it is for it to earn a good calibration score. But as long as we don’t forget the value of informativeness, we must rightly look to calibration for the practical link between probability and reality — or so it would seem.

Philip Dawid’s 1982 paper “The Well-Calibrated Bayesian” sprang a surprise: any coherent Bayesian agent will expect to be well-calibrated. The paper’s conclusion is obtained even while imposing stringent requirements on the forecasting, insisting on precisely what, in the previous post, I called ‘perfect calibration’.

Just as above, the guiding example is that of a weather forecaster, and what is modeled is the sequential procedure of forecasting with feedback. We think of the forecaster as drawing, each evening, on the accumulated experience of all that has passed so far, including the outcomes of today’s events for which he issued forecasts yesterday.

To evaluate how the forecasting compares with reality, we can select a test set of days, and compare the proportion of days in which the events in question actually occur with the estimate (average forecast probability) for those events in that set.

The test set can be selected in advance, for example with a calendar: all the July days in the next ten years, or all the first Fridays. But the selection can also be allowed to depend on actual happenings; for example, it could be the set of all days that follow a day during which it rained. The selection procedure must be admissible: it can’t be one that could only be performed using a crystal ball, or in retrospect at the end of days. (It would not be fair, for example, to base the score on a comparison of the average announced probability of rain with the proportion of rainy days in a test set whose members are all and only the days on which it did not rain!)
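The scoring procedure, and the difference an admissible versus an inadmissible selection makes, can be sketched in a few lines of Python. The setup is an illustrative assumption, not Dawid’s own construction: a forecaster who happens to announce the true chance of rain each day.

```python
import random

random.seed(0)

# An illustrative forecaster who happens to announce the true chance of rain.
days = 100_000
chances = [random.choice([0.2, 0.5, 0.8]) for _ in range(days)]
rain = [random.random() < p for p in chances]
forecast = chances

# Admissible selection: every day that follows a rainy day.  Whether day t
# is included depends only on what has happened before day t.
admissible = [t for t in range(1, days) if rain[t - 1]]

# Inadmissible selection: all and only the days on which it did not rain,
# a choice that could only be made in retrospect.
inadmissible = [t for t in range(days) if not rain[t]]

def proportion_vs_estimate(test_set):
    """Proportion of rainy days in the test set vs. average forecast there."""
    proportion = sum(rain[t] for t in test_set) / len(test_set)
    estimate = sum(forecast[t] for t in test_set) / len(test_set)
    return proportion, estimate

print(proportion_vs_estimate(admissible))    # the two numbers nearly agree
print(proportion_vs_estimate(inadmissible))  # proportion is 0, estimate is not
```

On the admissible test set the proportion and the estimate nearly coincide; on the retrospective one the proportion is zero by construction while the estimate stays well above it, which is why such selections are ruled out.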

And now, one of the many varieties of ‘law of large numbers’ proofs implies Dawid’s theorem:

If the selection process is admissible,

then as the size of the test set → ∞,

the difference between the proportion and the estimate → 0.
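A crude simulation can make the theorem’s conclusion plausible, again assuming for illustration a forecaster whose announced probability matches the true chance:

```python
import random

random.seed(1)

def gap(n_days, p=0.3):
    """|proportion of rainy days - average announced probability| over n_days,
    for a forecaster who announces the true chance p every day."""
    rainy = sum(random.random() < p for _ in range(n_days))
    return abs(rainy / n_days - p)

# The gap between proportion and estimate shrinks as the test set grows.
for n in (100, 10_000, 1_000_000):
    print(n, gap(n))
```

This is of course just the ordinary law of large numbers at work; Dawid’s theorem is the stronger claim that the coherent agent assigns subjective probability 1 to this convergence for every admissible selection at once.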

How did Dawid himself view this result? I quote (where II stands for the Bayesian agent’s prior subjective probability function):

“Any application of the Theorem yields a statement of the form II(A) = 1 where A expresses some property of perfect calibration for the distribution. In practice, however, it is rare for probability forecasts to be well calibrated (so far as can be judged from finite experience) and no realistic forecaster would believe too strongly in his own calibration performance. We have a paradox: an event can be distinguished (easily, and indeed in many ways) that is given subjective probability one and yet is not regarded as “morally certain.” How can the theory of coherence, which is founded on assumptions of rationality, allow such an irrational conclusion?” (my boldface)

Dawid notes immediately a start for a defense: there is a difference between zero probability and impossibility. You might have a subjective probability distribution for, e.g., the mass of the moon. But if it is at all realistic, it will assign zero probability to that mass in kg being a rational number. We are familiar with this sort of distinction from measures in geometry: a line segment has zero area, a plane figure has zero volume, but they are not nothing.

A series of papers that followed, by Oakes, Schervish, and Dawid, focused on the question whether nature could be such as to prevent calibration in the first place, for anyone, by any means. Certainly it could! For proportions in an infinite sequence may not converge. This limitation to successful forecasting had been noted already by Reichenbach, and formally proved by his student Hilary Putnam. Mark Schervish’s result (1985) made the point most strikingly:

Theorem. The cardinality of the set of non-calibrable sequences is that of the continuum.
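A sequence on which no forecaster could be calibrated is easy to exhibit, since its relative frequencies never settle down. A standard construction (not Schervish’s own proof) uses alternating blocks of 1s and 0s whose lengths grow geometrically:

```python
def block_end_frequencies(n_blocks):
    """Running proportion of 1s at the end of each block, for the sequence
    1 000 111111111 0...0 ... with block lengths 1, 3, 9, 27, ..."""
    length = ones = 0
    bit, block = 1, 1
    freqs = []
    for _ in range(n_blocks):
        length += block
        ones += bit * block
        freqs.append(ones / length)
        bit, block = 1 - bit, block * 3
    return freqs

# The running frequency keeps swinging between about 3/4 and 1/4,
# so it has no limit: no forecast can be calibrated against this sequence.
print(block_end_frequencies(10))
```

Because each new block dwarfs everything before it, the running proportion of 1s oscillates forever between roughly 3/4 and 1/4, which is exactly the Reichenbach–Putnam obstacle to successful forecasting.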

To sum up then: the coherent Bayesian, asked for a self-assessment, expects fully (with subjective probability 1) to be well-calibrated, while both he himself and his colleagues can also prove that, no matter what, there is a non-denumerable infinity of ways for nature to be in which the forecasting would not be well calibrated.

And the one defense of the Bayesian agent we have been able to note so far is that, by his own lights, the set of all those possible but ‘bad’ scenarios, however large in other respects, has probability zero.

Next we will see how Belot’s much more recent paper “Bayesian Orgulity” makes it all look a great deal worse.
