A Humble Little Probability Theorem

If it is the case that 

P(A) = 1 implies P(B) = y

then P(B | A) = y.

Is this a valid inference?  Or better:  under what conditions, if any, is this a valid inference?

It may well seem a natural, intuitive inference in certain cases.  For example, if I am certain that the coin is fair then I am certain that the probability of Heads in a fair toss of this coin is 0.5.  If instead I am certain that the coin is biased 3:1 in favor of Heads then I am certain that the probability of Heads in a fair toss of this coin is 0.75.  And in both cases, my conditional probability for a Heads outcome of a fair toss, given the respective hypothesis about the coin, is the corresponding probability, 0.5 in the one case and 0.75 in the other.
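Here is a minimal sketch of that arithmetic, in Python.  The equal prior weights on the two hypotheses about the coin are an illustrative assumption, not part of the example; only the conditional probabilities matter.

    # Joint distribution over pairs (hypothesis about the coin, outcome of a fair toss).
    # The equal prior weight (0.5) on each hypothesis is an illustrative assumption.
    joint = {
        ("fair", "Heads"):   0.5 * 0.5,
        ("fair", "Tails"):   0.5 * 0.5,
        ("biased", "Heads"): 0.5 * 0.75,
        ("biased", "Tails"): 0.5 * 0.25,
    }

    def prob(event):
        """Probability of the set of (hypothesis, outcome) pairs satisfying event."""
        return sum(p for point, p in joint.items() if event(point))

    def cond(b, a):
        """Conditional probability P(B | A) = P(B and A) / P(A), provided P(A) > 0."""
        return prob(lambda pt: b(pt) and a(pt)) / prob(a)

    heads = lambda pt: pt[1] == "Heads"
    print(cond(heads, lambda pt: pt[0] == "fair"))    # 0.5
    print(cond(heads, lambda pt: pt[0] == "biased"))  # 0.75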

But that is one sort of example, and not all examples are so simple. 

First I will show a very general form of modeling probabilistic situations in which this inference is indeed valid.  

Second I will show, with reference to Miller’s Principle for objective chance and the Reflection Principle for subjective probability, that there are important forms of modeling probabilistic situations in which the inference is not valid at all.  

And thereby hangs a tale.

One.  Simple chance or simple factual opinion

We can think about the coin tossing example in either of two ways.  The first is to equate the probabilities with objective chances, resulting from the structure of the coin and of the coin tossing mechanism.  The second is to equate the probabilities with the subjective probabilities of a person who has certain odds for the coin toss outcomes.  In both cases the set of probability functions that can represent the situation is simply all those that can be defined on the space {Heads, Tails}.  

That set, call it PR, has a feature which could be retained in similar models of more complex or more sophisticated form:  PR is closed under conditionalization.  That is, if P is in PR and A is in the domain of P with P(A) > 0, then P( . | A) is also in PR.

Assumption I:  the set of all probability functions that could represent the probabilities of the propositions in a certain possibility space <S, F> is closed under conditionalization.

Explanation:  S is the set of possible states of affairs, F is a field of subsets of S (including S) — the members of F we call propositions.  A model satisfying the Assumption is a couple M = <<S, F>, PR> where PR is a set of probability functions defined on F, which is closed under conditionalization (where defined).
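A small sketch of such a model for the coin toss space, in which PR is the set of all probability functions on {Heads, Tails} (the code below samples a few of them): conditionalizing a member on a proposition of positive probability yields another probability function on the same field, hence another member of PR.

    # Possibility space S = {Heads, Tails}; F = all subsets of S, taken as frozensets.
    S = ("Heads", "Tails")
    F = [frozenset(), frozenset({"Heads"}), frozenset({"Tails"}), frozenset(S)]

    def make_P(p_heads):
        """The probability function on F determined by P({Heads}) = p_heads."""
        return {frozenset(): 0.0,
                frozenset({"Heads"}): p_heads,
                frozenset({"Tails"}): 1.0 - p_heads,
                frozenset(S): 1.0}

    def conditionalize(P, A):
        """Return P(. | A) as a function on F, defined only when P(A) > 0."""
        return {B: P[B & A] / P[A] for B in F} if P[A] > 0 else None

    # A sample of PR: the functions with P({Heads}) in {0, 0.1, ..., 1}.
    PR = [make_P(x / 10) for x in range(11)]
    Q = conditionalize(make_P(0.5), frozenset({"Heads"}))
    print(Q == make_P(1.0))   # True: the conditionalization is again a member of PR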

Theorem 1.  If it is the case, for all P in PR, that if P(A) = 1 then P(B) = y, then it is the case, for all P in PR, that P(B | A) = y when defined.

Proof.  Suppose that, for all P in PR, if P(A) = 1 then P(B) = y.

Suppose per absurdum that for some member P’ of PR it is the case that P’(B |A) = z, and it is not the case that z = y.  

This implies that P’(A) > 0.  Let Q = P’(. |A), the conditionalization of  P’ on A.

Then Q(A) = 1 and Q(B) = z.  Since PR is closed under conditionalization, Q is a member of PR.  So there is a member of PR, namely Q, such that Q(A) = 1 and it is not the case that Q(B) = y, contradicting our first supposition.  This completes the proof.
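Theorem 1 can also be checked by brute force on a small finite model.  This is a sketch only: the three-state space and the coarse grid of starting functions are arbitrary choices, and PR is closed under conditionalization by construction, as Assumption I requires.

    from itertools import combinations, product

    S = (0, 1, 2)    # three possible states
    F = [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

    def P_of(w, A):                  # w: dict mapping each state to its probability
        return sum(w[s] for s in A)

    def conditionalize(w, A):
        pa = P_of(w, A)
        return {s: (w[s] / pa if s in A else 0.0) for s in S} if pa > 0 else None

    # Start from a coarse grid of probability functions and close it under
    # conditionalization, so that Assumption I holds by construction.
    grid = [dict(zip(S, v)) for v in product([i / 4 for i in range(5)], repeat=3)
            if abs(sum(v) - 1) < 1e-9]
    PR = list(grid)
    for w in grid:
        for A in F:
            q = conditionalize(w, A)
            if q is not None and q not in PR:
                PR.append(q)

    # Check: whenever "P(A) = 1 implies P(B) = y" holds throughout PR,
    # every defined conditional probability P(B | A) in PR equals y.
    for A in F:
        for B in F:
            ys = {round(P_of(w, B), 9) for w in PR if abs(P_of(w, A) - 1) < 1e-9}
            if len(ys) == 1:         # the antecedent of Theorem 1 holds, with y = ys.pop()
                y = ys.pop()
                assert all(abs(P_of(conditionalize(w, A), B) - y) < 1e-6
                           for w in PR if P_of(w, A) > 0)
    print("Theorem 1 holds in this model")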

Two.  Enter higher order probabilities

It may be tempting to think that the theorems that hold for probability when higher order probabilities are not admitted all remain valid when we extend the theory to higher order probabilities.  Here we have a test case.

Sometimes one and the same formula plays a role in the modeling of very different situations, and sometimes a formula’s status in various roles ranges from audacity to triviality, from truism to absurdity.  All of that happens to be the case with the formula

            (*)  P(A | pr(A) = x) = x

which appeared first as Miller’s Principle (connecting logical and statistical probability, now usually read as connecting the measure of ignorance P with objective chance pr) and later as the Reflection Principle (connecting present opinion about facts with present opinion about possible future opinions about those facts).  Both principles have a very mixed history (see Notes).

To model probabilities connected with probabilities of those probabilities, we will not assume Assumption I (indeed, we will show it running into trouble) but rather principle (*), which I will refer to by the name of its second role, the Reflection Principle.

Assumption II.  There is a function pr which maps S into PR.  For each number r and member A of F we define [pr(A) = r] = { x in S: pr(x)(A) = r}.  For all A and r, [pr(A) = r] is a member of F.

(For most numbers, perhaps even for all but a finite set of numbers,  [pr(A) = r] will be the empty set.)  

(This looks audacious, but it is just how Haim Gaifman sets it up.)  

The Reflection Principle is satisfied exactly if for all P in PR, and all A in F and all numbers r, 

    P(A | pr(A) = r) = r when defined
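Here is a minimal sketch of a model satisfying Assumption II and the Reflection Principle.  The states are pairs (c, o), where c is the chance pr assigns to Heads at that state and o is the outcome; the two admissible chance values and the uniform prior over them are illustrative assumptions, and P is taken to be the corresponding mixture of chances, one standard way of satisfying Reflection.

    from itertools import product

    # States are pairs (c, o): c is the value pr assigns to Heads at that state,
    # o is the outcome.  The chance values and the uniform prior are assumptions.
    chances = (0.5, 0.75)
    S = list(product(chances, ("Heads", "Tails")))
    prior = {0.5: 0.5, 0.75: 0.5}

    def P(event):
        """P as a mixture of chances: weight each state by prior(c) times the chance of o."""
        return sum(prior[c] * (c if o == "Heads" else 1.0 - c)
                   for (c, o) in S if event((c, o)))

    def cond(b, a):
        return P(lambda s: b(s) and a(s)) / P(a)

    heads = lambda s: s[1] == "Heads"
    # [pr(Heads) = r] is the proposition {x in S : pr(x)(Heads) = r}.
    for r in chances:
        print(r, cond(heads, lambda s, r=r: s[0] == r))   # Reflection: P(Heads | pr(Heads) = r) = r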

Theorem 2.  If the Reflection Principle is satisfied then Theorem 1 does not hold for PR.

Proof.   Suppose P(A) = 1.  The Reflection Principle implies that P(A | pr(A) = 0.5) = 0.5 if defined, that is, if P(pr(A) = 0.5)>0. 

But given that P(A) = 1, P(A | pr(A) = 0.5) = 1 also, when defined; for if P(A) = 1 then P(A ∩ C) = P(C) for every C in F.  

Therefore, since the value cannot be both 0.5 and 1, that conditional probability must be undefined: if P(A) = 1 then P(pr(A) = 0.5) = 0.

So, with B = [pr(A) = 0.5] we see that for all P in PR,

     if P(A) = 1 then P(B) = 0.

However, it is not always the case that for all P in PR, P([pr(A) = 0.5] | A) = 0.

This last point I will not prove.  Think back to the example:  the probability that the chance of getting outcome Heads = 0.5, given that the actual outcome will be Heads, is certainly not zero.  For the actual outcome does not determine the chance that outcome had of occurring.  Similarly, if I am ignorant of the outcome, then my personal probability for that outcome is independent of what that outcome actually is.

Corollary. If Assumption II holds and the Reflection Principle is satisfied, then the appropriate set PR is not closed under conditionalization.
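Both the unproved point and the Corollary can be illustrated numerically in the mixture model sketched above (again with the illustrative chance values 0.5 and 0.75 and a uniform prior, none of which is dictated by the text):

    from itertools import product

    chances = (0.5, 0.75)
    S = list(product(chances, ("Heads", "Tails")))
    prior = {0.5: 0.5, 0.75: 0.5}

    def P(event):
        return sum(prior[c] * (c if o == "Heads" else 1.0 - c)
                   for (c, o) in S if event((c, o)))

    A = lambda s: s[1] == "Heads"   # the outcome is Heads
    B = lambda s: s[0] == 0.5       # the proposition [pr(Heads) = 0.5]

    print(P(A))                               # 0.625
    print(P(lambda s: A(s) and B(s)) / P(A))  # P(B | A) = 0.4, not 0

    # As shown in the proof of Theorem 2, any member of PR that assigns A
    # probability 1 must assign B probability 0.  Conditionalizing P on A gives
    # a function that assigns A probability 1 and B probability 0.4, so it
    # violates Reflection and lies outside PR: PR is not closed under
    # conditionalization, as the Corollary states.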

Of course that corollary was something already known, from the probabilist version of Moore’s Paradox.

NOTES

[1]  This is a recap, in more instructive and more general form, of three preceding posts:  “Moore’s paradox”, “Moore’s Paradox and Subjective Probability”, and “A brief note on the logic of subjective probability”.

[2]  I take for granted the concept of a probability function P defined on F.  As to conditional probability P(B | A), this is a binary partial function defined by P(B | A) = P(B ∩ A)/P(A), provided P(A) > 0.

[3] David Miller introduced what came to be called Miller’s Principle in 1966, and produced a paradox. Dick Jeffrey pointed out, in effect, that this came by means of a modal fallacy  (fallacy of replacing a name by a definite description in a modal context).  Karl Popper, Miller’s teacher, compounded the fallacy.  But there was nothing wrong with the principle as such, and it was adapted, for example, by David Lewis in his theory of subjective probability and objective chance.

[4] When I say that the Reflection Principle too has a mixed history I am referring to fallacies by its critics.

BIBLIOGRAPHY

Jeffrey, Richard C. (1970). Review of eight discussion notes. Journal of Symbolic Logic 35, 124–127.

Miller, David (1966). A paradox of information. The British Journal for the Philosophy of Science 17 (1), 59–61.
