Conditionalizing on a combination of probabilities

NOTE:   This is going to be about what Dick Jeffrey called Superconditioning.

But here applied to a topic of my own concern.

I like the physicists’ term “mixture” but the more traditional terminology is this:

Probability function P is a convex combination of probability functions p and q if and only if there are positive numbers a and b (the weights) in the interval [0,1] such that P = ap + bq.

This implies of course that a + b = 1: since P, p and q each assign 1 to the whole sample space, applying the equation to that space gives 1 = a + b.  The definition is easily extended to cover a convex combination of any finite number of probability functions.

The following example is so simple that it may seem to harbor no complexities at all.  But it does.

EXAMPLE.  We are going to choose at random one of three coins to be tossed: one is fair, one is biased for Heads 4:1 and one is biased for Heads 1:4.  My probability P is the convex combination, with weights 1/3 each, of the functions p, q, r proper to the three coins:

            P(Heads) =  (1/3)(0.5) + (1/3)(0.8) + (1/3)(0.2) = 0.5

QUESTION.  What happens if I am told that the coin actually to be tossed is the one heavily biased in favor of Heads?  Obvious! I conditionalize on this new evidence.  I change my subjective probability to P(Heads) = 0.8, or if you like, I change it from P to q.

And what if instead I am just told it is either the fair coin or the one heavily biased in favor of Heads?  Obvious!  I conditionalize on this new evidence.  I change from P to (1/2)(p + q) and my new probability for Heads is 0.65.  Very natural.
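
Numerically, those two answers are just re-weightings of the original mixture.  Here is a minimal sketch of that arithmetic in Python; the dictionary encoding and the helper name are my own, purely for illustration:

    # Each probability function on {Heads, Tails} is represented as a dict.
    p = {"Heads": 0.5, "Tails": 0.5}   # the fair coin
    q = {"Heads": 0.8, "Tails": 0.2}   # biased 4:1 for Heads
    r = {"Heads": 0.2, "Tails": 0.8}   # biased 1:4 for Heads

    def mixture(weighted):
        """Convex combination of probability functions, given as (weight, function) pairs."""
        return {o: sum(w * f[o] for w, f in weighted) for o in ("Heads", "Tails")}

    P = mixture([(1/3, p), (1/3, q), (1/3, r)])
    print(round(P["Heads"], 6))                              # 0.5

    # Told it is the heavy-Heads coin: all the weight goes to q.
    print(round(mixture([(1, q)])["Heads"], 6))              # 0.8

    # Told it is the fair coin or the heavy-Heads coin: equal weights on p and q.
    print(round(mixture([(1/2, p), (1/2, q)])["Heads"], 6))  # 0.65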

But this makes no sense at all.  All four probability functions P, p, q, r have the same domain, namely the very little sample space {Heads, Tails}.  There is nothing in that domain that could be a proposition expressed by “the coin to be tossed is  …”.  These functions cannot be conditionalized on something that is not in their domain of definition.

To put it metaphorically: those three functions p, q, r pertain to three possible worlds, which are very different from each other.  They are all possible as far as my initial information is concerned.  Just imagine, in one of them the number of galaxies is most likely one, with our planet as the sole one inhabited, and in the others there are most likely billions of galaxies teeming with intelligent life …. 

We naturally think of this situation as consisting of a single set of possibilities, with three probability functions defined on it.  But really, we have three sets of possibilities, entirely alike in themselves, but each with its single, unique, different probability distribution.

So to represent what is happening here we have to switch to a larger sample space: that is what Jeffrey calls Superconditioning.  It is on that larger space that P is defined, and in that space the functions p, q, r mark a partition.  If C(p) is the cell in this partition that is marked p, then the probability of Heads in that cell equals 1/2.  And what P initially assigns probability 1/3 to is that cell.
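
To make that picture concrete, here is a minimal sketch of the larger space in Python, with the points taken to be pairs (coin, outcome) and the cells marked by the coin coordinate.  This particular encoding, and the helper names, are my own choices, not part of Jeffrey's construction:

    # The larger sample space: points are pairs (coin, outcome).
    coins = {"p": {"Heads": 0.5, "Tails": 0.5},
             "q": {"Heads": 0.8, "Tails": 0.2},
             "r": {"Heads": 0.2, "Tails": 0.8}}
    weights = {"p": 1/3, "q": 1/3, "r": 1/3}

    # P on the larger space: each point (c, o) gets the weight of its cell
    # times the probability that coin c gives outcome o.
    bigP = {(c, o): weights[c] * coins[c][o] for c in coins for o in ("Heads", "Tails")}

    def P(event):
        """Probability of a set of (coin, outcome) pairs."""
        return sum(bigP[pt] for pt in event)

    cell_p = {pt for pt in bigP if pt[0] == "p"}        # the cell C(p), marked by p
    heads  = {pt for pt in bigP if pt[1] == "Heads"}    # "Heads" as a proposition in the larger space

    print(round(P(cell_p), 6))                          # 1/3: what P initially assigns to C(p)
    print(round(P(heads & cell_p) / P(cell_p), 6))      # 0.5: probability of Heads within that cell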

It is customary to keep using the same words and symbols but of course that is a practice designed to confuse.  No proposition in the small space {H, T} is a proposition in the larger space on which, we now say, our subjective probability is really defined. 

I will show the proper representation, formally constructed, in an Appendix.  But for now, let us go back to the example, and look at the informal, not yet dis-confused, question of whether the following is the case:

The probability that Heads comes up, given that the coin is so biased that the probability of Heads is 0.8, equals 0.8

or more generally, symbolically,

P(A | µ(A) = x) = x 

What is meant by this?  The proposition [µ(A) = x] must be a set in the larger space on which P is defined, namely the union of just those cells in which the probability of A equals x.  In the case of Heads and 0.8 that is just Cell C(q), the one marked by q.  And P conditionalized on that cell does assign 0.8 to Heads.  

Somewhat more ambitiously, what about 

                        P(Heads | µ(Heads) > 0.4)   >  0.4 ?

Well,   [µ(Heads) > 0.4] must be the union of the cells in which the probability of Heads is greater than 0.4.  Those cells are precisely Cells C(p) and C(q), corresponding to probability functions p and q.  And P conditionalized on their union assigns 0.65 to Heads, so yes, something greater than 0.4.
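
Both computations can be checked directly on the larger space.  A minimal sketch, using the same pair encoding as above (again my own representation, for illustration only):

    coins = {"p": {"Heads": 0.5, "Tails": 0.5},
             "q": {"Heads": 0.8, "Tails": 0.2},
             "r": {"Heads": 0.2, "Tails": 0.8}}
    bigP = {(c, o): (1/3) * coins[c][o] for c in coins for o in ("Heads", "Tails")}

    def P(event):
        return sum(bigP[pt] for pt in event)

    heads = {pt for pt in bigP if pt[1] == "Heads"}

    # [mu(Heads) = 0.8]: the union of the cells in which the probability of Heads equals 0.8.
    eq_08 = {pt for pt in bigP if coins[pt[0]]["Heads"] == 0.8}
    print(round(P(heads & eq_08) / P(eq_08), 6))     # 0.8

    # [mu(Heads) > 0.4]: the union of the cells in which the probability of Heads exceeds 0.4.
    gt_04 = {pt for pt in bigP if coins[pt[0]]["Heads"] > 0.4}
    print(round(P(heads & gt_04) / P(gt_04), 6))     # 0.65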

You, esteemed reader, will not have missed my ulterior motives for this discussion.

APPENDIX.  Superconditioning formally presented.

Our initial model:

S = <S, F> is a (sample) space:  S is a non-empty set and F is a field of subsets of S, which includes S.  The members of F I will call elementary propositions.

TPR = {p1, …, pn}   is a finite set of probability functions each with domain F.  

Pin = ∑{bjpj: j = 1, …, n} is a convex combination of the members of TPR, with positive weights bj that sum to 1.

M = <S, TPR, Pin> is our initial model.

Our final model:

For each j, from 1 to n, Sj = <Sj, Fj> is an isomorphic copy of S.  These are disjoint.

There is a homomorphism f: ∪{Fj: j = 1, …, n} → F, and the restriction of f to Fj is an isomorphism onto F.  

pj* is a probability function on Sj such that pj*(A) = pj(f(A)) for each proposition A in Fj.  

TPR* = { pj*| j = 1, …, n}

S* = <S*, F*> is the sample space with S* = ∪{Sj: j = 1, …, n}.

Field F* on S* is the least field that has each set Sj as a member, as well as all the members of each field Fj, for j = 1, …, n.  The sets Sj are therefore the cells in a partition, and every other member of F* is a union of members of the fields Fj, hence a union of subsets of those cells.  

P is the probability function defined on S* as follows:

            For j = 1, …,n

  1. P(Sj) = bj
  2. for each A in Fj, P(A|Sj) = pj*(A).  Equivalently, for each A in F*, P(A ∩ Sj) = P(Sj)pj*(A ∩ Sj).  
  3. P is additive: if A and B are disjoint members of F* then P(A ∪ B) = P(A) + P(B)

It is clear that 3. does not conflict with 2. since pj* is additive.  Since the weights bj are positive and sum to 1, and each function pj* is a probability function which assigns 1 to Sj, it follows that P is a probability function with domain F*.

M* = <S*, TPR*, P> is our final model.
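
For a finite space and finitely many probability functions, this construction can be carried out mechanically.  Below is a minimal Python sketch of it, realizing each copy Sj as a set of tagged points (j, s); the function and variable names are mine, and the final lines just check clauses 1 and 2 on the three-coin example:

    def superconditioning_model(S, prob_functions, weights):
        """Final model: copies S_j = {(j, s): s in S}, with P(S_j) = b_j and
        P(. | S_j) agreeing with p_j on the j-th copy."""
        cells = [{(j, s) for s in S} for j in range(len(prob_functions))]

        def P(event):
            # Clauses 2 and 3 combined: P(A) = sum over j of b_j * p_j({s: (j, s) in A}).
            return sum(weights[j] * prob_functions[j][s] for (j, s) in event)

        return P, cells

    # The three-coin example: S = {Heads, Tails}, weights 1/3 each.
    S = ("Heads", "Tails")
    p_funcs = [{"Heads": 0.5, "Tails": 0.5},
               {"Heads": 0.8, "Tails": 0.2},
               {"Heads": 0.2, "Tails": 0.8}]
    b = [1/3, 1/3, 1/3]

    P, cells = superconditioning_model(S, p_funcs, b)
    heads_star = {(j, "Heads") for j in range(3)}   # the copies of Heads, united across cells

    for j in range(3):
        assert abs(P(cells[j]) - b[j]) < 1e-9                     # clause 1: P(S_j) = b_j
        assert abs(P(heads_star & cells[j]) / P(cells[j])
                   - p_funcs[j]["Heads"]) < 1e-9                  # clause 2: P(Heads | S_j) = p_j(Heads)
    print("clauses 1 and 2 check out on the example")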

Application

The proposition expressed by “the probability of A equals x” is now represented by the union of the cells Sj  such that  P(A | Sj) = x, that is, such that  P(A ∩ Sj) = P(Sj)x.  

Let those cells be  Sj  with j = k, …, m.   So the proposition [probability of A = x] is the proposition ∪{Sj: j = k, …, m}.

Since these cells are disjoint, 

P(A ∩ [∪{Sj: j = k, …, m}])    =         ∑{P(A ∩ Sj) : j = k, …, m}

                                                =          x ∑{P(Sj) : j = k, …, m}

                                                =          x P([probability of A = x])

or equivalently,

            P(A | [probability of A = x]) = x.

… an expression with a familiar look in many contexts …

REFERENCES

Richard Jeffrey (1988) “Conditioning, Kinematics, and Exchangeability”.  Pages 221-256 in B. Skyrms and W. L. Harper (eds.) Causation, Chance, and Credence. Proceedings of the Irvine Conference on Probability and Causation, Volume 1.  Dordrecht: Kluwer.
