A Probability Puzzle

  1. A Version of the Puzzle
  2. Diagnosis
  3. What About Vague Probabilities?
  4. APPENDIX.  The ‘Orthodox’ Representation of Vague Opinion

This puzzle was devised by Roger White (2010: 175ff.)  in support of an argument against the very idea of vague probability judgements.  (See e.g. Topey 2012 for discussion.)

To begin I will take up the puzzle itself, in its general form, as it applies also to precise opinion, and the fallacy it tends to evoke.  Then I’ll discuss decisions under vague uncertainty, and end with an Appendix for open problems of a technical sort.

1.   A Version Of The Puzzle

Time 0.  Jake has a coin that you know to be fair. There is a certain proposition p about which you are uncertain (in one way or another), but you know that Jake knows whether p. Jake paints the coin so that you can’t see which side is Heads and which side is Tails, then writes ‘p’ on one side and ‘~p’ on the other.  Jake tells you that he has placed whichever is true on the Heads side, and its contradictory on the Tails side.  Jake will toss the coin so that you can see how it lands.

Time 1.  Jake tosses the coin, and you see that it has landed with the side marked ‘p’ facing up.

What does this do to your opinion about how likely it is that p is true?

Now we may be inclined to reason as follows:  

[ARG] “This coin is fair, so the probability is 0.5 that it landed Heads up.  But given that ‘p’ is showing, p is true iff the coin landed Heads up.  Therefore the probability that p is true is 0.5.”

Notice that it does not matter what p is, except that you are uncertain about it.  Also note that your prior probability for p (whether precise or vague) makes no difference to what your posterior probability becomes.

Notice also that if you had seen that the coin had landed showing ‘~p’, you would have come to the posterior probability 0.5 for ~p, and hence also for p, by an exactly similar argument.  Therefore, it is predictable beforehand that your posterior probability for p is 0.5, regardless of which proposition p is, and regardless of your prior probability for it.  As soon as Jake has told you what he is going to do, if you believe you will look at the coin when it has landed, you know how likely you will take p to be at the end.

White dismisses this argument, with the words “But this can’t be right. If you really know this in advance of the toss, why should you wait for the toss in order to set your credence in p to 1/2?”.

And dismiss it he should!  For the form of argument [ARG] quickly leads to total incoherence.

EXAMPLE

Jake has three cousins, called Jack, Jim, and Jules.  They approach Mark, offering the same procedure as Jake’s, but for specific propositions.  They have looked at a die, and have recorded which face is up.  Jack tells Mark he has a fair coin, and will write “The face up was either 1 or 2” on the Heads side if that was true, and on the Tails side otherwise, with the negation on the other side.  Then he will toss that coin and Mark can see the result.

Mark remembers the entire discussion following Jake’s procedure; he accepts [ARG] as the proper reasoning, and so concludes that after seeing the result, whatever it is, he will have probability 0.5 that the outcome was either 1 or 2.

Jim now gets into the act, in precisely the same way, with the proposition “The outcome was 3 or 4”.  Then Jules, of course, for the proposition “The outcome was 5 or 6”.  Each is referring to the same recorded die result as Jack.

After they are done Mark has probability 0.5 that the outcome was either 1 or 2, and 0.5 that it was 3 or 4, and 0.5 that it was 5 or 6.  So his probability that the outcome was 1, 2, 3, 4, 5, or 6 is now 1.5.  His opinion is now completely incoherent.

To press the point home: a little theorem

It takes just a couple of lines to prove that for any probability function P and any propositions p and q in its domain, P(p) is between P(p|q) and P(p|~q), when defined. 
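For the record, the couple of lines are just the law of total probability:

P(p) = P(p|q)P(q) + P(p|~q)P(~q)

Since P(q) + P(~q) = 1, P(p) is a weighted average of P(p|q) and P(p|~q), and a weighted average lies between its two terms.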

So applications of [ARG] will be invalid whenever P(p) is a (sharp) probability other than 0.5.

For example, suppose p = (BvF won the lottery), and for me, p has a probability of less than one in a million (as it does).  Then there does not exist any proposition q such that I must assign 0.5 to p, by conditionalization, both when my evidence is q and when it is ~q.

2.   Diagnosis

Argument [ARG] is spurious. 

When Jake sets his procedure in motion, the question I must ask myself is this:  

when Jake goes to place p on one side of the coin, how likely is he to place it on the Heads side?  

Well, he will do so only if p is true.  And how likely is that?

Suppose I bought one ticket in the lottery and Jake has checked whether it was the winning ticket.  For p he selects You have won a million dollars. 

Well, how likely is it that p is true?

For me, it has probability less than one in a million.  So if I see that sentence on top, I say:  this coin landed Heads up on this particular occasion only if Jake wrote p on the Heads side.  And this he did only if I turned out to have had the winning ticket.  So the probability that the coin landed on the Heads side, on this particular occasion, is the probability that I won a million dollars, which is less than one in a million.

Landed Heads implies I won a million. So Prob(Landed Heads) ≤ Prob(I won a million)

This does not deny for a moment that the coin is fair, and that it certainly was the case that the probability was 0.5 that the coin would land Heads up on that particular toss.  But now that the coin is lying there, we have to go with what we know about Jake’s procedure. 
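This diagnosis is easy to check by simulation.  Here is a minimal sketch (my own illustration, with a hypothetical prior of one in a thousand for p): among the tosses on which ‘p’ faces up, the frequency with which p is true matches the prior, not 0.5.

```python
import random

# Jake's procedure: p is true with prior probability PRIOR (a value
# assumed here for illustration); Jake writes the true proposition on
# the Heads side; then the fair coin is tossed.
PRIOR = 0.001
TRIALS = 1_000_000

shows_p = 0   # tosses on which the side marked 'p' faces up
p_true = 0    # among those, the number on which p is actually true

for _ in range(TRIALS):
    p = random.random() < PRIOR      # whether p is true
    heads = random.random() < 0.5    # how the fair coin lands
    # 'p' faces up iff p is on the Heads side (p true) and the coin lands
    # Heads, or p is on the Tails side (p false) and it lands Tails:
    if heads == p:
        shows_p += 1
        p_true += p

print(shows_p / TRIALS)   # about 0.5: that 'p' would face up was 50-50
print(p_true / shows_p)   # about 0.001: the posterior equals the prior
```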

3.   What About Vague Probabilities?

Let’s first discuss vague probability taken as a general subject, setting aside for now any questions about the ‘orthodox’ representation of vague opinion (which is by means of families of probability functions).

Suppose then that I have no opinion at all about proposition p, not even a vague one, let alone a sharp probability. In that case, when I see p displayed on top of the coin, I can’t reason with myself about how likely Jake was to place p on the Heads side of the coin.  Thus what Jake has told me about how he would proceed, depending on whether p is true, has given me no usable information at all.  There is nothing for me to process.

So I am at a loss, in the extreme case where I have no opinion at all.  But what that sort of case can be is not easy to grasp, and I will give a concrete example below.

Vague opinion is not usually so totally vague as all that.  In a more practical case, e.g. the proposition that it will rain tomorrow, given the weatherman’s forecast, I do have some opinion.  For example, I may say that this is at least as likely as not.  That is, my probability is at least 0.5, or equivalently (if we want to put it that way) the interval [0.5, 1].

What if I am offered a bet on this, with prize 1 utile?  There is one highly conservative policy I could follow: if buying the bet, pay no more than 0.5; if selling, take no less than 1.  As to any other offer, just say no.

Well, that is fine with such a cozy bet on an innocuous subject, but what if a great deal depends on it? What if, in William James’ terms, the choice is forced, so that not betting is itself a choice with possibly awful consequences?  To jump the chasm may cost you your life or it may save you, but if you do not jump you are almost certain to suffer debilitating exposure.

The other, highly permissive policy is to say: if you want, buy the bet at any price between 0.5 and 1, inclusive.  None of these choices has anything to favor it over the others, but each has the merit that you may prefer it to inaction, although you cannot calculate a higher expectation value.
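To put both policies in terms of expectation (my gloss): buying the 1/0 bet at price c yields expected gain P(rain) − c, which is non-negative for every probability in the interval [0.5, 1] exactly when c ≤ 0.5.  The conservative policy acts only on such uniformly safe offers; the permissive policy also allows any c in [0.5, 1], since for each such price there is some admissible probability under which the purchase has non-negative expected gain.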

THE GAMBLE = AN OPINION UPDATE?

Suppose that in the above illustration I am offered a bet on it will rain tomorrow, with payoff 1 if true (and 0 if false), for 0.6 utiles.  Suppose I buy the bet.  Am I irrational?

If that is irrational, then we are all irrational all the time, when we go into stores and buy things.

What did I do?

(I) Taking into account all the information I have, and judging it at least as likely as not that it will rain tomorrow, though not above nine times as likely as not, I know that I take a risk by buying the bet for 0.6, a risk that I cannot quantify.

Now, there is a longstanding idea that my opinion is whatever it is that is exhibited in my willingness to bet.  If we apply that idea here, directly and uncritically, we arrive at:

(II) The act of betting 0.6 for a 1/0 option on rain tomorrow, at that point, shows that I have just updated my probability for rain tomorrow to the sharp probability 0.6.

Plausible, certainly, in view of the tradition concerning credence or subjective probability that we are all part of.  But (II) contradicts (I).  For if (II) is correct, then the agent, me, has quantified the risk.

(I) says in effect that I am not changing my opinion about rain tomorrow at all. Rather, my opinion does not suffice to determine my decision. Note that there was clearly no opinion updating going on, for between the formulation of my opinion and the offer of the bet there was no new information to update on! 

To show what my opinion is, I will continue to counsel anyone who asks that I can say no better than that the probability of rain tomorrow is at least 0.5.  Then they can decide for themselves whether to take a risk with bets that cost more than 0.5, or not. 

To me this is common sense.

A concrete example of ‘no opinion at all’

Roger White has an objection to (I), arguing that the permissive policy would lead to financial ruin.  The policy would permit you to bet the same 0.6 each time, ignoring all that is left open by that vague opinion.  For all we know, the chance might in each case be 0.5, while the agent keeps buying the bet for 0.6.

But this just ignores learning.  Even an uneducated but reasonable gambler will keep lowering his bets if he is consistently losing.  To be more concrete, since such a repetition of chances of rain is not plausible, suppose that the Jake puzzle example has been set up with the proposition p identified so as to make sure of our total ignorance.

An experiment has been set up with a coin of unknown bias; it is tossed, and p is the proposition that it landed Heads up.  Then Jake, who knows the result, continues the process with his fair coin, as in the puzzle.

What does it mean that the first coin is a coin with unknown bias?  The probability that this coin lands Heads up is x, and x could equally be any number in [0,1].  Well, what is “equally”?  What is it for x to be a random selection from [0,1]? There are different answers we could give here, but let’s take this one: for any two sub-intervals of [0,1] that are of equal length, the probability that x belongs to them is the same.

Then Jake’s procedure is in effect a two-coin process: the probability that both coins land Heads up is 0.5x, so it is a process with unknown bias in the smaller interval [0, 0.5].

On the liberal policy, if I am now asked to bet on whether both coins landed Heads up on a specific occasion, I could for example choose to buy the bet for 0.2.  White’s argument implies that this liberal policy permits me to make that same choice each time if the experiment is endlessly repeated, and that this strategy would lead to financial ruin with certainty.

Is that so?

If the experiment is repeated, there are two possibilities that will merit the “unknown bias” label.  First, it may be repeated each time with the same coin (or coin with the same bias). Second, the choice of bias in the tossed coin may be randomized as well.

In the first case, if the real bias is below 0.2 then I will lose money in the long run.  White ignores the information gained from this: in fact the results will allow me to learn, to modify my betting behavior, so as to converge on the real bias, whereafter I will not be consistently losing.  If on the other hand the real bias is above 0.2 then I am making money!  More power to me.
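Here is a minimal sketch of that learning, under assumptions of my own choosing (a real bias of 0.1 for the first coin, hence a winning chance of 0.05, and the simplest estimator I could pick): the gambler starts out willing to pay 0.2, but stops buying as soon as the observed frequency of winning no longer supports that price, so his losses stay bounded.

```python
import random

# The gambler is repeatedly offered the bet that both coins landed Heads,
# at price 0.2.  The real chance of winning -- unknown to him -- is
# 0.1 * 0.5 = 0.05 (an assumed bias, for illustration).  He estimates the
# chance from experience (Laplace's rule of succession) and buys only
# while the estimate still supports the price.
random.seed(1)
TRUE_CHANCE = 0.1 * 0.5
PRICE = 0.2

wins, bets, wealth = 0, 0, 0.0
for _ in range(10_000):
    estimate = (wins + 1) / (bets + 2)   # estimated chance of winning
    if estimate >= PRICE:                # buy only at defensible prices
        won = random.random() < TRUE_CHANCE
        wealth += (1.0 if won else 0.0) - PRICE
        wins += won
        bets += 1

print(bets, wealth)   # he soon stops buying; losses stay bounded
```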

The second case is not so different, for to make this precise we must again specify what the randomness, in the successive choices of coins, amounts to.  And depending on what it is, there will typically be in effect an average bias.  The gambler can learn from the results and, depending on the gains or losses, may be consistently lowering his bets, or else happily raking in the money!

But we still have a question.  What can updating vague opinion be like, in a case where there is genuine new information?  Nothing in the above discussion touches that question as yet.

There is more than one answer in the literature; I will mention some in the NOTES.  White targets the ‘orthodox’ probabilistic representation of vague opinion (“mushy credence”), so let us look at that.  But since phenomena are all, and theories are creatures of the imagination only, I am isolating the technical questions from the general discussion.

4.   APPENDIX.  The ‘Orthodox’ Representation Of Vague Opinion 

Take it that the agent’s opinion is stored as a coherent set S of judgments of the following forms:

P(p) ≤  x,          P(p) ≥ y

with p belonging to a specific Boolean algebra, the domain of P.  That will in effect include P(p) = x, when S includes both P(p) ≤ x and P(p) ≥ x.

The agent’s representor is the set of all probability functions on that algebra which satisfy all members of S.  So for example, if the agent’s opinion is that rain is as likely as not, then all the members of the representor assign 0.5 to rain.

As an example to illustrate the main difficulty, suppose that p and q are logically independent propositions, and that the agent judges that each of them is as likely as not.

For example, p = it will rain tomorrow  and q = I mislaid my hat.

Now the agent gets evidence q.

The orthodox recipe for updating is this:  replace each member by its conditionalization on q if that is well-defined, and eliminate that member if not.

What is the result?  Well, for each number y in [0,1] there is a function Q belonging to this representor such that Q(p|q) = y.  So after this updating, there is for each number y in [0, 1] a function in the posterior representor which assigns y to p.  So after updating, the opinion about rain tomorrow, which was entirely irrelevant to my mislaying my hat, is now totally vague.
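To see the computation concretely, here is a small sketch in my own parametrization: on the four atoms p&q, p&~q, ~p&q, ~p&~q, the two judgments P(p) = P(q) = 0.5 leave exactly one free parameter a = P(p&q) in [0, 0.5], each member of the representor being (a, 0.5−a, 0.5−a, a); conditionalizing a member on q gives P(p|q) = a/0.5 = 2a, which ranges over all of [0, 1].

```python
import numpy as np

# Members of the representor for the judgments P(p) = P(q) = 0.5, on the
# four atoms p&q, p&~q, ~p&q, ~p&~q, parametrized by a = P(p&q):
a = np.linspace(0.0, 0.5, 11)                 # a sample of members
atoms = np.stack([a, 0.5 - a, 0.5 - a, a])    # their atom weights

assert np.allclose(atoms[0] + atoms[1], 0.5)  # P(p) = 0.5 for each member
assert np.allclose(atoms[0] + atoms[2], 0.5)  # P(q) = 0.5 for each member
assert np.allclose(atoms.sum(axis=0), 1.0)    # each member sums to 1

# Conditionalize each member on q:  P(p|q) = P(p&q) / P(q).
posterior_p = atoms[0] / (atoms[0] + atoms[2])

print(posterior_p)   # 0.0, 0.1, ..., 1.0: after updating, totally vague
```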

Updating in this way is debilitatingly destructive.

Two options

The above result, with examples, and surrounded by both informal and technical discussions, was in the literature well before Roger White’s paper.  

The first idea we can try out is that we could prevent this disaster by putting constraints on the representor, by additions to the state of opinion S.  We can add judgments of expectation value, rather than just probability, and these allow us to add judgments of conditional probability (see the reduction below).  But the problem recurs at that level: any fix that remains with linear relations does not suffice.  We’d have to add non-linear constraints in some way, for independence and correlation are not expressible in any other way.
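To spell out that reduction (a standard fact): when P(q) > 0, the conditional judgment P(p|q) ≤ x is equivalent to P(p&q) ≤ x·P(q), that is, to the expectation judgment

E(I(p&q) − x·I(q)) ≤ 0

where I(r) is the indicator variable of r; and that is a linear constraint on P.  By contrast, independence requires P(p&q) = P(p)·P(q), a quadratic relation among the values of P, which no set of linear constraints can capture.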

  •  Anyone have suggestions?  Constructive attempts to find a better representation of vague opinion?

The second idea is that it is conditionalization that is at fault, and that indeed the fault lies with the idea that the representor is to be updated point-wise. Updating the representor needs to be a holistic action, an action that preserves certain important structure of the representor as a whole.

How can we think about this?  The representor is a convex structure: if P(p) ≤ x and P′(p) ≤ x, then for any λ in [0,1] the mixture λP + (1−λ)P′ satisfies (λP + (1−λ)P′)(p) = λP(p) + (1−λ)P′(p) ≤ x as well.  (Similarly for expectation value constraints.)

  • That suggests looking at the theory of convex structures taken as wholes.   Anyone have suggestions?

NOTES  

Originally I subscribed to the ‘orthodox’ representation of vague probability, with conditionalization as updating method.  But looking at the dilation effect (cf. Seidenfeld and Wasserman 1993) I found that it ran into the trouble with conditionalization that I described above (see my papers listed below).

I mentioned that we could look into non-linear constraints on the representor.  Probably difficult, but there is a study by Fagin, Halpern, and Megiddo (1990) that could be a resource for this idea.

As I said above, there are different answers in the literature, for questions about how to represent vague probability.  One that is quite different from the ‘orthodox’ way is by Fagin and Halpern (1991).

For the weakness of arguments for updating pointwise by conditionalization, and the possibility of alternatives, the place to begin is Grove and Halpern (1998).

As to the different policies for decision making under vague uncertainty, an important technical discussion is by Teddy Seidenfeld (2004). Isaac Levi’s concept of E-admissibility is a candidate for the precise form of the liberal policy. Levi’s own writings are easier to read; the quickest introduction, though, is section 3 of Seidenfeld’s retrospective on Levi’s work.

REFERENCES

Fagin, R., J. Y. Halpern, and N. Megiddo (1990) “A Logic for Reasoning about Probabilities”.  Information and Computation 87(1-2): 78-128.

Fagin, R. and J. Y. Halpern (1991) “Uncertainty, belief, and probability”.  Computational Intelligence 7: 160-173.

Grove, A.J. and Halpern, J.Y. (1998) “Updating Sets of Probabilities”.  Proceedings of the Fourteenth Conference on Uncertainty in AI, 173–182.  Available at https://www.cs.cornell.edu/home/halpern/papers/bas.pdf

Seidenfeld, T. (2004) “A contrast between two decision rules for use with (convex) sets of probabilities”. Synthese 140:69-88.

Seidenfeld, T., and Wasserman, L. (1993). “Dilation for Sets of Probabilities”. Annals of Statistics 21: 1139-54.

Topey, Brett (2012)  “Coin flips, credences, and the Reflection Principle”. Analysis 72: 478-488.

van Fraassen, Bas C. (2005) “Conditionalizing on violated Bell’s inequalities”.  Analysis 65: 27-32.

van Fraassen, Bas C.  (2006) “Vague Expectation Loss”. Philosophical Studies 127: 483–491.

White, Roger (2010) “Evidential symmetry and mushy credence”.  In Oxford Studies in Epistemology, Vol. 3.  Ed. T. S. Gendler and J. Hawthorne, 161-188.  New York: Oxford University Press.
