Conditionals and the Candy Bar inference

At a conference at Notre Dame in 1987, Paul Teller “made an issue”, as he wrote later, of “the fallacious form of argument I called the ‘Candy Bar Principle’:

from ‘If I were hungry I would eat some candy bar’ conclude ‘There is some candy bar which I would eat if I were hungry’.”  

And Henry Stapp, whom Teller had criticized, mentioned this in his presentation: “Paul Teller has suggested that any proposed proof of the kind I am setting forth must contain such a logical error ….”

I do not want to enter into this controversy, if only because there were so many arguments swirling around Stapp’s proposed proofs.  Instead I want to examine the question:  

is the Candy Bar inference a fallacy?

Let’s formulate it for just a finite case:  there are three candy bars, A, B, and N.  The first two are in this room and the third is next door.  I shall refer to the following form of argument as a Candy Bar inference:

If I choose a candy bar it will be either A or B

therefore,

If I choose a candy bar it will be A, or, if I choose a candy bar it will be B

and I will symbolize this as follows:

C → (A v B), therefore (C → A) v (C → B)

This has a bit of a history, of course: it was endorsed as valid in Robert Stalnaker’s original theory of conditionals and was rejected by David Lewis in his theory. Lewis showed that Stalnaker’s theory was inadequate, and blamed this principle. But we should quickly add that the problems Lewis raised would also disappear if this principle were kept while another one, shared by Stalnaker and Lewis, were rejected. This is just by the way; for now I will leave all of this aside.

How shall we go about testing the Candy Bar inference?

I imagine that the first intuitive reaction is something like this:

Imagine that I decide to choose a candy bar in this room.  Then it will definitely be either A or B that I choose. But there is nothing definite about which one it will be.  

I could close my eyes and choose at random.

Very fine!  But unfortunately that is not an argument against the Candy Bar inference, but rather against the following different inference:

It is certain that if I choose, then I will choose either A or B,

therefore

Either it is certain that if I choose I will choose A, or, it is certain that if I choose I will choose B

That is not at all the same, for we cannot equate ‘It is certain that if X then Y’ with ‘if X then Y’.  As an example, contrast the confident assertion “If the temperature drops it will rain tomorrow” with “It is certain that if the temperature drops it will rain tomorrow”.  The former will be borne out, the prediction will be verified, if in fact the temperature drops and it rains the next day — but this is not enough to show that the latter assertion was true.

So the intuitive reaction does not settle the matter.  How else can we test the Candy Bar inference?

Can we test it empirically?  Suppose two people, Bob and Alice of course, are asked to predict what I will do, and write on pieces of paper, respectively, “if Bas chooses a candy bar in this room, he will choose A” and  “if Bas chooses a candy bar in this room, he will choose B”.  Surely we will say:  

we know that if Bas chooses a candy bar in this room, he will choose A or B.  

So if he does, either Bob or Alice will turn out to have been right.

And then, if Bas chooses A, we will say “Bob was right”.

That is also an intuitive reaction, which appears to favor the Candy Bar inference.  But again, it does not really establish much.  For it says nothing at all about which of these conditionals, if any, would be true if Bas does not choose a candy bar.  That is the problem with any sort of empirical test: it deals only with facts and does not have access to what would have happened instead of what did happen.

Well there is another empirical approach, not directly to any facts about the choice and the candy bars, but to how reasonable, practical people would let this situation figure in their decision making.

So now we present Alice and Bob with this situation and we ask them to make bets.  These are conditional bets; they will be Gentlemen’s Wagers, which means that they get their money back if Bas does not choose.

Alice first asks herself:  how likely is he to choose a bar from this room, as opposed to from next door (where, you remember, there is bar N)?  Suppose she takes that to have probability 3/4.  She accepts a bet that Bas will choose A or B, if he chooses at all, with payoff 1 and price 0.75.  Her expectation value is 0; it is just a fair bet.

Meanwhile Bob agrees with her probability judgment, but is placing two bets, one that if Bas chooses he will choose A, and one that if Bas chooses he will choose B.  These he thinks equally probable, so for a payoff of 1 he agrees to price 3/8 for each.  His expectation value is (1/4)(0) + (3/8)(1) + (3/8)(1) = 3/4, minus the 3/4 he paid in total, hence 0:  this too is just a fair bet.
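For readers who want to check the arithmetic, here is a minimal sketch in Python; the probability assignments are just the hypothetical ones in the story above.

    from fractions import Fraction as F

    # Probabilities conditional on Bas choosing at all, as assumed above:
    p_room = F(3, 4)        # he chooses from this room, i.e. A or B
    p_A = p_B = F(3, 8)     # Bob takes A and B to be equally likely
    p_N = 1 - p_room        # he chooses N from next door

    # Alice's Gentleman's Wager: price 0.75, payoff 1 if Bas chooses A or B;
    # if he does not choose at all, she gets her money back (net 0).
    alice_net = p_room * 1 - F(3, 4)
    print(alice_net)        # 0: a fair bet

    # Bob's two wagers: payoff 1 on A and payoff 1 on B, at price 3/8 each.
    bob_net = (p_A * 1 + p_B * 1 + p_N * 0) - 2 * F(3, 8)
    print(bob_net)          # 0: also a fair bet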

Thus Alice and Bob pay the same to be in a fair betting situation, with the same payoffs, though one was, in effect, addressing the premise and the other the conclusion of the Candy Bar inference.  So, as far as rational betting behavior is concerned, there is once again no difference between the two statements.

Betting, however, as we well know by now, is also only a crude measuring instrument for what matters.  The fact that these are Gentlemen’s Wagers, as they pretty well have to be, once again means that we are really only dealing with the scenario in which the antecedent is true.  The counterfactual aspect is beyond our ken.

To be clear:  counterfactual conditionals are metaphysical statements, if they are statements about what is the case, at all.  They are not empirical statements, and this makes the question about the validity of the Candy Bar inference a metaphysical question.

There is quite a lot of every-day metaphysics entrenched at the surface of our ordinary discourse. Think for instance of what Nancy Cartwright calls this-worldly causality, with examples like the rock breaking the window and the cat lapping up the milk. 

Traditional principles about conditionals, just as much as traditional principles about causality, may guide our model building.  And then nature may or may not fit our models …

So the question is not closed, the relation to what is empirically accessible may be more subtle than I managed to get to here.   To be continued …. 

REFERENCES

The Notre Dame Conference in question had its proceedings published as Philosophical Consequences of Quantum Theory:  Reflections on Bell’s Theorem (ed. J. T. Cushing and E. McMullin; University of Notre Dame Press 1989).   

My quotes from Teller and Stapp are from pages 210 and 166 respectively.

Probabilities of Conditionals (1): finite set-ups

If a theory has no finite models, can we still discuss finite examples, taking for granted that they can be represented in the theory’s models?

It is a well-known story:  Robert Stalnaker introduced the thesis, now generally called 

The Equation             P(p → q) = P(q|p), provided P(p) > 0

that the probability of a conditional is the conditional probability of consequent-given-antecedent.  Then David Lewis refuted Stalnaker’s theory.

In 1976 I proposed The Equation for a weaker logic of conditionals that I called CE.  The main theorem was that any probability function P on a denumerable or finite field of sets (‘propositions’) can be extended to a model of CE incorporating P, with an operation → on the propositions, satisfying The Equation.

To be clear:  the models for CE endowed with probability in this way are very large, with a non-denumerable universe of possible worlds.  But taking a cue from the proof of that theorem, I mean to show here that we can in practice direct our attention to finite set-ups.  These are, as it were, the beginnings of models, and they can be used to provide legitimate examples with manageable calculations.

The reason the theory’s models get so large is that the conditional probabilities introduce more and more numbers (See Hajek 1989).  

Example. Consider the possible outcomes of a fair die toss: 1, 2, 3, 4, 5, 6.  With these outcomes as possible worlds, we have 2^6 propositions, but all the probabilities assigned to them are multiples of 1/6.  So what is the conditional probability that the outcome is 5, given that it is not 6?  Probability 1/5.  What is the conditional probability that the outcome is 4, given that the outcome is less than 5? Probability 1/4.  Neither is a multiple of 1/6.

Therefore, none of those 2^6 propositions can be either the proposition that the outcome is (5 if it is not 6), or the proposition that the outcome is (4 if it is less than 5).
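A quick check of that arithmetic in Python, for those so inclined:

    from fractions import Fraction as F

    assert F(1, 6) / F(5, 6) == F(1, 5)   # P(outcome is 5 | outcome is not 6)
    assert F(1, 6) / F(4, 6) == F(1, 4)   # P(outcome is 4 | outcome is less than 5)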

In the end, the only way to allow for arbitrarily nested conditionals, in any and all proposition algebras closed under →, is to think of any set of outcomes that we want to model as the slices of an equitably sliced pie which is infinitely divisible.

The telling examples that we deal with in practice do not involve much nesting of conditionals.  So let us look into the tossed fair die example, and see how much we have to construct to accommodate simple examples.  I will call such a construction a set-up.

(In the Appendix I will give a precise definition of set-ups as partial models, but for now I will explain them informally.)

GENERAL: MODELS AND PROPOSITION ALGEBRAS

As do Stalnaker and Lewis, I define the → operation by using a selection function s: this function s takes any proposition p and any world x into a subset of p,  s(p, x).  

world y is in (p → q) if and only if s(p, y) ⊆ q

The main constraint is that s(p, x) has at most one member.  It can be empty or be a unit set.  Secondly, if y is in p, then s(p, y) = {y}, and if p is empty then s(p, y) is empty too.  There are no other constraints.  

Specifically, unlike for Stalnaker and Lewis, the selection is not constrained by a nearness relation.  I do not take the nearness metaphor seriously, and see no convincing reason for such a constraint.  But I use the terminology sometimes, just as a mnemonic device to describe the relevant selection:  if s(p, x) = {y} I may call y the nearest p-world to x.  

The result of this freedom is that if p and q are distinct propositions then the functions s(p, .) and s(q, .) are totally independent: each can be constructed independently, without any regard to the others.

That allows us to build parts of models while leaving out much that normally belongs in a model.

EXAMPLE: THE OUTCOMES OF A TOSSED DIE

A die is tossed; there are six possible outcomes, hence six possible worlds:  (1) is the world in which the outcome is 1, and similarly for (2), …, (6).  I will call this set of worlds S (mnemonic for “six”).   There is a probability function P on the powerset of S: it assigns 1/6 to each world.  I will refer to the set-up that we are constructing here as Set-Up 1.

As examples I will take two propositions:

p = {(1), (3), (5)}, “the outcome is odd”. This proposition is true just in worlds (1), (3), (5).

q = {(1), (2), (3)}, “the outcome is low”.  This proposition is true just in worlds (1), (2), (3).

Each of these two propositions has probability 1/2.

The idea is now to construct s(p, .) so that P(p → q) = P(q|p).  I claim no intuitive basis for the result. Its purpose is to show how The Equation can be satisfied while respecting the basic logic of conditionals CE.

It is clear that (p → q) consists of two parts, namely (p ∩ q) and a certain part of ~p.  Can we always choose a part of ~p so that the probabilities of these two parts add up to P(q|p)?  A little theorem says yes: 

               P(q|p) − P(p ∩ q)  ≤   P(~p).
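(The verification is immediate:  P(q|p) − P(p ∩ q) = P(p ∩ q)[1 − P(p)]/P(p), and since P(p ∩ q) ≤ P(p), this is at most 1 − P(p) = P(~p).)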

Of course at this point we can only do so where the only probabilities in play are multiples of 1/6.  Later we can look at others, and build a larger partial model.  I will show how the small set-up here will emerge then as part of the larger set-up, so nothing is lost.

For a given non-empty proposition p, we need only construct (p → {x}) for each x in p, making sure that their probabilities add up to 1.  The probabilities of the conditionals (p → r), for any other proposition r, are then determined in this set-up.  That is so because S is finite and in any proposition algebra (model of CE),

(p → t) ∪ (p → u) = [p → (t ∪ u)]

So let us start with member (1) of proposition p, and define s(p, .) so that P(p → {(1)}) = 1/3, which is the conditional probability P({(1)} | p).

That means that (p → {(1)}) must have two worlds in it, (1) itself and a world in ~p.  Therefore set

 s(p, (2)) = {(1)}.

Then (p → {(1)}) = {(1), (2)}, which does indeed have probability 1/3.

Similarly for the others (see the diagram below, which shows it all graphically):

 s(p, (4)) = {(3)},     s(p, (6)) = {(5)}

You can see at once how we will deal with s(~p, .):

s(~p, (1)) = {(2)},    s(~p, (3)) = {(4)},    s(~p, (5)) = {(6)}

so that, for example, (~p → {(2)}) = {(2), (1)}, which has probability 1/3, equal to the conditional probability P({(2)} | ~p).

What about (p → {(6)})?  There is no world x such that s(p, x) = {(6)}.  So (p → {(6)}) is the empty set and P(p → {(6)}) = 0, which is indeed P({(6)}|p).

Let’s see how this works for p with the other proposition q, “the outcome is low”; that is, the proposition q = {(1), (2), (3)}.

 (p → q), “if the outcome is odd then it is low”, is 

  • true in (1) and (3) since they are in (p ∩  q), they are their own nearest p-worlds.
  • true in (2) and (4), since their nearest p-worlds are (1) and (3) respectively 
  • false in (5) and (6) since their nearest p-world is (5) “odd but not low”

(~p → q), “if the outcome is even, then it is low”, is

  • true in (2) since it is in ~p ∩ q
  • true in (1) since its nearest ~p-world is (2), “even and low”
  • false in (3), for its nearest ~p-world is (4), “even and high”
  • false in (4), for it is its own nearest ~p-world, “even and high”
  • false in (5), for its nearest ~p-world is (6), “even and high”
  • false in (6), for it is its own nearest ~p-world, “even and high”

So (p → q) is {(1), (3), (2), (4)}, which has probability 2/3; we verify that this is P(q|p).

(~p → q) is {(2), (1)}, which has probability 1/3; we verify that this is P(q|~p).
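To make the construction concrete, here is Set-Up 1 as a small Python sketch; the names (WORLDS, SELECT, arrow, and so on) are mine, purely for illustration, and the selection table is exactly the one constructed above.

    from fractions import Fraction as F

    # Set-Up 1: worlds are the outcomes 1..6, propositions are sets of
    # worlds, and P assigns probability 1/6 to each world.
    WORLDS = frozenset({1, 2, 3, 4, 5, 6})

    def P(prop):
        return F(len(prop), 6)

    p = frozenset({1, 3, 5})        # "the outcome is odd"
    q = frozenset({1, 2, 3})        # "the outcome is low"
    not_p = WORLDS - p

    # The partial selection function: for x in the antecedent a we have
    # s(a, x) = {x}; otherwise we look up the chosen 'nearest' world;
    # None means the selection is not (yet) defined there.
    SELECT = {(p, 2): 1, (p, 4): 3, (p, 6): 5,
              (not_p, 1): 2, (not_p, 3): 4, (not_p, 5): 6}

    def s(a, x):
        if x in a:
            return {x}
        return {SELECT[(a, x)]} if (a, x) in SELECT else None

    def arrow(a, c):
        # (a -> c) = {x : s(a, x) is defined and is a subset of c}
        return {x for x in WORLDS if s(a, x) is not None and s(a, x) <= c}

    assert P(arrow(p, q)) == F(2, 3)        # = P(q|p)
    assert P(arrow(not_p, q)) == F(1, 3)    # = P(q|~p)
    assert P(arrow(p, {6})) == 0            # (p -> {(6)}) is empty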

A DIAGRAM OF THE MODEL: selection for antecedents p and ~p

The blue arrows are for the ‘nearest p-world’ selection, and the red arrows for the ‘nearest ~p-world’ selection.

THE SECOND STAGE:  EXPANDING THE SET-UP

Above I gave two examples of conditional probabilities that are not multiples of 1/6, but of 1/5 and of 1/4.  In Set-Up 1 there is no conditional proposition that can be read as “if the outcome is not six then it is five”.  The arrow is only partially defined.  So how shall we improve on this?

Since the smallest number that is a multiple of all of 6, 4, and 5 is 60, we will need a set-up with 60 worlds in it, with 10 of them being worlds in which the die toss outcome is 1, and so forth.

So we replace (1) by the couples <1, 1>, <1, 2>, …, <1, 10>.  Similarly for the others.  I will write [(x)] for the set {<x, 1>, …, <x, 10>}.  Giving the Roman numeral X as name to {1, …, 10}, our set of worlds will no longer be S, but the Cartesian product SxX.  I will refer to the set-up we are constructing here as Set-Up 2.  The probability function P is extended accordingly, and assigns the same probability 1/60 to each member of SxX.

Now we can construct the selection function s(u, .) for proposition u which was true in S in worlds (1), …, (5) – read it as “the outcome is not six” – and is true in our new set-up in the fifty worlds <1,1>, …, <5, 10>.  As before, to fix all the relevant probabilities, we need:

(u → [(t)]) has probability 1/5 for each t from 1 to 5.

Since [(t)] is the intersection of itself with u, it is part of (u → [(t)]).  That gives us ten elements of SxX, but since 1/5 = 12/60, we need two more.  They have to be chosen from ~u, that is, from [(6)].

Do it systematically:  divide ~u into five pairs and let the selection function choose their ‘nearest’ worlds in u appropriately:

s(u, <6, 1>) = {<1, 1>},    s(u, <6, 2>) = {<1, 2>}

s(u, <6, 3>) = {<2, 1>},    s(u, <6, 4>) = {<2, 2>}

s(u, <6, 5>) = {<3, 1>},    s(u, <6, 6>) = {<3, 2>}

s(u, <6, 7>) = {<4, 1>},    s(u, <6, 8>) = {<4, 2>}

s(u, <6, 9>) = {<5, 1>},    s(u, <6, 10>) = {<5, 2>}

So now (u → [(5)]) = {<5, 1>, …, <5, 10>, <6,9>, <6,10>}, which has twelve members, each with probability 1/60, and so this conditional has probability 1/5, which is the right conditional probability.
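Continuing the Python sketch from Set-Up 1, here is Set-Up 2 with the selection for u; again the names are mine, and the pairwise assignment is the one tabulated above.

    from fractions import Fraction as F
    from itertools import product

    # Set-Up 2: sixty worlds <outcome, i>, with outcome in 1..6 and
    # i in 1..10, each world with probability 1/60.
    WORLDS2 = frozenset(product(range(1, 7), range(1, 11)))

    def P2(prop):
        return F(len(prop), 60)

    def block(x):                    # [(x)] = {<x,1>, ..., <x,10>}
        return frozenset((x, i) for i in range(1, 11))

    u = frozenset(w for w in WORLDS2 if w[0] != 6)   # "the outcome is not six"

    def s_u(x):
        # <6,1>, ..., <6,10> are sent pairwise to <1,1>, <1,2>, ..., <5,2>.
        if x in u:
            return {x}
        _, i = x                     # x = <6, i>
        return {((i + 1) // 2, 1 if i % 2 == 1 else 2)}

    def arrow_u(c):                  # (u -> c)
        return {x for x in WORLDS2 if s_u(x) <= c}

    for t in range(1, 6):
        assert P2(arrow_u(block(t))) == F(1, 5)   # = P([(t)] | u)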

It will be clear enough now how we can similarly construct s(r, .) for the proposition r, read as “the outcome is less than 5”, which requires conditional probabilities equal to 1/4.
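One way to carry that out, continuing the sketch above (the particular division of ~r into four groups of five is my choice; any assignment sending five ~r-worlds into each [(t)], t from 1 to 4, would serve):

    # r = "the outcome is less than 5"; ~r consists of the twenty worlds
    # in [(5)] and [(6)]. Split these into four groups of five, one group
    # per outcome t in 1..4, so that P2(r -> [(t)]) = 15/60 = 1/4.
    r = frozenset(w for w in WORLDS2 if w[0] < 5)
    not_r = sorted(WORLDS2 - r)      # the 20 worlds of ~r, in a fixed order

    def s_r(x):
        if x in r:
            return {x}
        g = not_r.index(x)           # 0..19
        return {(g // 5 + 1, g % 5 + 1)}   # five ~r-worlds per block [(t)]

    def arrow_r(c):                  # (r -> c)
        return {x for x in WORLDS2 if s_r(x) <= c}

    for t in range(1, 5):
        assert P2(arrow_r(block(t))) == F(1, 4)   # = P([(t)] | r)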

HOW SET-UP 1 RE-APPEARS IN SET-UP 2

And it should also be clear how what we did with propositions p and q in the earlier set-up, with universe of worlds S, emerges in this larger set-up in the appropriate way.  For example, the proposition read as “the outcome is low” is now the union of [(1)], [(2)], and [(3)], and so forth.

Of course, there are new propositions now.  For some of these we can construct a selection function as well. For example, the proposition (u → [(5)]) which we just looked at has twelve members, and the probability 1/12 equals 5/60, a multiple of 1/60.  So we can construct the selection function s((u → [(5)]), .).  Thus for any proposition t, the proposition [(u → [(5)]) → t] will be well-defined and its probability will be the relevant conditional probability.  But there are other propositions in Set-Up 2 for which this can be done only by embedding this set-up in a still larger one.

As I said above, eventually we have to look upon the six possible outcomes of the die toss as slices of an evenly divided pie, this pie being infinitely divisible.  That is a comment about the theorem proved for models of the logic CE in which The Equation is satisfied. But as long as our examples, the ones that play a role in philosophical discussions of The Equation, are “small” enough, they will fit into small enough set-ups.

APPENDIX.

While leaving more details to the 1976 paper, I will here distinguish the set-ups, which are partial models, from the models.

I will now use “p”, “q” etc. with no connection to their use for specific propositions in the text above.

A frame is a triple <V, F, P>, with V a non-empty set, F a field (Boolean algebra) of subsets of V, and P a probability function on a field G of subsets of V, with F part of G.

A model is a quintuple <V, F, P, s, →> such that:

  •  <V, F, P> is a frame 
  • s (the selection function) is a function from FxV into the power set of V such that
  • s(p, x) has at most one member
  • if x is in p then s(p, x) = {x}
  • s(Λ, x) = Λ

→ is the binary function defined on FxF by the equation

(p → q)  = {x in V: s(p,x) ⊆ q}

Note: with this definition, <V, F, →>  is a proposition algebra, that is, a Boolean algebra with (full or partial) binary operation →, with the following properties (where defined):

(I)        (p → q) ∩ (p → r) =  [p → (q ∩ r)]

(II)       (p → q) ∪ (p → r) = [p → (q ∪ r)]

(III)     p ∩ (p →q)  =   p ∩ q

(IV)     (p → p) = V.
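These four properties can be checked by brute force in a finite set-up; here is a sanity check against Set-Up 1, reusing the definitions (WORLDS, p, not_p, arrow) from the Python sketch above.  This only checks the finite example, of course, not the general claim.

    from itertools import combinations

    def subsets(S):
        return [set(c) for n in range(len(S) + 1) for c in combinations(S, n)]

    for a in (p, not_p):             # the antecedents with s defined in Set-Up 1
        assert arrow(a, set(a)) == set(WORLDS)                           # (IV)
        for q_ in subsets(WORLDS):
            assert set(a) & arrow(a, q_) == set(a) & q_                  # (III)
            for r_ in subsets(WORLDS):
                assert arrow(a, q_) & arrow(a, r_) == arrow(a, q_ & r_)  # (I)
                assert arrow(a, q_) | arrow(a, r_) == arrow(a, q_ | r_)  # (II)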

A set-up or partial model is a quintuple <V, F, P, s, →> defined exactly as for a model, except that s is a partial function, defined only on a subset of FxV.  And accordingly, → is then a partial binary function on the propositions.

In the next post I will explore Set-Up 1 and Set-Up 2 further, with examples.

NOTES 

I want to thank Branden Fitelson and Kurt Norlin for stimulating correspondence, which gave me the impulse to try to figure this out.  

REFERENCES

The theorem referred to above is on page 289 of my “Probabilities of Conditionals”, pp. 261-300 in W. Harper and C. A. Hooker (eds.), Foundations of Probability Theory, …, Vol. 1.  D. Reidel: Dordrecht, 1976.

(Note that this is not the part about Stalnaker-Bernoulli models, it is instead about the models defined on that page. There is no limit on the nesting of arrows.)

Alan Hajek, “Probabilities of Conditionals: Revisited”.  Journal of Philosophical Logic 18 (1989): 423-428.  (Theorem that the Equation has no finite models.)

Alan Hajek and Ned Hall, “The hypothesis of the conditional construal of conditional probability”, pp. 75-111 in Probability and Conditionals: Belief Revision and Rational Decision.  Cambridge University Press, 1994.