Blog Feed

Orthologic and epistemic modals

A brief reflection on a recent paper, “The orthologic of epistemic modals” by Wesley Holliday and Matthew Mandelkern 

  1. The motivating puzzle
  2. Inspiration from quantum logic
  3. Propositions and truth
  4. Could quantum logic be the logic of natural discourse?
  5. Why this, so far, is not enough for epistemic modals
  6. Pure states and mixed states
  7. An open question

1.   The motivating puzzle

Here is a puzzle for you:

(Puzzle *) We, Able and Baker, A and B for short, are two propositions.  Baker does not imply the negation of Able.  Yet our conjunction is a self-contradiction.  Who are we?

In any first or even second year logic course the right answer will be “you do not exist at all!”  For if Baker does not imply the negation of Able then their conjunction could be true.

But the literature on epistemic modals furnishes examples, to wit:

“It is raining, but it might not be” cannot be true.  Yet, “it might not be raining” does not imply “It is not raining”.

Such examples do rest on assumptions that may be challenged – for example, the assumption that the quoted sentences must all be true or false.  But let that go.  The interesting question is how such a logical situation as depicted in (Puzzle *) could be represented.  

2.   Inspiration from quantum logic

That sort of situation was studied in quantum logic, with its geometric models, where the propositions are represented by the subspaces.  

A quantum mechanics model is built on a finite-dimensional or separable Hilbert space.  In quantum logic the special properties of the infinite-dimensional, separable space do not play a role till quite late in the game. What matters is mainly that there is a well-defined orthogonality relation on this space.  So it suffices, most of the time, to think just about a finite-dimensional Hilbert space (that is, a finite-dimensional inner product vector space, aka a Euclidean space).

 For illustration think just of the ordinary 3-space of high school geometry but presented as a vector space.  Draw the X, Y, Z axes as straight lines perpendicular to each other.  The origin is their intersection.  A vector is a straight line segment starting at the origin and ending at a point t, its tip; we identify this vector by its tip.  The null vector 0 is the one with zero length.  Vectors are orthogonal iff they are perpendicular, that is, the angle between them is a right angle.

In the diagram, the vectors drawn along the axes have tips (3, 0, 0), (0,5,0), and (0,0,2).  The vector with tip (3, 5, 2) is not orthogonal to any of those.

If A is any set of vectors, its orthocomplement ~A is the set of vectors that are orthogonal to every vector in A.  The subspaces are precisely the sets A such that A = ~~A.  In this diagram the subspaces are the straight lines through the origin, and the planes through the origin, and of course the whole space.  So the orthocomplement of the X-axis is the YZ plane.  The orthocomplement of the solid arrow, with tip (3, 5, 2), is thus a plane, the one to which it is perpendicular.

About (Puzzle *).  Our imaginative, intuitive picture of a 3-space provides an immediate illustration to solve (Puzzle *).  In quantum logic, the propositions are the subspaces of a Hilbert space.  Just let A and B be two lines through the origin that are not orthogonal to each other.  Their conjunction (intersection) is {0}, the ‘impossible state’, the contradiction. But neither is in the other’s orthocomplement.  In that sense they are compatible.
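
For the computationally minded: the solution can be checked mechanically, since everything here is finite-dimensional linear algebra.  Below is a small sketch in Python with numpy (my own illustration, not anything in H&M’s paper); the helper orthocomplement, a name I made up, computes ~A as a null space.

```python
import numpy as np

def orthocomplement(A):
    """Orthonormal basis for ~A: all vectors orthogonal to every row of A."""
    A = np.atleast_2d(A)
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-10))
    return Vt[rank:]                 # rows spanning the null space of A

a = np.array([1.0, 0.0, 0.0])        # proposition A: the X-axis
b = np.array([1.0, 1.0, 0.0])        # proposition B: another line through the origin

print(float(a @ b))                  # 1.0, not 0: B is not orthogonal to A,
print(orthocomplement(a))            # so B does not lie in ~A (the YZ-plane)

# Yet their conjunction (intersection) is {0}: the two lines are
# linearly independent, so no nonzero vector lies on both.
print(np.linalg.matrix_rank(np.vstack([a, b])))   # 2
```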

3.   Propositions and truth

That the propositions are taken to be the subspaces has a rationale, introduced by von Neumann, back in the 1930s.  The vectors represent physical states.  Each subspace can be described as the set of states in which a certain quantity has a particular value with certainty.  (That means: if that quantity is measured in that state, the outcome is that value, with probability 1.)  

Von Neumann introduced the additional interpretation that this quantity has that value if and only if the outcome of a measurement will show that value with certainty.  This became orthodoxy: here truth coincides with relevant probability = 1. 

Given this gloss, we have: 

subspace A is true in (the state represented by) vector v if and only if v is in A.  

We note here that if vector u = kv for some number k ≠ 0 (in our illustration: u and v lie on the same straight line through the origin) then they belong to all the same subspaces.  As far as truth is concerned, they are distinct but indiscernible.  (For the textbook emphasis on unit vectors see note 1.)

Since the subspaces are the closed sets for the closure operation ~~ (S = the orthocomplement of the orthocomplement of S), they form a complete lattice (note 2).

The self-contradictory proposition contains only the null-vector 0 (standardly called the origin), the one with zero length, which we count as orthogonal to all other vectors.  Conjunction (meet) is represented by intersection.  

Disjunction (join) is special.  If X is a set of vectors, let [X] be the least subspace that contains X.    The join of subspaces S and S’, denoted (S ⊕ S’), is [S ∪ S’].  It is a theorem that [S ∪ ~S] is the whole space.  That means specifically that there is an orthonormal basis for the whole space which divides neatly into a basis for S and a basis for ~S.  Thus every vector is the sum of a vector in S and a vector in ~S (one of these can be 0 of course).

One consequence is of course that, in traditional terms, the Law of Excluded Middle holds, but the Law of Bivalence fails.  For v may be in A ⊕ B while not being either in A or in B.
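
Continuing the numpy illustration (again a sketch of my own, with the made-up helper span_basis computing [X]): the join of the X-axis and the Y-axis is the XY-plane, and the vector (1, 1, 0) lies in that join without lying in either axis, exactly the failure of Bivalence just noted.

```python
import numpy as np

def span_basis(X):
    """Orthonormal basis for [X], the least subspace containing the rows of X."""
    _, s, Vt = np.linalg.svd(np.atleast_2d(X))
    return Vt[:int(np.sum(s > 1e-10))]

A = np.array([1.0, 0.0, 0.0])                 # the X-axis
B = np.array([0.0, 1.0, 0.0])                 # the Y-axis

print(span_basis(np.vstack([A, B])).shape[0])    # 2: A + B spans the XY-plane

v = A + B                                        # (1, 1, 0) is in the join, but ...
print(np.linalg.matrix_rank(np.vstack([A, v])))  # 2: v is not in A
print(np.linalg.matrix_rank(np.vstack([B, v])))  # 2: v is not in B

# Excluded Middle: A together with ~A spans the whole space.
not_A = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])   # basis of ~A, the YZ-plane
print(span_basis(np.vstack([A, not_A])).shape[0])       # 3: [A u ~A] is everything
```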

The term “orthologic” refers to any logic which applies to a language in which the propositions form an orthocomplemented lattice. So orthologic is a generalization of quantum logic.

4.    Could quantum logic be the logic of natural discourse?

The idea, once advanced by Hilary Putnam, that the logic of natural language is quantum logic, was never very welcome, if only because learning quantum logic seemed just too hefty a price to pay.  

But the price need not be so high if most of our discourse remains on the level of ‘ordinary’ empirical propositions.  We can model that realm of discourse by specifying a sufficiently large Boolean sublattice of the lattice of subspaces.

For a non-trivial orthocomplemented lattice, such as the lattice of subspaces of a Hilbert space, has clearly identifiable Boolean sublattices.  Suppose for example that the empirical situations that we can discern have only familiar classical logical relations.  That means that, in effect, all the statements we make are, whether precise or vague, attributions of values to mutually compatible quantities (equivalently, there is a single maximal observable Q such that all humanly discernible quantities are functions of Q).

Then the logic of our ‘normal’ discourse, leaving aside such subtleties as epistemic modals, is classical, even if it is only a (presumably large) fragment of natural language.  For the corresponding sublattice is Boolean.

5.   Why this, so far, is not enough for epistemic modals

Quantum states are variously taken to be physical states or information states. The paper by Holliday and Mandelkern (henceforth H&M) deals with information, and instead of “states” they say “possibilities” (note 3).  Crucial to their theory is the relation of refinement:

x is a refinement of y exactly if, for all propositions A, if y is in A then x is in A.

I will use x, y, z for possibilities, which in our case will be quantum states (those, we’ll see below, are not limited to vectors).

If we do take states to be vectors and propositions to be subspaces in a vector space, then the refinement relation is trivial.  For if u is in every subspace that contains t then it is in [t], the least subspace to which t belongs (intuitively the line through the origin on which t lies), and that would then be the least subspace to which u belongs as well.  So then refinement is the equivalence relation:  u and t belong to the same subspaces.  As far as what they represent, whether it is a physical state or an information state, there is no difference between them.  They are distinct but indiscernible.  Hence the refinement relation restricted to vectors is trivial.

But we can go a step further with Holliday and Mandelkern by turning to a slightly more advanced quantum mechanics formalism.       

6.   Pure states and mixed states

When quantum states are interpreted as information states, the uncertainty relations come into play, and maximal possible information is no longer classically complete information.  Vectors represent pure states, and thought of in terms of information they are maximal, they are as complete as can be.  But it is possible, and required (not only for just practical reasons), to work with less than maximal information.  Mixtures, or mixed states, can be used to represent the situation that a system is in one of a set of pure states, with different probabilities.  (Caution: though this is correct it is, as I’ll indicate below, not tenable as a general interpretation of mixed states.)

To explain what mixtures are we need to shift focus to projection operators.  For each subspace S other than {0} there is the projection operator P[S]: vector u is in S if and only if P[S]u = u, P[S]u = 0 if and only if u is in ~S. This operator ‘projects’ all vectors into S.

For the representation of pure states, the job of vector u is done equally well by the projection operator P[u], which we now also refer to as a pure state.  

Mixed states are represented by statistical operators (aka density matrices) which are, so to speak, weighted averages of mutually orthogonal pure states.  For example, if u and t are orthogonal vectors then W = (1/2)P[u] + (1/2)P[t] is a mixed state. 

 Intuitively we can think of W as being the case exactly if the real state is either u or t and we don’t  know which.  (But see below.)

W is a statistical operator (or density matrix) if and only if there are mutually orthogonal vectors u(i) (other than 0) such that W = Σb(i)P[u(i)] where the numbers b(i) are positive and sum to 1.  In other words, W is a convex combination of a set of projections along mutually orthogonal vectors.  We call the equation W = Σb(i)P[u(i)] an orthogonal decomposition of W.  
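
Here is a sketch of these notions in the same numpy style (my illustration; proj is a made-up helper building P[u] as an outer product):

```python
import numpy as np

def proj(u):
    """The projection operator P[u] onto the line spanned by u."""
    u = u / np.linalg.norm(u)
    return np.outer(u, u)

u = np.array([1.0, 0.0])
t = np.array([0.0, 1.0])                        # orthogonal to u

W = 0.5 * proj(u) + 0.5 * proj(t)               # the example mixed state

print(np.allclose(proj(u) @ u, u))              # True: P[u]u = u
print(np.allclose(proj(u) @ t, 0 * t))          # True: P[u]t = 0, since t is in ~[u]
print(float(np.trace(W)))                       # 1.0: the weights b(i) sum to 1
print(np.all(np.linalg.eigvalsh(W) >= 0))       # True: W is positive, as required
```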

What about truth?  We need to extend that notion by the same criterion that was used for pure states, namely that the probability of a certain measurement outcome equals 1.  

What is certain in state W = (1/2)P[u] + (1/2)P[t] must be what is certain regardless of whether the actual pure state is u or t. So that should identify the subspaces which are true in W.

But now the geometric complexities return.  If u and t both lie in subspace S then so do all linear combinations of u and t.  So we should look rather to all the vectors v such that, if the relevant measurement probability is 1 in W then it is 1 in pure state v.  Happily those vectors form a subspace, the support of W.  If W = Σb(i)P[u(i)], then that is the subspace [{u(i)}]. This, as it happens, is also the image space of W, the least subspace that contains the range of W. (Note 4.)

It is clear then how the notion of truth generalizes:

            Subspace S is true in W exactly if the support of W is part of S

And we do have some redundancy again, since truth is construed following von Neumann: all probabilities short of certainty disappear from view.  For every subspace is the support of some pure or mixed state, and for any mixed state that is not pure there are infinitely many mixed states with the same support.

While a pure state P[u] has no refinements but itself, if v is any vector in the support of W then P[v] is a refinement of W.  And in general, if W’ is a statistical operator whose support is part of W’s support, then W’ is a refinement of W.  

So we have here a non-trivial refinement relation.
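
A sketch of how one might check the support and the refinement relation numerically (my illustration; support_basis and refines are made-up helpers, with the support computed as the image space, the span of the eigenvectors with nonzero eigenvalue):

```python
import numpy as np

def proj(u):
    u = u / np.linalg.norm(u)
    return np.outer(u, u)

def support_basis(W, tol=1e-10):
    """Orthonormal basis of the support (image space) of W."""
    vals, vecs = np.linalg.eigh(W)
    return vecs[:, vals > tol]

def refines(W1, W2):
    """W1 is a refinement of W2 iff the support of W1 is part of the support of W2."""
    S2 = support_basis(W2)
    P2 = S2 @ S2.T                        # projection onto W2's support
    S1 = support_basis(W1)
    return np.allclose(P2 @ S1, S1)       # W1's support vectors are left fixed

u = np.array([1.0, 0.0, 0.0])
t = np.array([0.0, 1.0, 0.0])
W = 0.5 * proj(u) + 0.5 * proj(t)         # support: the XY-plane

print(refines(proj(u + t), W))            # True: v = u + t lies in W's support
print(refines(W, proj(u)))                # False: W's support exceeds [u]
```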

Note: the geometric complexities.  I introduced mixed states in a way seen in textbooks: that, for example, W = (1/2)P[u] + (1/2)P[t] represents a situation in which the state is either u or t, with equal probabilities.  That is certainly one use (note 5).

But an ‘ignorance interpretation’ of mixtures in general is not tenable. The first reason is that orthogonal decomposition of a statistical operator is not unique.  If W = (1/2)P[u] + (1/2)P[t] and W = (1/2)P[v] + (1/2)P[w] then it would in general be self-contradictory to say that the state is really either u or t, and that it is also really v or w.  For nothing can be in two pure states at once.  Secondly, W has non-orthogonal decompositions as well.  And there is a third reason, having to do with interaction.  
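
The first reason is easy to exhibit numerically: in two dimensions the ‘maximally mixed’ state has an orthogonal decomposition along every orthonormal basis whatsoever.  A sketch, continuing the earlier illustration:

```python
import numpy as np

def proj(u):
    u = u / np.linalg.norm(u)
    return np.outer(u, u)

u, t = np.array([1.0, 0.0]), np.array([0.0, 1.0])
v = np.array([1.0, 1.0]) / np.sqrt(2)     # a second orthonormal basis,
w = np.array([1.0, -1.0]) / np.sqrt(2)    # rotated 45 degrees from the first

W1 = 0.5 * proj(u) + 0.5 * proj(t)
W2 = 0.5 * proj(v) + 0.5 * proj(w)
print(np.allclose(W1, W2))   # True: one W, two distinct orthogonal decompositions
```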

All of this has to do with the non-classical aspects of quantum mechanics.  Well, good!  For if everything became classical at this point, we’d lose the solution to (Puzzle *).

7.   An open question

So, if we identify what Holliday and Mandelkern call possibilities as quantum states, we have ways to represent such situations as depicted in (Puzzle *), and we have a non-trivial refinement relation.

But there is much more to their theory.  It’s a real question whether, continuing with quantum-mechanical states, we could find a model of their theory.  Hmmm ….

NOTES

  1. In textbooks and in practice this redundancy is eliminated by the statement that pure states are represented by unit vectors (vectors of length 1).  In foundations it is more convenient to say that all vectors represent pure states, but multiples of a vector represent the same state.
  2. See e.g. page 49 in Birkhoff, Garrett (1948) Lattice Theory.  Second edition.  New York: American Mathematical Society.  For a more extensive discussion see the third edition of 1967, Chapter V section 7. 
  3. Holliday, W. and M. Mandelkern (2022)  “The orthologic of epistemic modals”.  https://arxiv.org/abs/2203.02872v3
  4. For the details about statistical operators used in this discussion see my Quantum Mechanics pages 160-162.
  5. See P. J. E. Peebles’ brief discussion of the Stern-Gerlach experiment, on page 240 of his textbook Quantum Mechanics, Princeton 1992.  Peebles is very careful, when he introduces mixed states starting on page 237 (well beyond what a first year course would get to, I imagine!) not to imply that an ignorance interpretation would be generally tenable.  But the section begins by pointing to cases of ignorance in order to motivate the introduction of mixtures:  “it is generally the case …[that] the state vector is not known: one can only say that the state vector is one of some statistical ensemble of possibilities.”

Conditionals and Bell’s Inequalities

NOTE: section 6 has recently been updated (July 2022)

1.  Introduction: An a priori proof of Bell’s Inequalities?

2. The logic CE and probability modeling

3. How the Bell Inequalities are tested

4. Setting up for the derivation

5. Modeling this with conditionals

6. An empiricist view of conditionals

1.  Introduction: An a priori proof of Bell’s Inequalities?

Bell’s Inequalities, which are satisfied by the probabilities of results conditional on experimental set-ups of a traditional sort, are famously violated in the results of certain quantum mechanics experiments.  It is therefore remarkable that those inequalities could be deduced from certain putative principles about causality and locality.  It would seem that their violation could be taken as refuting those principles.

But even more remarkable were the signs that Bell’s Inequalities could be deduced logically, given certain principles seriously proposed for the logic of conditionals.  (See Bibliography: Stapp 1971, Eberhard 1977, Herbert and Karush 1978.) My project here is to examine that deduction, and the options there are for a philosophical response.  Is it the logic of conditionals that is at fault?  Or is it an understanding of conditionals and their logic that was loaded with philosophical realist presuppositions?

Bell’s Inequalities relate conditional probabilities of results given certain measurement set-ups.  Therefore, if they are to be approached in this way, it can only be in a logic compatible with a bridge principle that relates conditionals and conditional probabilities.  The bridge principle introduced by Robert Stalnaker, known generally as Stalnaker’s Thesis, or more recently as just The Thesis, was this:

Thesis.  P(A –> B) = P(B | A)

Its combination with Stalnaker’s logic of conditionals turned out to be inadequate (it would allow only trivially small models).  David Lewis rejected the Thesis, but also pointed the finger at what he took to be a mistaken principle about conditionals:

Conditional Excluded Middle (CEX)   [A –> (B v C)] is equivalent to [(A –> B) v (A –> C)]

The Thesis and CEX do indeed go together, for one needs to accommodate the fact about conditional probability that if B and C are contraries then P(B v C | A) = P(B | A) + P(C | A).

(CEX is what Paul Teller called the Candy Bar Principle; see previous post about this.)

There is however a logic of conditionals, which I called CE, somewhat weaker than Stalnaker’s, which includes CEX yet combines successfully with the Thesis.  It has only infinite models, but ‘small’ situations, like experimental set-ups, can be modeled with partial structures that are demonstrably extendable to models of CE combined with the Thesis, for all sentences, regardless of complexity. (See previous posts on probabilities of conditionals.)

So, we are in a position to examine the arguments that putatively lead from the logic of conditionals to Bell’s Inequalities.

2. The logic CE and probability modeling

The first principle for any logic of conditionals is that if A implies B then A –> B is true.  This yields at once the theorem that A –> A is always true. The second is Modus Ponens:  if A and A –> B are both true then B is true.  Beyond this, there be controversy.

The logic CE adds CEX, stated above, as well as two more principles about conjunction.  Stated as a theory of propositions (rather than the sentences that express them):

(I)  Conditional Excluded Middle       A –> (B v C)    =   (A –> B)  v   (A –> C)

(II) Conjunction Distribution             A –> (B & C)   =   (A –> B)  & (A –> C) 

(III) Modus Ponens Amplified            A & (A –> B)  =   (A & B)

The last includes Modus Ponens, but adds something:  if A and B are both true, then there is no question about whether B would be true if A were true, of course it would, because it is.

How are situations, like experimental set-ups, modeled?  The standard way of doing this in philosophical logic is to say that each proposition is either true or false in each possible world, and the proposition can be identified with (or at least, represented by) the set of worlds in which it is true.  Of course, ‘possible worlds’ is a metaphor, there is just the set-up and the possible results of doing the experiment.

As a simple example suppose a die is to be tossed.  We have a hypothesis: the die is fair, and all outcomes have the same probability.  So as possible worlds we just take the sequences <toss, outcome> of which there are six.  In <toss, 1> the statement ‘the outcome is 1’ is true, and so forth.

Now the conditionals we are interested in are such as these: 

            A –> B ‘if the outcome is odd, it is less than 4’. 

That is true in <toss, 1> and in <toss, 3>.  But where else is it true?  To satisfy Stalnaker’s Thesis the conditional must have probability 2/3, since the probability of ‘less than 4’, conditional on ‘odd’, is 2/3, that is, 4/6.  So the conditional must be true in two other worlds besides those.

So we have in the model a function s:  given non-empty antecedent A and world w, the world s(A,w) is a world in which A is true. Intuitively, s(A, w) is the way things would have been the case with A true.  The proposition A –> B is then the set of worlds {w: s(A, w) is in B}.

Elsewhere (see Notes at the end) you can see the details for this very simple experimental set-up, modeled with conditionals and probabilities.  Just to give you the idea: there the proposition A –> B that we have here as example is true in worlds <toss, 1>, <toss, 3>, <toss, 2>, <toss, 4>.  So, in this model, if the outcome is actually 4, then the outcome would have been less than 4 if it had been odd.  There is no rationale for this:  it is just how things are in that possible universe of worlds.  We might be living in it, or in another one, that is not up to us.
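
The die model is small enough to write out in full.  Here is a sketch in Python (my own illustration; the selection function s is the one just described, fixed by stipulation):

```python
from fractions import Fraction

worlds = [1, 2, 3, 4, 5, 6]                 # world <toss, n> written as n
P = {w: Fraction(1, 6) for w in worlds}     # the fair-die hypothesis

A = {1, 3, 5}                               # 'the outcome is odd'
B = {1, 2, 3}                               # 'the outcome is less than 4'

# The selection function s(A, w) of the model described above: it fixes
# the worlds in A (Modus Ponens Amplified) and stipulates values elsewhere.
s = {1: 1, 3: 3, 5: 5, 2: 1, 4: 3, 6: 5}

A_arrow_B = {w for w in worlds if s[w] in B}    # the proposition A -> B
prob = lambda X: sum(P[w] for w in X)

print(prob(A_arrow_B))           # 2/3: true in worlds 1, 2, 3, 4
print(prob(A & B) / prob(A))     # 2/3: Stalnaker's Thesis holds for this example
```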

The important point is this:  for any such situation we can construct a representation that is extendible to a model of CE in which the Thesis holds for all propositions.

3. How the Bell Inequalities are tested

For the original Einstein-Podolsky-Rosen thought experiment David Bohm designed an experiment that would be technically feasible, in which a pair of photons are emitted from a source in an entangled state.

For us, what we need to display is only the ‘surface’ of the experimental set-up, with some notes to help the imagination; we do not need to look into the quantum mechanics.

Look at the left side, labeled (A) for ‘Alice’.  It is a device in which there is a polarization filter, which may or may not be passed by an incoming particle.  A red light goes on if it does pass, a green light if nothing passes.  That filter has three different orientations, and one is chosen beforehand by the experimenter, or by a randomizing device.  Similarly for the right hand side, labeled (B) for ‘Bob’.

The experimental facts are these: if we only look at one side, left or right, then regardless of the setting, the red light goes on in exactly 50% of the runs.  But if we look at both, we see that if the two settings are the same, then the red lights never turn on at the same time (Perfect (anti)Correlation).  Furthermore, there are specific probabilities for the red lights turning on at the same time, for any pair of settings.  These are conditional probabilities:  P(red light for Alice and red light for Bob | Alice chose setting i and Bob chose setting j).   It is these for which the Bell Inequalities may or may not hold.

4. Setting up for the derivation

With reference to the above diagram let’s refer to Alice as the one on the left (L) and Bob as the one on the right (R).  The two outcomes, red light on and green light on, I will refer to as outcomes 1 and 0.  And the settings are settings 1, 2, 3.  Let little letters like “a” and “b” be variables over the outcomes {1, 0}, and “i” and “j” be variables over the settings {1, 2, 3}.  Then we can symbolize:

On the left the setting is i:    Li

On the left the setting is i and the outcome on the left is a:    Lia

On the right the setting is j and the outcome on the right is b:    Rjb

and then we can have sentences like (Lj1 –> Rj0) to indicate that with the same setting on both sides, if the outcome is 1 on the left then it will be 0 on the right.  Also sentences like P(Lj0|Rk0) = 3/4 to indicate that if the left and right settings are j and k respectively, then the probability that the light is green on the left side, given that it is green on the right side, equals 3/4.

The conclusions drawn from many observations are the following two premises.

For i, j = 1, 2, 3 and a, b = 0, 1:

I. Perfect Correlation: If the setting is i on both sides then the probability of outcome a on both sides equals 0

II. Surface Locality: The probability of outcome Lia is the same conditional on Li as it is conditional on Li & Rj —

that is, the probability of an outcome on one side is unaffected by the setting on the other side.

Now the Bell Inequalities can be expressed in a simple way.  Let us abbreviate the special case of the probability of outcome 1 happening on both sides, for specific settings, as follows:

p(i; j)  = the probability of (Li1 & Rj1) given settings Li and Rj

The Bell Inequalities can then be expressed in a set of ‘triangle inequalities’:

p(1;2) + p(2;3)  ≥  p(1;3)

and so forth.  There is no reference in these inequalities to any factor which may be hidden from direct measurement — any violation can be found on the observable level.

So it would be very disconcerting if there were a proof that there could not be any violations!

5. Modeling this with conditionals

Naturally, some idealization will be involved.

The entry moves

We fix on some entailments implied in the experimental set-up, assuming the sort of perfection not found in an actual lab.  So we take it that the settings being chosen, and the experiment initiated, entails that there will be an outcome, and it will be red light or green light:

Li entails (Li1 v Li0), hence Li –> (Li1 v Li0) is true

Similarly for the other similar points; for example, (Li & Rj) –> [(Li1 & Rj1) v …. v (Li0 & Rj0)], all logically possible combinations listed in the blank here.

Moreover, we take it as necessary, that is true in all worlds, that outcomes are unique.  That is, the conjunction (Li1 & Li0) is never true; similarly for R.

Finally, the modeling must accommodate this:  Lia and Rjb are each, taken individually, possible, that is, there is a possible world in which it is true. 

Consequences of the entry moves  

Starting with the entry move, and using Conditional Excluded Middle, we infer accordingly:

Either Li –> Li1 or Li –> Li0 is true,

more generally,

For each i, j = 1, 2, 3, one of the conditionals (Li & Rj) –> (Lia & Rjb), with a, b = 1, 0, is true.

Note that this is a finite set of conditionals, since there are only finitely many combinations of settings and outcomes.

I have just written “is true”, as if we are only interested in the actual world.  Of course we have to be interested in all the possible worlds in the modeling set-up, each characterized by specific settings and specific outcomes.  But the above reasoning holds for any world in the model. 

We note also that as long as an antecedent of a conditional is itself possible, there cannot be a conflict in the consequents:

            if A –> B and A –> ~B are true, or if [A –> (B & ~B)] is true, then A is false.

This follows from the Conjunction principles, and applies to our case because with our entry moves Li1 implies the falsity of Li0, and so forth.

The hidden variable

Which counterfactual conditionals are true in a given situation is not something empirically accessible.  But by the above reasoning, in any given world α in the model, there is a set of true propositions of form Li –> Lia, Rj –> Rjb, (Li & Rj) –> (Lia & Rjb) which characterizes that world.

Call that set A(α).  It is a hidden factor which completely determines what the outcomes will or would be, whatever setting the experimenter chooses or would have chosen if he had chosen differently. A(α) represents the world’s hidden dynamical state.

NOTE: At this point we have to raise a question not answerable in that simple artificial language: what makes those conditionals true? In order for the discussion to have any bite, with respect to the experiment, whatever makes them true must not be, for example, the actual but unknown future — it has to be something determined before the experiment has its outcomes. That set A(α) of statements has to represent something characterizing the particle-pair at the outset: that hidden dynamical state has to be a physical feature. To this we will come back below.

(Historically minded logicians will be reminded here of the objections to Diodorus Cronus’ Master Argument.)

Importing Perfect Correlation

This is a Surface principle that must govern the modeling.  It can be extrapolated, because there is nothing you could add to the antecedent that would raise a probability of zero to a positive probability.  So we conclude

For i = 1, 2, 3, P(Li1 & Ri1| Li & Ri  & X) = 0, and so [(Li & Ri & X) –> (Li1 & Ri1)] is false in all worlds, regardless of what X is, unless (Li & Ri & X) is itself impossible.

Consider now world α and let X be its hidden state A(α).  Suppose Li and Ri are both true.  In that case, since A(α) is also true, it follows that Li1 and Ri1 are not both true.  So Li –> Li1 is in A(α) if and only if Ri –> Ri1 is not in A(α), which entails that Ri –> Ri0 is in A(α).

Thus for each i = 1, 2, 3 we need only know whether Li –> Li1 is in A(α); we need not add a conditional with antecedent Ri.  So really, all that matters in A(α) are three conditionals:  L1 –> L1a, L2 –> L2b, L3 –> L3c.  And it is the triple <a, b, c> that summarizes what matters about A(α), a triple of numbers each of which is 0 or 1.  In some worlds the hidden state is thus summarized by <1, 0, 1>, in others by <0, 1, 1>, and so forth.  Let’s introduce a name:

            Cabc = the set of worlds β such that the hidden state of β is summarized by <a, b, c>

That set of worlds has a probability, P(Cabc).

Suppose now that we have chosen settings L1 and R2 in world α.  What is the probability that the red light will turn on, on both sides — i.e. the probability of L11 & R21?  It is the probability that A(α) is either of type <1, 0, 1> or type <1, 0, 0>, hence P(C101)+P(C100).

So that sum equals the conditional probability p(1; 2) = P(L11 & R21 | L1 & R2).  Similarly for the other terms in Bell’s Inequalities. Now we can see whether they follow from what we have arrived at so far:

p(1; 2) = P(C101) + P(C100)

p(2; 3) = P(C110) + P(C010)

p(1; 3) = P(C110) + P(C100)

and so we calculate:

p(1; 2) + p(2; 3)          = P(C101) + P(C100) + P(C110) + P(C010)

                                    = P(C101) + p(1; 3) + P(C010)

                                    ≥  p(1; 3)

as required.

Similarly for the other triangle inequalities that make up Bell’s Inequalities.
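
The deduction can also be checked by brute force: whatever probabilities are assigned to the eight hidden-state types Cabc, the triangle inequalities come out satisfied.  A sketch of such a check (my illustration):

```python
import itertools, random

def p(i, j, C):
    """p(i; j): probability of outcome 1 on both sides, settings i (left), j (right).
    A hidden type <a, b, c> fixes the left outcome for each setting; by Perfect
    (anti)Correlation the right outcome for setting j is the opposite of the left's."""
    return sum(prob for abc, prob in C.items()
               if abc[i - 1] == 1 and abc[j - 1] == 0)

types = list(itertools.product([0, 1], repeat=3))   # the eight <a, b, c>

for _ in range(1000):
    weights = [random.random() for _ in types]
    total = sum(weights)
    C = {abc: w / total for abc, w in zip(types, weights)}   # a random P(Cabc)
    for i, j, k in itertools.permutations([1, 2, 3]):
        assert p(i, j, C) + p(j, k, C) >= p(i, k, C) - 1e-12

print("no violation found in 1000 random hidden-variable distributions")
```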

Thus, we have deduced Bell’s Inequalities, and this implies that in the experimental set-up we predict that those inequalities will not be violated. But in certain set-ups of this form, they are violated.

Our task: to show just what is wrong with the above deduction, what hidden assumptions it must have that are disguised by our traditional ways of thinking about counterfactual conditionals or even more hidden assumptions underlying those.

6. An empiricist view of conditionals

Faced with the above result, and given the attested phenomena that violate Bell’s Inequalities, the first temptation is surely to conclude that the logic of conditionals is at fault, with the main suspects being CEX and/or Stalnaker’s Thesis.

The problem with this is that if we reject CEX or Stalnaker’s Thesis, we no longer have any way to relate conditionals to Bell’s Inequalities, which deal with conditional probabilities.  So the conversation ends there.

I propose that the fault lies rather in the philosophical background, with realism about conditionals.  That is a metaphysical position, even if it mimics common sense discourse oblivious to its own presuppositions.  On such a realist view, conditionals, even when counterfactual, are factually true or false.  On that view, what I called the hidden state is real, an aspect of the objective modalities in nature. 

There are different options for empiricist/nominalist views about conditionals. Elsewhere I have explored a switch from semantics to pragmatics, by moving the focus from truth conditions to felicity conditions for assertion.

But here I will explain an alternative that seems pertinent for the present topic, the analysis of the sort of reasoning that surrounds the Einstein-Podolsky-Rosen paradox and the violation of Bell’s Inequalities. (I first suggested this in a Zoom lecture in March 2022 to a German student organization, and have since then worked out the technical details in the post called A Rudimentary Approach to the True, the False, and the Probable.)

Let us start from the core description of the experiment (or any experiment or situation of this sort), which involves the assertions about the actual settings and the conditional probabilities of outcomes for different settings.  As far as the exposition of the phenomena is concerned, that suffices.  All relevant information for the discussion of Bell’s Inequalities’ violation by certain phenomena can be expressed here.

Now suppose that into this language we introduce the arrow, the ‘conditional’ propositional operator, with Stalnaker’s Thesis as the basic principle governing its meaning:

P(B | A) =   P(A –> B) 

Extending the language cannot by itself create new information, let alone new facts!  So we should insist that the right-hand part of the equation contains no more information about the experiment than the left-hand part does.

A realist interpretation denies this, in part at least:  it insists that we, the observers, have no more information than what is there on the left, but this is merely a limitation of our knowledge.  In fact, those conditionals are divided into the true and the false, in accordance with facts not describable in the original language.

What alternative can we offer?  We can submit that the conditional A –> B is true only if P(B | A) = 1 and false only if P(B | A) = 0.  In that case there is nothing more to be known about the truth values of conditionals beyond what we can gather from the probabilities.

It follows of course that many of these conditionals are neither true nor false.  Indeed, in the original set-up, one of the remarkable facts is that we know that P(L11|L1) = 0.5, and so (L1 –> L11) is not true, and not false.  The true conditionals do not tell us what will definitely happen, though they tell us something about what will definitely not happen.   An example is [(L1 & R1 & R10) –>  L11], because P(L11 | L1 & R1 & R10) = 1.
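
A sketch of this trivalent rule at work, using only the surface probabilities (my illustration; the joint distribution below just encodes Perfect (anti)Correlation for settings (1, 1)):

```python
from fractions import Fraction

# Surface probabilities for settings (1, 1), honoring Perfect (anti)Correlation:
# the outcome pairs (left, right) = (1, 0) and (0, 1) each get probability 1/2.
joint = {(1, 0): Fraction(1, 2), (0, 1): Fraction(1, 2)}

def P(event, given=lambda o: True):
    num = sum(pr for o, pr in joint.items() if event(o) and given(o))
    den = sum(pr for o, pr in joint.items() if given(o))
    return num / den

def truth_value(p):
    """The proposal: A -> B is true only if P(B|A) = 1, false only if P(B|A) = 0."""
    return "true" if p == 1 else "false" if p == 0 else "neither"

print(truth_value(P(lambda o: o[0] == 1)))                        # L1 -> L11: neither
print(truth_value(P(lambda o: o[0] == 1, lambda o: o[1] == 0)))   # given R10: true
```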

Specifically, the Candy Bar Principle is correct in one sense and not in another. In general, A –> (B or ~B) is true; that is part of CEX, and reflects the fact that P(B or ~B | A) = 1. But still in general, neither P(B|A) nor P(~B|A) will equal 1, so neither A –> B nor A –> ~B will be true. Conditional Excluded Middle is valid, but the Principle of Bivalence fails. (And this is a familiar phenomenon in various places in philosophical logic.)

My suggestion is therefore that factual statements about preparation, measurement, and outcomes are true or false always, while the subjunctive conditionals about them are true or false only when the conditional probabilities are 0 or 1. This does not make much sense in the usual approaches to semantic analysis of modals, which are realist at least in form. How could there be such a difference between the evaluation of conditionals and the evaluation of statements lacking any such modal connectives? 

It can be done in the way I’ve presented in the post “A Rudimentary Algebraic Approach to the True, the False, and the Probable”.

NOTES

A bit of history. In a seminar in 1981, after teaching about Bell’s Inequalities, I handed out a small addendum called “The End of the Stalnaker Conditional?”  It contained a sketch of the argument I presented here. This paper was widely distributed though never published.  John Halpin (1986), who cites this paper, elaborated on it with reference to the defense and modifications Stalnaker offered in his (1981), to show that Bell’s Inequalities would still be derivable after that defense.  Both Halpin and I were addressing Stalnaker’s logic, which is stronger than CE, and could not be successfully combined with Stalnaker’s Thesis.  So the point was really moot, unless the argument concerning Bell’s Inequalities could be shown to use no resources going beyond CE.

The logic CE and its combination with Stalnaker’s Thesis, with proofs of its adequacy, were presented in my (1976).  Essentially the same theory was developed by I. R. Goodman and H. T. Nguyen, independently; for a quick look see the Wikipedia articles “Conditional Event Algebra” and “Goodman-Nguyen-van Fraassen algebra”.   The ideas of my (1976) were also developed further by Stefan Kaufmann in a series of more recent papers; see especially Kaufmann (2009), and still more recently by Goldstein and Santorio (2021).

Above I wrote about CE, combined with the Thesis, that it has only infinite models. But ‘small’ situations, like experimental set-ups, can be modeled with partial structures that are demonstrably extendable to models of CE combined with the Thesis, for all sentences. Details about this can be found in my previous posts in this blog, “Probabilities of Conditionals” (1) and (2). The examples with die tosses, and their details, can be found there.

BIBLIOGRAPHY

Eberhard, P. H. (1977) “Bell’s Theorem without hidden variables”. Il Nuovo Cimento 38B(1): 75-79.

Goldstein, S. and P. Santorio (2021) “Probability for Epistemic Modalities”. Philosophers’ Imprint 21 (33): 1-34.

Halpin, J. (1986) “Stalnaker’s Conditional and Bell’s Problem”.  Synthese 69: 325-340.

Herbert, N. and J. Karush (1978) “Generalizations of Bell’s Inequalities”. Foundations of Physics 8: 313-317.

Kaufmann, S.  (2009) “Conditionals Right and Left: Probabilities for the Whole Family”. Journal of Philosophical Logic 38: 1-53.

Stalnaker, R.  (1981)  “A Defense of Conditional Excluded Middle”.   Pages 87-104 in Harper, Stalnaker, and Pearce (eds.) Ifs. Dordrecht: Reidel.

Stapp, Henry P. (1971) “S-matrix interpretation of quantum theory”. Physical Review D3: 1303-1320.

van Fraassen, B. C. (1976)  “Probabilities of Conditionals”.  Pages 261-308 in W. Harper and C.A. Hooker (eds.) Foundations of Probability and Statistics, Volume l.  Dordrecht: Reidel.

A Humble Little Probability Theorem

If it is the case that 

P(A) = 1 implies P(B) = y

then P(B | A) = y.

Is this a valid inference?  Or better:  under what conditions, if any, is this a valid inference?

It may well seem a natural, intuitive inference in certain cases.  For example, if I am certain that the coin is fair then I am certain that the probability of Heads on a toss of that coin is 0.5.  If instead I am certain that the coin is biased 3:1 in favor of Heads then I am certain that the probability of Heads on a toss of that coin is 0.75.  And in both cases, my conditional probability for a Heads outcome, given the hypothesis about the coin, is the corresponding probability: 0.5 in the one case and 0.75 in the other.

But that is one sort of example, and not all examples are so simple. 

First I will show a very general form of modeling probabilistic situations in which this inference is indeed valid.  

Secondly I will show, with reference to Miller’s Principle for objective chance and the Reflection Principle for subjective probability, that there are important forms of modeling probabilistic situations in which the inference is not valid at all.  

And thereby hangs a tale.

One.  Simple chance or simple factual opinion

We can think about the coin tossing example in either of two ways.  The first is to equate the probabilities with objective chances, resulting from the structure of the coin and of the coin tossing mechanism.  The second is to equate the probabilities with the subjective probabilities of a person who has certain odds for the coin toss outcomes.  In both cases the set of probability functions that can represent the situation is simply all those that can be defined on the space {Heads, Tails}.  

That set, call it PR,  has a feature which could remain in similar models of more complex or more sophisticated forms:  PR is closed under conditionalization.  That is, if P is in PR, and A in the domain of P, then P( . | A) is also in PR.

Assumption I:  the set of all probability functions that could represent the probabilities of the propositions in a certain possibility space S = <S, F> is closed under conditionalization.

Explanation:  S is the set of possible states of affairs, F is a field of subsets of S (including S) — the members of F we call propositions.  A model satisfying the Assumption is a couple M = <S, PR> where PR is a set of probability functions defined on S, which is closed under conditionalization (where defined).

Theorem 1.  If it is the case for all P in PR that if P(A) = 1 then P(B) = y, then it is the case for all P in PR that P(B | A) = y when defined.

Proof.  Suppose that for all P in PR, if P(A) = 1 then P(B) = y.

Suppose per absurdum that for some member P’ of PR it is the case that P’(B |A) = z, and it is not the case that z = y.  

This implies that P’(A) > 0.  Let Q = P’(. |A), the conditionalization of  P’ on A.

Then Q(A) = 1 and Q(B) = z.  So there is a member of PR, namely Q, such that Q(A) = 1 and it is not the case that Q(B) = y.  This contradicts the first supposition.

Two.  Enter higher order probabilities

It may be tempting to think that the theorems for probability when higher order probabilities are not admitted all remain valid when we extend the theory to higher order probabilities. Here we have a test case.

Sometimes one and the same formula plays a role in the modeling of very different situations, and sometimes a formula’s status in various roles ranges from audacity to triviality, from truism to absurdity.  All of that happens to be the case with the formula

            (*)  P(A | pr(A) = x) = x

first appearing as Miller’s Principle (connecting logical and statistical probability, now usually read as connecting measure of ignorance P with objective chance pr) and later as the Reflection Principle (connecting present opinion about facts with present opinion about possible future opinions about those facts).  Both principles have a very mixed history (see Notes).

To model probabilities connected with probabilities of those probabilities we will not assume Assumption I (indeed, will show it running into trouble) but rather principle (*), which I will refer to by the name of its second role, the Reflection Principle.

Assumption II.  There is a function pr which maps S into PR.  For each number r and member A of F we define [pr(A) = r] = {x in S: pr(x)(A) = r}.  For all A and r, [pr(A) = r] is a member of F.

(For most numbers, perhaps even for all but a finite set of numbers,  [pr(A) = r] will be the empty set.)  

(This looks audacious, but it is just how Haim Gaifman sets it up.)  

The Reflection Principle is satisfied exactly if for all P in PR, and all A in F and all numbers r, 

    P(A | pr(A) = r) = r when defined

Theorem 2.  If the Reflection Principle is satisfied then Theorem 1 does not hold for PR.

Proof.   Suppose P(A) = 1.  The Reflection Principle implies that P(A | pr(A) = 0.5) = 0.5 if defined, that is, if P(pr(A) = 0.5)>0. 

But given that P(A) = 1, P(A | pr(A) = 0.5) = 1 also.  

Therefore, if P(A) = 1 then P(pr(A) = 0.5) = 0.

So, with B = [pr(A) = 0.5] we see that for all P in PR,

     if P(A) = 1 then P(B) = 0.

However, it is not always the case that for all P in PR, P(pr(A) = 0.5 | A) = 0.

This last point I will not prove.  Think back to the example:  the probability that the chance of getting outcome Heads = 0.5, given that the actual outcome will be Heads, is certainly not zero.  For the actual outcome does not determine the chance that outcome had of occurring.  Similarly, if I am ignorant of the outcome, then my personal probability for that outcome is independent of what that outcome actually is.
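
The coin case can be made fully explicit.  A sketch (with my own numbers: a fair coin and a 3:1-biased coin, equally likely a priori):

```python
from fractions import Fraction

chance = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}   # pr(Heads) per coin
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}

# States of affairs are pairs (coin, outcome); pr depends only on the coin.
P = {(c, o): prior[c] * (chance[c] if o == "H" else 1 - chance[c])
     for c in chance for o in ("H", "T")}

A = {s for s in P if s[1] == "H"}                         # the outcome is Heads
B = {s for s in P if chance[s[0]] == Fraction(1, 2)}      # B = [pr(A) = 0.5]

prob = lambda X: sum(P[s] for s in X)
print(prob(A & B) / prob(A))   # 2/5: P(B | A) is not 0, although
                               # any P with P(A) = 1 must give P(B) = 0
```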

Corollary. If Assumption II holds and the Reflection Principle is satisfied, then the appropriate set PR is not closed under conditionalization.

Of course that corollary was something already known, from the probabilist version of Moore’s Paradox.

NOTES

[1]  This is a recap, in more instructive and more general form of three preceding posts:  “Moore’s paradox”, “Moore’s Paradox and Subjective Probability”, and “A brief note on the logic of subjective probability”.

[2]  I take for granted the concept of a probability function P defined on F.  As to conditional probability, P(B | A) this is a binary partial function defined by P(B | A) = P(B ∩ A)/P(A), provided P(A) > 0.

[3] David Miller introduced what came to be called Miller’s Principle in 1966, and produced a paradox. Dick Jeffrey pointed out, in effect, that this came by means of a modal fallacy  (fallacy of replacing a name by a definite description in a modal context).  Karl Popper, Miller’s teacher, compounded the fallacy.  But there was nothing wrong with the principle as such, and it was adapted, for example, by David Lewis in his theory of subjective probability and objective chance.

[4] When I say that the Reflection Principle too has a mixed history I am referring to fallacies by its critics.

BIBLIOGRAPHY

Jeffrey, Richard C. (1970) Review of eight discussion notes. Journal of Symbolic Logic 35: 124-127.

Miller, David (1966) “A paradox of information”. The British Journal for the Philosophy of Science 17(1): 59-61.

Consistency of the Reflection Principle for Subjective Probability

A recent article by Cieslinski, Horsten, and Leitgeb, “Axioms for Typefree Subjective Probability” ends with a proof that the Reflection Principle cannot be consistently added to the axiomatic untyped probability theory which they present. 

On the other hand, Haim Gaifman’s “A Theory of Higher Order Probabilities” can be read, despite the glaring difference in interpretation, as establishing the consistency of the Reflection Principle.  

Gaifman’s theory is not untyped, and Gaifman’s approach is not axiomatic but model-theoretic. Thus it stays much closer to the original, informal presentation of the Reflection Principle.  But it is still noticeably abstract.  We can think of his models roughly like this:  certain sets of possible worlds are propositions, and there is a function pr which serves to select those propositions that can express factual statements of form “My (or, the agent’s) probability for A equals r”.

What I would like to do here is present a similar theory, staying in closer touch with the original presentation of the Reflection Principle, and entirely explicit about the way the opinion I currently express (about A, say) is constrained to harmonize with my opinions about how that opinion (about A) could change in time to come.

Introduction

The Reflection Principle purports to be an additional criterion of synchronic coherence: it relates current opinion to other current opinions.  The principle has a general form (the General Reflection Principle), but also a form specifically for agents who have opinions about their own (current and/or future) doxastic states.  The latter was the original formulation, but should now properly be called the Special Reflection Principle.  I will formulate both forms precisely below.

Satisfying Reflection does not require any relation between one’s actual opinions over time.  Nevertheless it is pertinent also for diachronic coherence, because it is a constraint on the agent’s current expectation of her future opinions, and because a policy for managing one’s opinion must preserve synchronic coherence.  

So a minimal probability model, of an agent whose opinion satisfies Reflection, will consist of a probability function P with a domain that includes this sort of proposition:

(Q)   A & my opinion at (current  or future) time t is that the probability of A equals r.

I symbolize the second conjunct as pt(A) = r.  Hence, symbolically,

            (Q) A & pt(A) = r.

Statement pt(A) = r is a statement of fact, true or false, about the agent’s doxastic state at time t.  The agent can express opinions about this, as about any other facts.  

In contrast I  will use capital P to stand for the probability function that encodes the agent’s opinion.  This is the opinion that she expresses or would express with statements like “It seems twice as likely as not (to me) that it will snow tonight”.  So the sentence P(A) = r is one the agent uses to express such an opinion, and she does this in first-person language.  

The (special) Reflection Principle implies a constraint on the opinion expressed in form P(A & pt(A) = r), which relates the opinion expressed about A to the factual statement that the agent has that opinion. 

There is in the corresponding language no nesting: nothing of form P( … P …).  Whenever the agent expresses an opinion, it is an opinion about matters of fact.

We can proceed in two stages.  The first is just to see what the more modest General Reflection Principle is, and how it is to be satisfied. Then we can build on that to do the same for the Special Reflection Principle.  I will focus on modeling, and — except at one point — just take it that the relation to a corresponding language will be sufficiently clear.

Stage 1: General Reflection

My current probability for A must lie within the range spanned by the probabilities for A that I may have or come to have at any time t (present or future), as far as my present opinion is concerned.

To illustrate:  I am a weather forecaster and realize that, depending on whether a certain storm front moves in during the night, my forecast tomorrow morning will be either 0.2 or 0.8 chance of rain.  Then my present forecast for rain must be a chance x of rain tomorrow with x a number in the open interval (0.2, 0.8).

A basic model to represent an agent who satisfies the General Reflection Principle will be the quadruple M = <S, F, TPROB, Pin>, with its elements specified as follows.

T, the set of times, is a linearly ordered finite or countable set with a first member.  For each t in T, TPROB(t) is a finite set of probability functions.  These are functions defined on a field F of sets in space S, with F having S itself as a member.  The members of F represent propositions about which, at any time t, I have an opinion, and the members of TPROB(t) are the opinions I could have at time t.

S = <S, F> I will call the basic space.  I will use A, B, … for members of F, which I will also call the elementary propositions.  The set of probability functions defined on the space S = <S, F> I will call Sp.

At the initial time the agent expresses an opinion, which for now I designate as Pin, consisting in probabilities both for the events represented in space S and about how likely she is to have at time t the various opinions represented in TPROB(t).

The General Reflection Principle requires that for all A in F, Pin(A) is within the span (convex closure, convex hull) of the set {p(A): p is in TPROB(t)}. I will designate that convex closure as [TPROB(t)].  The members of TPROB(t) are the vertices of [TPROB(t)].

Since Pin assigns probabilities to the members of TPROB(t), which are defined on the domain of Pin itself, General Reflection then implies that Pin is a mixture (convex combination) of those members, with the weights thus assigned:

Pin(A) = ∑ {Pin(p)p(A): p in TPROB(t)}

Equivalently, <S, F, Pin> is a probability space, and as it happens, for each t in T, there are appropriate weights such that Pin is a convex combination of the members of TPROB(t).

Pin cannot be more than one thing, so those convex combinations must produce, for each time t, the same initial opinion.  We can ensure that this is possible by requiring that for all t and t’,  [TPROB(t’)] = [TPROB(t)].  Of course these sets TPROB(t) can be quite different for different times t; the vertices are different, my opinions are allowed to change.  And specifically, I will later on have some new certainties, for example after seeing the result of an experiment.  What this constraint on the span of foreseen possibilities about my opinion implies for certainties is this:  

if today I am not certain whether A,  then, if I foresee a possibility that I will become certain that A at a later time, then I foresee also a possibility that I will become certain of the opposite at that time.
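
Stage 1 can be made concrete with the forecaster illustration.  A sketch (with weights of my own choosing):

```python
from fractions import Fraction

# TPROB(t): the two opinions I may have tomorrow about Rain.
TPROB_t = [{"Rain": Fraction(2, 10), "Dry": Fraction(8, 10)},
           {"Rain": Fraction(8, 10), "Dry": Fraction(2, 10)}]
weight = [Fraction(1, 2), Fraction(1, 2)]    # Pin(p), my probability for each opinion

# General Reflection, realized as a mixture: Pin(A) = sum of Pin(p)p(A).
Pin_rain = sum(w * p["Rain"] for w, p in zip(weight, TPROB_t))
print(Pin_rain)    # 1/2, which lies within the span of {2/10, 8/10}
```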

SUMMARY:  In this construction so far we have Pin defined on a large family of distinct sets, namely the field F of elementary propositions, and each of the sets TPROB(t), for t in T.  

The construction guarantees that Pin, in basic model M = <S, F, TPROB, Pin> satisfies the General Reflection principle.  

But we have not arrived yet at anything like (Q), and we have not yet given any sense to ‘pt(A) = r’.  This we must do before we can arrive at a form in which the Special Reflection Principle is properly modeled.

Stage 2: Special Reflection

The function Pin cannot do all that we want from it, for we need to represent opinions that relate the agent’s probabilities for events in space S to the probabilities assigned to those events by opinions that the agent may have at various (other) times.

Intuitively, (pt(A) = r) is the case exactly if the ‘actual’ opinion at time t is represented by a function p in TPROB(t) such that p(A) = r.  In general there may be no, one, or many members of TPROB(t) which assign probability r to A.

So the proposition in question is thus:

(pt(A) = r)  =   {p in TPROB(t): p(A) = r}

Since Pin is defined for each p in TPROB(t), Pin assigns a probability to this proposition:

            Pin(pt(A) = r)  = ∑{Pin(p): p(A) = r and p is in TPROB(t)}.  

But what is not well-defined at this point is a probability for the conjunction (Q), mentioned above, since A is a member of field F and (pt(A) = r) is a member of a quite different field, of subsets of TPROB(t).

We must depart from the minimalist construction in the preceding section, and extend the function Pin  to construct a function P which is well-defined, for each time t, on a larger space.  This process is what Dick Jeffrey called Superconditioning. 

I have explained its relevant form in the preceding post, with an illustration and intuitive commentary.  So I will here proceed a bit more formally than in the preceding post and without much intuitive explanation.  

NOTE.  At this point we should be a bit more explicit about how the model relates to a corresponding language.  Suppose L is a language of sentential logic, and is interpreted in the obvious way in model M:  the semantic value [[Q]] of a sentence Q in L is an elementary proposition, that is, a subset of S, a member of field F.  

As we now build a larger model, call it M*, by Superconditioning, I need to have a notion of something in M* being ‘the same proposition’ as a given elementary proposition in M.  I will use the * notation to do that:  there will be relation * between M and M* such that a sentence Q which has value [[Q]] in M  has semantic value [[Q]]* in M*.  

Quick overview of the final model, restricted to a specific time t:  

Given: the basic model defined above, to which we refer in the description of final model M*.  

M*(t) = <S*, F*, TPROB*(t), P>, with

S* = S x TPROB(t)

If A is in F then A* = {<x, p>:  x is in  A, p is in TPROB(t)}, 

equivalently, A* = A x TPROB(t)

F* is a field of subsets of S* which includes {A*: A is in  F}

TPROB*(t) and P, defined on F*, are such that for all A in F, P(A*) = Pin(A)

Construction of the final model, for specific time t

We focus on a specific time t, but the procedure is the same for each t in T.  Let TPROB(t) = {p1, …, pn}.  Each of these probability functions is defined on the space S.

But now we will think instead about the combination of each of these probability functions with S as a separate entity.

For each j, from 1 to n, there is a set Sj = {<x, pj>: x in S}.  Equivalently, Sj = S x {pj}.

We define:  

            for A in F, Aj = {<x, pj>:  x is in A},

            the field Fj = {Aj : A is in F}.  

Clearly Sj = <Sj, Fj> is an isomorphic copy of S = <S, F>, disjoint from Sk unless j = k.

            S* = <S*, F*> is the sample space with S* = ∪{Sj: j = 1, …, n}.

Equivalently, S* = S x TPROB(t)

            F* is the least field of subsets of S* that includes S* and includes ∪{Fj: j = 1, …, n}.  

The sets Sj therefore belong to F* and are the cells in a partition of S*.  (These cells represent the distinct situations associated with the different probability functions pj, j = 1, …, n.)

Equivalently, F* is the closure of ∪{Fj: j = 1, …, n} under finite union.  This is automatically closed under finite intersection, since each field Fj is closed under intersection, and these fields are disjoint.  F* has S* as a member, because S* is the union of all the cells.  And the infimum of F* is Λ, the empty set, because Λ is a member of each field Fj; note also that Λ x TPROB(t) is just Λ.

Clearly, all members of F* are unions of subsets of those cells, specifically finite unions of sets Ak such that A is in F, for certain numbers k between 1 and n, inclusive.

For A in F, we define A* = ∪{Aj:  j = 1, …, n}.  Clearly, A* = {<x, p>: x in A, p in TPROB(t)}

The function f: A –> A* is a set isomorphism between F and {A*: A is in F}.  For example,

A* ∩ B*    = [∪{Aj:  j = 1, …, n}] ∩ [∪{Bj:  j = 1, …, n}]

                  = ∪{Aj ∩ Bj:  j = 1, …, n}

                  =  (A ∩ B)*

Now we come to the probabilities.

Definition.   pj* is a probability function on Sj defined by pj*(Aj) = pj(A) for each proposition A in F.

            TPROB*(t) = {pj*: j = 1, …, n}

Looking back once again to our basic model we recall that there are positive numbers bj for j = 1, …, n, summing to 1 such that Pin = ∑{bjpj: j = 1, …, n}.  

We use these same numbers to define a probability function P on sample space S* as follows:

            For j = 1, …, n:

  1. P(Sj) = bj
  2. for each A in F, P(Aj|Sj) = pj*(Aj).  Equivalently, for each A in F, P(A* ∩ Sj) = P(Aj) = P(Sj)pj*(Aj).
  3. P is additive: if A and B are disjoint members of F* then P(A ∪ B) = P(A) + P(B)

Since all members of F* are finite unions of members of the fields Fj, j = 1, …, n, it follows that these clauses define P on all members of F*.

It is clear that 3. does not conflict with 2., since pj* is additive.  Since the weights bj are positive and sum to 1, and each function pj* is a probability function which assigns 1 to Sj, it follows that P is a probability function with domain F*, and is the appropriate convex combination of the functions pj*.

P(A*) = ∑{P(A* ∩ Sj): j = 1, …, n}

= ∑{P(Aj): j = 1, …, n}

= ∑{bjpj*(Aj): j = 1, …, n}

= ∑{bjpj(A): j = 1, …, n}

= Pin(A)

About the Special Reflection Principle

Define:

(pt(A) = r) = ∪{Sj : P(A*|Sj) = r}

Equivalently,

(pt(A) = r) = ∪{Sj : pj*(Aj) = r}

Since TPROB*(t) is finite, we can relabel the indices so that the cells in question are listed consecutively:

(pt(A) = r)  =  ∪{Sj : j = k, …, m}

            P(pt(A) = r)  =  ∑{P(Sj) : j = k, …, m}  =  ∑{bj: j = k, …, m}

With this in hand we now calculate the probability of the conjunction A* ∩ (pt(A) = r):

A* ∩ (pt(A) = r)  =  A* ∩ ∪{Sj : j = k, …, m}

                                = ∪{A* ∩ Sj : j = k, …, m}

                                = ∪{Aj : j = k, …, m}

            P(A* ∩ (pt(A) = r))  =  ∑{P(Aj): j = k, …, m}

                                              = ∑{P(Sj)pj*(Aj): j = k, …, m}

                                              = ∑{bjpj*(Aj): j = k, …, m}

                                              = r∑{bj: j = k, …, m}

because for each j = k, …, m, pj*(Aj) = r.

Given both these results, and the definition of conditional probability, we arrive at:

            P(A* | pt(A) = r) = r, if defined, that is, if P(pt(A) = r) > 0.

This is the Special Reflection Principle.
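
To make the construction concrete, here is a minimal computational sketch in Python, using the three-coin example of the preceding post (which appears below in this feed).  All the names in it (star, prob_is, cond, and so on) are mine, invented for this illustration; it is a sketch of the construction just given, not part of it.

```python
from fractions import Fraction as Fr

# A sketch of the superconditioned model M*(t) for the three-coin example:
# a fair coin, one biased 4:1 for Heads, one biased 1:4.
S = ["Heads", "Tails"]
p = {"Heads": Fr(1, 2), "Tails": Fr(1, 2)}   # fair
q = {"Heads": Fr(4, 5), "Tails": Fr(1, 5)}   # biased 4:1 for Heads
r = {"Heads": Fr(1, 5), "Tails": Fr(4, 5)}   # biased 1:4 for Heads
TPROB = [p, q, r]
weights = [Fr(1, 3), Fr(1, 3), Fr(1, 3)]     # the numbers bj

# S* = S x TPROB: the points are pairs <x, j>, with j indexing pj.
S_star = [(x, j) for x in S for j in range(len(TPROB))]

def P(event):
    """P on subsets of S*: the weight bj times pj, summed over the event."""
    return sum(weights[j] * TPROB[j][x] for (x, j) in event)

def star(A):
    """A* = A x TPROB, the copy in M* of an elementary proposition A."""
    return {(x, j) for (x, j) in S_star if x in A}

def prob_is(A, val):
    """The proposition (pt(A) = val): the union of the cells Sj with pj(A) = val."""
    return {(x, j) for (x, j) in S_star
            if sum(TPROB[j][y] for y in A) == val}

def cond(X, Y):
    """Conditional probability P(X | Y), assuming P(Y) > 0."""
    return P(X & Y) / P(Y)

A = {"Heads"}
print(P(star(A)))                            # 1/2  = Pin(A)
print(cond(star(A), prob_is(A, Fr(4, 5))))   # 4/5: Special Reflection
```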

NOTES

1]  The same formalism can have many uses and interpretations — just like, in physics, the same equation can represent many different processes.  Of course, here “the equation” refers just to the mathematical form, with no reference to meaning or interpretation.

In that sense the Reflection Principle appeared first (as far as I can remember) as Miller’s Principle, connecting subjective probability with objective chance, and used in that sense by David Lewis in his theory thereof.  

Then Haim Gaifman, who uses the notation P and pr, gave Miller’s Principle the interpretation that the person expressing her opinion P takes pr to be the opinion of someone(s) or something(s) recognized as expert(s), to which she defers.  I have drawn on Gaifman’s theory with that interpretation elsewhere, to give a sense to acceptance of a scientific theory.

2] But the possibility of this sort of reading, which I had mentioned in “Belief and the Will” only to dismiss it for the issue at hand, did promote a misreading of the Reflection Principle (by David Christensen, for example).  It would clearly be irrational for me to defer to my future opinion except while supposing that I will then be both of sound mind and more knowledgeable than I am now.  But it is not irrational even now to expect myself to be both of sound mind and more knowledgeable, as a result of the good management of my opinion over time to which I am committed.  And this, all the while knowing that I may be interrupted in this management, either by events beyond my control or by interrupting myself in the course of gaining new insights.

This is exactly of a piece with the fact that I can morally promise, for example, to protect someone, and expect myself to keep my promise, and morally expect others to rely on my promise, while knowing — as we all do —  the general and irremediable fact that, due to circumstances presently unpredictable, I may fail to do so, either because of force majeure or because of overriding moral concerns.  In epistemology we must strive for the same subtlety as in ethics.

3] See previous post, “Conditionalizing on a combination of probabilities” for Jeffrey’s concept of Superconditioning and its relation to the informal Reflection Principle.

REFERENCES

Cieslinski, Cezary,  Leon Horsten, and Hannes Leitgeb (2022) “Axioms for Typefree Subjective Probability”.  arXiv:2203.04879v1

Gaifman, Haim (1988)  “A Theory of Higher Probabilities”.  Pages 191-219 in Brian Skyrms and William L. Harper (eds.) Causation, Chance and Credence.  Dordrecht: Kluwer, 1988.

Van Fraassen, Bas C. (1995)  “Belief and the Problem of Ulysses and the Sirens.”  Philosophical Studies 77: 7–37.

Conditionalizing on a combination of probabilities

NOTE:   This is going to be about what Dick Jeffrey called Superconditioning, but here applied to a topic of my own concern.

I like the physicists’ term “mixture” but the more traditional terminology is this:

Probability function P is a convex combination of probability functions p and q if and only if there are positive numbers a and b (the weights) in the interval [0,1] such that P = ap + bq.

This implies of course that a + b = 1.  The definition is easily extended to cover a convex combination of any finite number of probability functions.

The following example is so simple that it may seem to harbor no complexities at all.  But it does.

EXAMPLE.  We are going to choose at random one of three coins to be tossed: one is fair, one is biased for Heads 4:1 and one is biased for Heads 1:4.  My probability P is the convex combination with weights 1/3 of each of the functions p, q, r proper to the three coins:

            P(Heads) =  (1/3)(0.5) + (1/3)(0.8) + (1/3)(0.2) = 0.5

QUESTION.  What happens if I am told that the coin actually to be tossed is the one heavily biased in favor of Heads?  Obvious! I conditionalize on this new evidence.  I change my subjective probability to P(Heads) = 0.8, or if you like, I change it from P to q.

And what if instead I am just told it is either the fair coin or the one heavily biased in favor of Heads?  Obvious!  I conditionalize on this new evidence.  I change from P to (1/2)(p + q) and my new probability for Heads is 0.65.  Very natural.

But this makes no sense at all.  All four probability functions P, p, q, r have the same domain, namely the very little sample space {Heads, Tails}.  There is nothing in that domain that could be a proposition expressed by “the coin to be tossed is  …”.  These functions cannot be conditionalized on something that is not in their domain of definition.

To put it metaphorically: those three functions p, q, r pertain to three possible worlds, which are very different from each other.  They are all possible as far as my initial information is concerned.  Just imagine, in one of them the number of galaxies is most likely one, with our planet as the sole one inhabited, and in the others there are most likely billions of galaxies teeming with intelligent life …. 

We naturally think of this situation as consisting of a single set of possibilities, with three probability functions defined on it.  But really, we have three sets of possibilities, entirely alike in themselves, but each with its single, unique, different probability distribution.

So to represent what is happening here we have to switch to a larger sample space: that is what Jeffrey calls Superconditioning.  It is on that larger space that P is defined, and in that space the functions p, q, r mark a partition.  If C(p) is the cell in this partition that is marked p then the probability of Heads in that cell equals 1/2.  And what P initially assigns probability 1/3 to, is that cell.

It is customary to keep using the same words and symbols but of course that is a practice designed to confuse.  No proposition in the small space {H, T} is a proposition in the larger space on which, we now say, our subjective probability is really defined. 

I will show the proper representation, formally constructed, in an Appendix.  But for now, let us go back to the example, and look at the informal, not yet dis-confused, question of whether the following is the case:

The probability that Heads comes up, given that the coin is so biased that the probability of Heads is 0.8, equals 0.8

or more generally, symbolically,

P(A | µ(A) = x) = x 

What is meant by this?  The proposition [µ(A) = x] must be the set, on which P is defined, which consists of just those cells in which the probability of A equals x.  In the case of Heads and 0.8 that is just Cell C(q), the one marked by q.  And P conditionalized on that cell does assign 0.8 to Heads.  

Somewhat more ambitiously, what about 

                        P(Heads | µ(Heads) > 0.4)   >  0.4 ?

Well, [µ(Heads) > 0.4] must be the union of the cells in which the probability of Heads is greater than 0.4.  Those cells are precisely Cells C(p) and C(q), corresponding to probability functions p and q.  And P conditionalized on their union assigns 0.65 to Heads, so yes, something greater than 0.4.
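
For the record, here is the same point in a few lines of Python.  The cell picture is reduced to its bare bones; the names are mine and merely illustrative.

```python
from fractions import Fraction as Fr

# Three cells C(p), C(q), C(r), each with prior weight 1/3; within each
# cell the probability of Heads is fixed by that cell's function.
cells = {"C(p)": Fr(1, 2), "C(q)": Fr(4, 5), "C(r)": Fr(1, 5)}
weight = Fr(1, 3)

def P_heads_given(condition):
    """P(Heads | union of the cells whose Heads-probability satisfies condition)."""
    selected = [h for h in cells.values() if condition(h)]
    return sum(weight * h for h in selected) / (weight * len(selected))

print(P_heads_given(lambda h: h == Fr(4, 5)))  # 4/5
print(P_heads_given(lambda h: h > Fr(2, 5)))   # 13/20, that is, 0.65
```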

You, esteemed reader, will not have missed my ulterior motives for this discussion.

APPENDIX.  Superconditioning formally presented.

Our initial model:

S = <S, F> is a (sample) space:  S is a non-empty set and F is a field of subsets of S, which includes S.  The members of F I will call elementary propositions.

TPR = {p1, …, pn}   is a finite set of probability functions each with domain F.  

Pin = ∑{bjpj: j = 1, …, n} is a convex combination of the members of TPR.

M = <S, TPR, Pin > is our initial model.

Our final model:

For each j, from 1 to n, Sj = <Sj, Fj> is an isomorphic copy of S.  These are disjoint.

There is a homomorphism f: ∪{Fj: j = 1, …, n} –> F, and the restriction of f to Fj is an isomorphism onto F.

 pj* is a probability function on Sj such that pj*(A) = pj(f(A)) for each proposition A in Fj.

TPR* = { pj*| j = 1, …, n}

S* = <S*, F*> is the sample space with S* = ∪{Sj: j = 1, …, n}.

Field F*  on S* is the least field that has each set Sj as a member, as well as all the members of each field Fj , for j = 1, …, n.  The sets Sj are therefore the cells in a partition, and all other members of F* are unions of subsets of those cells.  

P is the probability function defined on F* as follows:

            For j = 1, …,n

  1. P(Sj) = bj
  2. for each A in Fj, P(A|Sj) = pj*(A).  Equivalently, for each A in Fj, P(A ∩ Sj) = P(Sj)pj*(A).
  3. P is additive: if A and B are disjoint members of F* then P(A ∪ B) = P(A) + P(B)

It is clear that 3. does not conflict with 2. since pj* is additive.  Since the weights bj are positive and sum to 1, and each function pj* is a probability function which assigns 1 to Sj it follows that P is a probability function with domain F*.

M* = <S*, TPR*, P> is our final model.

Application

The proposition expressed by “the probability of A equals x” is now represented by the union of the cells Sj  such that  P(A | Sj) = x, that is, such that  P(A ∩ Sj) = P(Sj)x.  

Let those cells be  Sj  with j = k, …, m.   So the proposition [probability of A = x] is the proposition ∪{Sj: j = k, …, m}

Since these cells are disjoint,

P(A ∩ [∪{Sj: j = k, …, m}])    =         ∑{P(A ∩ Sj) : j = k, …, m}

                                                =          x∑{P(Sj) : j = k, …, m}

                                                =          xP([probability of A = x])

or equivalently,

            P(A | [probability of A = x]) = x.

… an expression with a familiar look in many contexts … Specifically, this is the form of the Reflection Principle, and there are interesting connections between Superconditioning and Reflection.  The next post will be about that.

REFERENCES

Jeffrey, Richard (1988) “Conditioning, Kinematics, and Exchangeability”.  Pages 221-256 in B. Skyrms and W. L. Harper (eds.) Causation, Chance, and Credence. Proceedings of the Irvine Conference on Probability and Causation, Volume 1.  Dordrecht: Kluwer.

A Rudimentary Algebraic Approach to the True, the False, and the Probable

A motivation for this, which will show up in Application 2, is to show that it is tenable to hold that in general, typically, conditionals are true only if they are certain. I do not propose this for conditionals in natural language. But I think it has merits in certain contexts in philosophy of physics, notably interpretation of the conditionals that appear in Einstein-Podolsky-Rosen and Bell Inequality arguments.

[1] The algebra

[2] The language: first step in its interpretation

[3] The algebra:  filters and ideals

[4]  The language and algebra together: specifying a truth-filter

[5] The language, admissible valuations, validity and the consequence relation

APPLICATION 1:  probability space models

APPLICATION 2: algebraic logic of conditionals with probability

NOTE:  I will explain this approach informally, and just for the simple case in which we begin with a Boolean algebra.  

The languages constructed will in general not be classical, but in this case validity of the classical sentential logic theorems will be preserved, even if other classical features are absent.

But this approach can be applied starting with some other sort of algebra.

[1] The algebra

Let us begin with a Boolean algebra A, with the operations ∩,  ∪, -, relation ⊆, top K, and bottom Λ.  From my choice of symbols you can see that I find it useful to think of it as an algebra of sets.  That will be characteristic of some applications.  But this plays no role for now; it just helps the imagination.

I will use little letters p, q, r, … to stand for elements of A.

I have left open here whether there are other operations on this algebra, such as modal operators.  Application 2 will be to a Boolean algebra with modal operator ==>.

[2] The language: first step in its interpretation

As far as the algebra is concerned, all elements have the same status.  But we can introduce distinctions from outside, by choosing a language that can be interpreted in that algebra.  When we do that each sentence E has a semantic value [[E]], which is an element of A, and we call it the proposition expressed by that sentence.

So let us introduce a language L.  It has atomic sentences, the classical (‘Boolean’) connectives &, v, ~.  It may have a lot more.  The interpretation is such that

[[~E]] = -[[E]] (the Boolean complement)

[[E & D]] = [[E]] ∩ [[D]]

[[E v D]] =  [[E]] ∪ [[D]]

and there will of course be more clauses if the language has more resources for generating complex sentences.

The atomic sentences, together with those three classical connectives, form a sub-language, which I will call Lat.  This is a quantifier-free, modal-operator-free fragment of L.  I tend to think of the members of Lat as the empirical sentences, the language of the data, but again, that is at this point only a mnemonic.

The set of propositions expressed by sentences in Lat I will call A0, that is {[[E]]: E is in Lat}, and it is clearly a Boolean algebra too, a sub-algebra of A.  In general A will be much larger than A0.

[3] The algebra:  filters and ideals

What about truth and falsity?  I will take it that the true sentences in the language together form a theory, that is, a set closed under the language’s consequence relation — which clearly includes the consequence relation of classical sentential logic.  I take it also that this theory is consistent, but do not assume that it must be complete.

The algebraic counterpart of a theory is a filter: a set F of elements of A such that, if p ⊆ q and p is in F then so is q,  and if r, q are both in F then so is (r ∩ q).  A filter is proper  exactly if it does not have  Λ as a member.  That corresponds to consistency.

The filter that consists of the propositions expressed by the members of a consistent theory is a proper filter.  Obviously all filters contain K.

A set G of elements of A is an ideal exactly if: if p ⊆ q and q is in G then so is p,  and if r, q are both in G then so is (r ∪ q).  The ideal is proper if K is not in it.  Obviously any ideal contains Λ.

Filter F has as counterpart an ideal G = {-p: p is in F}, where -p is the complement of p in A.  This corresponds to what the theory rules out as false.  
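
In the finite case these notions are easy to compute with.  Here is a small Python sketch on the powerset algebra of a three-element set; the function names are my own, not anything standard.

```python
from itertools import combinations

# The Boolean algebra of all subsets of K = {1, 2, 3}.
K = frozenset({1, 2, 3})
A = [frozenset(c) for r in range(len(K) + 1) for c in combinations(K, r)]

def is_filter(F):
    """Upward closed, and closed under intersection (meet)."""
    upward = all(q in F for p in F for q in A if p <= q)
    meets = all((p & q) in F for p in F for q in F)
    return upward and meets

def dual_ideal(F):
    """The corresponding ideal {-p : p is in F}."""
    return {K - p for p in F}

# The principal filter generated by {1, 2}: all supersets of it.
F = {p for p in A if frozenset({1, 2}) <= p}
print(is_filter(F))       # True
print(frozenset() in F)   # False: the filter is proper
print(dual_ideal(F))      # the subsets of {3}, a proper ideal
```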

[4]  The language and algebra together: specifying a truth-filter

Now we are ready to talk about assigning truth-values.  Remember that the language L already has an interpretation [[.]] into the algebra of propositions A.  What we need to do next then is to select the propositions that are true, and then assign value T to the sentences that express those propositions.

Well, I will show one way to do that; but there are many ways.  I would like the ‘empirical sentences’ all to get a truth-value.  In addition there may be a class of sentences that also should get truth-values, for some reason.  They could be selected syntactically (in the way Lat is), or they could be selected as the ones that express a certain sort of proposition.  The latter would be a new way of doing the job, so that is what I will outline.

Step 1 is to specify a proper filter T on A, which will be the set of propositions that we will specially specify as true, regardless of whether they belong to A0.  Its corresponding ideal U is then the set of propositions that we will specially specify as false.

Step 2  is to specify a filter T0 on A0, as the set of true propositions which are values of ‘empirical sentences’, and indeed we want T0 to be a maximal proper filter on A0.  Then its corresponding ideal U0 on A0 is a maximal proper ideal, and A0 is the union of T0 and U0.  So every proposition in A0  is classified as true or false.

There is one important constraint on this step.  Clearly we do not want any proposition to be selected as true in one step and false in the other step.  So the constraint is:

                        Constraint on Truth Filtering.   T0  does not overlap U.  

It follows then also that U0 does not overlap T.

The final step is this: T* is the smallest filter that contains both T and T0.  We designate T* as the set of true propositions in A.  This is the truth-filter.  Its corresponding ideal U* is the set of false propositions in A.

This is an unusual way of specifying truth conditions, not least because there will in general be propositions that belong neither to T* nor to U*: in general, bivalence fails.

We need to show that T* is a proper filter.  

Lemma. For every proposition p in T* there is a proposition q in T and a proposition r in T0  such that q ∩ r ⊆ p.

It is easiest to prove this via the relation between filters and theories.  Let Z be the least theory that contains theories X and Y:  thus Z is the set of sentences implied by X ∪ Y.  Implication, in our context, is finitary, so if a sentence E is in Z then there is a finite set of sentences belonging to X ∪ Y whose conjunction implies E.

Suppose now that T* is not proper.  Then there is a proposition p such that both p and -p are in T*.  They cannot both be in T, nor both in T0.  The Constraint on Truth Filtering implies that if p is in T0 then -p is not in T, so -p must be a proposition that is not in either T or T0.  Similarly, if p is in T then -p cannot be in T0, so it must be in neither T nor T0.  So we see that either p or -p belongs to neither T nor T0, but must be in the part of T* that is ‘implied’ by meets of elements taken from T and from T0.

By the Lemma there must be propositions q and r in T and T0 respectively such that (q ∩ r) ⊆ p, and also q’ and r’ in T and T0 respectively such that (q’ ∩ r’) ⊆ -p.  But then there is a proposition s = (q ∩ q’) in T and a proposition t = (r ∩ r’) in T0 such that (s ∩ t) ⊆ (p ∩ -p) = Λ.

In that case t ⊆ -s, while t is in T0  and -s belongs to U.  And that is not possible, given the Constraint on Truth Filtering.

Therefore T* is a proper filter.

[5] The language, admissible valuations, validity and the consequence relation

Time to look into the logic in language L when the admissible assignments of truth-values are all of this sort!

What we have described informally now is the class of algebraic models of language L.  The sentences E in L have as semantic values propositions [[E]] in A.  A is a Boolean algebra with a designated filter T* and designated ideal U* = {-p: p is in  T*}.  An admissible valuation of L is a function v such that for all sentences E of L:

  • v(E) = T if and only if [[E]] is in T*
  • v(E) = F if and only if [[E]] is in U*

This function is not defined on other sentences: those other sentences, if any, do not have a truth-value.

So an admissible valuation is in general a partial function on the set of sentences of L.
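
As a sketch of that last point, an admissible valuation can be written as a partial function: it returns a truth-value on the propositions in T* or U*, and nothing (None) elsewhere.  The construction and names below are mine, applied to propositions directly rather than to sentences, for brevity.

```python
# A valuation induced by a truth-filter T_star on subsets of K (names mine).
def valuation(T_star, K):
    U_star = {K - p for p in T_star}      # the dual ideal of false propositions
    def v(p):
        if p in T_star:
            return "T"
        if p in U_star:
            return "F"
        return None                       # truth-value gap
    return v

K = frozenset({1, 2})
T_star = {K}                              # only the top element counts as true
v = valuation(T_star, K)
print(v(K), v(frozenset()), v(frozenset({1})))   # T F None: bivalence fails
```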

Validity

Boolean identities correspond to the theorems of classical sentential logic.  If E is such a theorem then [[E]] = K, which belongs to every filter, and hence E is true.  

This holds for any model of the sort we have described, so all theorems of classical sentential logic are valid.

Deductive consequence

E1, …, En imply F exactly if, in each such model, if [[E1]], …, [[En]] are all in T* then [[F]] is in T*.

In classical sentential logic E1, …, En imply F exactly if (E1 & … & En) ≡ (E1 & … & En & F) is a theorem.  So then ([[E1]] ∩ …∩ [[En]]) = ([[E1]] ∩ …∩ [[En]] ∩ [[F]]).

It follows that if  [[E1]], …, [[En]] are all in a given filter then so is [[F]].

Therefore all such classically valid direct inferences (such as Modus Ponens) are valid in L.

Natural deduction rules

Those which involve sub-arguments can be expected to fail.  For example, (E v ~E) is valid, but it is possible that E lacks a truth-value, and so we would expect argument by cases (disjunction elimination) to fail.

We’ll see examples below.

 APPLICATION 1:  probability space models

The structure S = <K, F, P> is a probability space exactly if K is a non-empty set, F is a field of subsets of K (including K), and P is a probability function with domain F.  

A field of sets is a Boolean algebra of sets.  So we can proceed as above.

First there is a language LS, and if E is a sentence of LS then [[E]] is a measurable subset of K, that is to say, a set in F, a member of the domain of P.  And as before we have a fragment LSat which is the closure of the set of atomic sentences under the Boolean connectives.  The range of [[.]] restricted to LSat is a subfield — a Boolean subalgebra — F0 of  F.

The set TS = {p in F: P(p) = 1} is a proper filter.  That is so because P(Λ) = 0, P(p) is less than or equal to P(q) if p ⊆ q, and P(p ∩ q) = 1 if and only if P(p) = P(q) = 1.

Similarly, there is a corresponding proper ideal US = {p in F: P(p) = 0}.
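
A toy instance, with names of my own choosing: a three-world space in which one world gets probability 0, so that TS contains more than just the top element.

```python
from fractions import Fraction as Fr
from itertools import combinations

K = frozenset({1, 2, 3})
weights = {1: Fr(1, 2), 2: Fr(1, 2), 3: Fr(0)}   # world 3 has probability 0

def P(p):
    return sum(weights[x] for x in p)

A = [frozenset(c) for r in range(len(K) + 1) for c in combinations(K, r)]
TS = {p for p in A if P(p) == 1}
print(sorted(map(sorted, TS)))   # [[1, 2], [1, 2, 3]]: a proper filter
```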

Just as above, TS is the beginning, so to speak, of the set of true propositions.  To determine an appropriate set of true propositions in F0 we begin with X = US ∩ F0.  That is a proper ideal as well, within that subalgebra.  Every such proper ideal can be extended (not uniquely) to a maximal proper ideal US0 on F0.  This we choose as the set of false propositions in that subalgebra, and the corresponding maximal filter TS0 on F0 is the set of true propositions there.

And now, to complete the series of steps we are following, we define TS* to be the least filter on F which contains both TS and TS0. The general argument above applies mutatis mutandis to show that TS* is a proper filter — our truth filter in this setting.

Unless LSat is the whole of LS we will now have truth-value gaps:  there will be non-empirical sentences that receive some probability intermediate between 0 and 1, and these are neither true nor false.

As before, there is no doubt that the axiomatic classical sentential logic is sound here.  However there are natural deduction rules which are not admissible.  For example, if something follows from each of P(p) = 1 and P(q) = 1 it may still not follow from P(p v q) = 1. For example, if we are going to toss a coin then Probability(Heads) = 1 entails that the coin is biased, and Probability(Tails)= 1 also entails that the coin is biased. But Probability( Heads or Tails) =1 is true also if the coin is fair.

APPLICATION 2: algebraic logic of conditionals with probability

This is an example of a probability space model, in which the algebra is a Boolean algebra with a binary modal operator ==>.  It begins with a ‘ready to wear’, off the shelf, construction, which I’ll describe.  And then I will apply the recipe developed above to give a picture of a language in which conditionals, typically, are true only if they have probability 1, and false only if they have probability 0.

I am referring to the logic CE, which is like Stalnaker’s logic of conditionals, but weaker (van Fraassen 1976; see also my preceding blogs on probabilities of conditionals).

The language has the Boolean connectives plus binary connective –>.  A structure M = <K,F, s> is a model of CE exactly if K is a non-empty set (the worlds), F is a field of subsets of K (the propositions), and s, the selection function, is a function which maps K x F into the subsets of K, with these properties:

  • s(x,A) ⊆ A
  • if x is in A then s(x,A) = {x}
  • s(x, A) has at most one member
  • s(x, A) =  Λ only if A =  Λ

The truth conditions for &, v, ~ are as usual, and for –> it is:

          A –> B is true in world x if and only if s(x,A) ⊆ B

          equally:  [[A –> B]] = {x is in K: s(x, [[A]]) ⊆ [[B]]}

and we can see that there is therefore an operator on F, for which I’ll use the symbol ==>:

          [[A –>B]] =  [[A]] ==> [[B]].

This differs from Stalnaker’s semantics only in not imposing the further restriction on the selection function that it must derive from an ordering.  We may intuitively refer to s(x, A) as the world nearest to x that is in [[A]], but this “nearest” metaphor has no content here.

When this language is thus interpreted in model M, the propositions form a Boolean algebra with operator ==>, which has the properties:

(i)      [p ==> (q ∪ c)] = [(p ==> q) ∪ (p ==> c)]

(ii)     [p==> (q ∩ c)] = [(p ==> q) ∩ (p ==> c)]

(iii)    [p ∩ (p ==> q)] = (p ∩ q)

(iv)    (p ==> p) = K                                        ( “necessity” )

(v)     (p ==> -p) =  Λ unless p =  Λ                 (“impossibility”)

Let us call this a CE algebra.
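
These properties can be checked mechanically in a small finite model.  Below is a Python sketch; the selection function chosen (“the least world in A”) is just one arbitrary function satisfying the four clauses, and the names s and arrow are mine.

```python
from itertools import combinations

K = frozenset({0, 1, 2})
F = [frozenset(c) for r in range(len(K) + 1) for c in combinations(K, r)]

def s(x, A):
    """One admissible selection: s(x,A) = {x} if x is in A, else the least world in A."""
    if x in A:
        return frozenset({x})
    return frozenset({min(A)}) if A else frozenset()

def arrow(A, B):
    """[[A --> B]] = {x : s(x, A) is a subset of B}, i.e. the operator ==>."""
    return frozenset(x for x in K if s(x, A) <= B)

# Property (iii): p ∩ (p ==> q) = p ∩ q, for all p, q.
print(all(p & arrow(p, q) == p & q for p in F for q in F))    # True
# Properties (iv) and (v): p ==> p is K; p ==> -p is empty unless p is.
print(all(arrow(p, p) == K for p in F))                       # True
print(all(arrow(p, K - p) == frozenset() for p in F if p))    # True
```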

A probability model for CE is a structure <K, F, s, P> such that <K, F, s> is a model for CE and P is a probability function with domain F such that for all p, q in F

            P(p ==> q) = P(q | p) when defined

This condition is generally called Stalnaker’s Thesis (or more recently, just “the Thesis”).  Stalnaker’s logic of conditionals could not be nontrivially combined with this thesis but CE could.  As it happens, CE has a rich family of probability models.

Thus, if  <K, F, s, P> is a probability model for CE then S = < K, F, ==>, P> is a probability space model in the sense of the previous section, with some extra structure.

Now we can proceed precisely as in the preceding section to define a truth filter T* on the algebra of propositions.  As empirical statements we take the closure of the set of atomic sentences under just the Boolean connectives, that is, the sentences in which there are no occurrences of –>.  The image of this language fragment by the map [[.]] is the relevant, privileged Boolean subalgebra F0 of F in which every proposition is classified as true or false, as a first step.

In addition the propositions which have probability 1 are true.  And finally, anything implied by true propositions is true — all this understood as coming about as shown in the preceding section. Thus all theorems of CE are valid, and inference by modus ponens is valid.

As to sentences of form (A –> B), they are typically true only if P(B | A) = 1.  I say “typically” because we cannot rule out that the proposition [[A]] ==> [[B]] is a member of F0.  For the model of CE could be a model of a stronger theory, perhaps one that entails (implausibly!) that “if it is lit then it burns” is the meaning of “it is flammable”.  But typically that will not be the case, so typically (A –> B) will be classified as true only if P([[B]] | [[A]]) = 1.

REFERENCES

van Fraassen, B. C. (1976) “Probabilities of Conditionals”.  Pages 261-308 in W. Harper and C. A. Hooker (eds.) Foundations of Probability and Statistics, Volume I.  Dordrecht: Reidel.

Conditionals and the Candy Bar inference

At a conference at Notre Dame in 1987 Paul Teller “made an issue” as he wrote later of “the fallacious form of argument I called the ‘Candy Bar Principle’:

from ‘If I were hungry I would eat some candy bar’ conclude ‘There is some candy bar which I would eat if I were hungry’.”  

And Henry Stapp, whom Teller had criticized, mentioned this in his presentation: “Paul Teller has suggested that any proposed proof of the kind I am setting forth must contain such a logical error ….”

I do not want to enter into this controversy, if only because there were so many arguments swirling around Stapp’s proposed proofs.  Instead I want to examine the question:  

is the Candy Bar inference a fallacy?

Let’s formulate it for just a finite case:  there are three candy bars, A, B, and N.  The first two are in this room and the third is next door.  I shall refer to the following form of argument as a Candy Bar inference:

If I choose a candy bar it will be either A or B

therefore,

If I choose a candy bar it will be A, or, if I choose a candy bar it will be B

and I will symbolize this as follows:

C –> (A v B), therefore (C –> A) v (C –> B)

This has a bit of a history, of course: it was submitted as valid in Robert Stalnaker’s original theory of conditionals and was rejected by David Lewis in his theory.  Lewis showed that Stalnaker’s theory was inadequate, and blamed this principle.  But we should quickly add that the problems Lewis raised also disappeared if this principle were kept while another one, shared by Stalnaker and Lewis, was rejected.  This is just by the way; for now I will leave all of this aside.

How shall we go about testing the Candy Bar inference?

I imagine that the first intuitive reaction is something like this:

Imagine that I decide to choose a candy bar in this room.  Then it will definitely be either A or B that I choose. But there is nothing definite about which one it will be.  

I could close my eyes and choose at random.

Very fine!  But unfortunately that is not an argument against the Candy Bar inference, but rather against the following different inference:

It is certain that if I choose, then I will choose either A or B,

therefore

Either it is certain that if I choose I will choose A, or, it is certain that if I choose I will choose B

That is not at all the same, for we cannot equate ‘It is certain that if X then Y’ with ‘if X then Y’.  As an example, contrast the confident assertion “If the temperature drops it will rain tomorrow” with “It is certain that if the temperature drops it will rain tomorrow”.  The former will be borne out, the prediction will be verified, if in fact the temperature drops and it rains the next day — but this is not enough to show that the latter assertion was true.

So the intuitive reaction does not settle the matter.  How else can we test the Candy Bar inference?

Can we test it empirically?  Suppose two people, Bob and Alice of course, are asked to predict what I will do, and write on pieces of paper, respectively, “if Bas chooses a candy bar in this room, he will choose A” and  “if Bas chooses a candy bar in this room, he will choose B”.  Surely we will say:  

we know that if Bas chooses a candy bar in this room, he will choose A or B.  

So if he does, either Bob or Alice will turn out to have been right.

And then, if Bas chooses A, we will say “Bob was right”.

That is also an intuitive reaction, which appears to favor the Candy Bar inference.  But again, it does not really establish much.  For it says nothing at all about which of these conditionals, if any, would be true if Bas does not choose a candy bar.  That is the problem with any sort of empirical test: it deals only with facts and does not have access to what would have happened instead of what did happen.

Well there is another empirical approach, not directly to any facts about the choice and the candy bars, but to how reasonable, practical people would let this situation figure in their decision making.

So now we present Alice and Bob with this situation and we ask them to make bets.  These are conditional bets, they will be Gentlemen’s Wagers, which means that they get their money back if Bas does not choose.

Alice first asks herself:  how likely is he to choose a bar from this room, as opposed to from next door (where, you remember, there is bar N)?  Suppose she takes that to have probability 3/4.  She accepts a bet that Bas will choose A or B, if he chooses at all, with payoff 1 and price 0.75.  Her expectation value is 0; it is just a fair bet.

Meanwhile Bob agrees with her probability judgment, but is placing two bets, one that if Bas chooses he will choose A, and one that if Bas chooses he will choose B.  These he thinks equally probable, so for a payoff of 1 he agrees to price 3/8 for each.  His expectation value is 1/4(0) + 3/8(1) + 3/8(1) minus what he paid, hence 0:  this too is just a fair bet.  

Thus Alice and Bob pay the same to be in a fair betting situation, where the payoff prices are the same, though one was, in effect, addressing the premise and the other the conclusion of the Candy Bar inference.  So, as far as rational betting behavior is concerned then, again, there is no difference between the two statements.
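
For the record, the arithmetic of the two betting positions, conditional on Bas choosing at all (since the wagers return the stakes otherwise), comes out as follows; the variable names are mine.

```python
p_this_room = 0.75        # probability, given a choice, that it is from this room

# Alice: one bet at price 0.75 paying 1 if the choice is A or B.
alice = p_this_room * 1 - 0.75

# Bob: two bets at price 3/8 each, paying 1 on A, respectively B.
p_A = p_B = p_this_room / 2
bob = (p_A * 1 + p_B * 1) - (3 / 8 + 3 / 8)

print(alice, bob)         # 0.0 0.0: both positions are fair
```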

Betting, however, as we well know by now, is also only a crude measuring instrument for what matters.  The fact that these are Gentlemen’s Wagers, as they pretty well have to be, once again means that we are really only dealing with the scenario in which the antecedent is true.  The counterfactual aspect is beyond our ken.

To be clear:  counterfactual conditionals are metaphysical statements, if they are statements about what is the case, at all.  They are not empirical statements, and this makes the question about the validity of the Candy Bar inference a metaphysical question.

There is quite a lot of every-day metaphysics entrenched at the surface of our ordinary discourse. Think for instance of what Nancy Cartwright calls this-worldly causality, with examples like the rock breaking the window and the cat lapping up the milk. 

Traditional principles about conditionals, just as much as traditional principles about causality, may guide our model building.  And then nature may or may not fit our models …

So the question is not closed, the relation to what is empirically accessible may be more subtle than I managed to get to here.   To be continued …. 

REFERENCES

The Notre Dame Conference in question had its proceedings published as Philosophical Consequences of Quantum Theory:  Reflections on Bell’s Theorem (ed. J. T. Cushing and E. McMullin; University of Notre Dame Press 1989).   

My quotes from Teller and Stapp are from pages 210 and 166 respectively.

Deontic logic, time, and 1876

We use “ought” in two senses, evaluative (“things are not as they ought to be”) and hortatory (“you ought to do the best thing”). The latter is future-directed, and time entered deontic logic (with stit semantics) for that reason. But time brings much in train. What if you do what is the best thing in the short run, for tomorrow for example, and it precludes important options for the day after tomorrow? Or the day after that?

This is how infinity enters: a branching time, infinitely branching possible futures, with the outcomes of our choices. Our deliberation and decision making is inevitably short-sighted, considered sub specie aeternitatis. We can only see finitely many moves ahead. But that implies a danger: how do I form a policy that does not, with equal inevitability, lead me into an ultimately unsatisfying life?

Reflecting on this I remembered F. H. Bradley’s critique of Utilitarianism.

Francis Herbert Bradley, Ethical Studies (1876)

As a student I never liked ethics until I read Bradley. As a British Idealist he was intent on bringing to light all the contradictions in our experienced world; in ethics this led him to see an unresolvable conflict between the ideals of self-sacrifice and self-realization. Fascinating … but here I just want to focus on one little point, his criticism of Utilitarianism, which focused on moral deliberation over time.

The form of Utilitarianism Bradley confronts is, more or less, what is now typically described as rational preference based decision making, cost and benefit analysis, maximizing expected value. As he sees it, this form of reasoning leads inevitably to a sort of life to be regretted, such as the life of a miser who saves to become rich but can never stop saving, or of the businessman who works to make a million but can never stop pursuing the next million. In the exquisite prose of the British Idealists, which we cannot emulate today:

Happiness, in the meaning of always a little more and always a little less, is the stone of Sisyphus and the vessel of the Danaides — it is not heaven but hell.  (Essay III)

Falling into this trap seems inevitable if we pursue a quantifiable value, and if this pursuit is not subject to any constraint, whether external or internal, independent of that value. Let’s make this concrete.

The Contra Costa game

To the literature about problems with infinity in decision-making, such as the St. Petersburg Paradox and the Pasadena Game, I propose to add this one, to illustrate Bradley’s argument.

A coin will be tossed as often as we like. If you enter into the game here is what will happen. If the first toss comes up heads, you have two options. The first is to accept $10, and the game stops for you. The second option is to stay in the game. If the second toss then comes up heads you may choose between accepting $100 or staying in the game. And so on: if the n^th toss came up heads, you can choose between accepting $10^m, where m is the number of tosses that have come up heads so far; or you can stay in the game. If the toss comes up tails the game ends, and the player who stayed in (unlike in the St. Petersburg game) ends up with nothing at all.

Suppose toss N has just come up heads. If you stay in you have a 50% chance of getting the option to accept 10^(N+1), which is ten times more than what you could get now. There is also a 50% chance of getting 0. So the expectation value of staying in equals 0 plus 0.5(10)(10^N) = 5 times the value of opting out.

Thus what you ought to do (if your rule is to maximize expected value and you look only one day ahead), as long as you are in the game, at every stage, if the toss came up heads, is to stay in. And the result is that you will never get anything at all: you are living a life of anxious expectation, never able to let go of this devilish ladder of fortune, until either you are out with nothing to show for it, or you go on forever with no payoff ever.
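
The one-step comparison is easy to tabulate; a quick Python check (my notation, nothing more):

```python
# After N heads: take the current prize, or stay in for one more toss?
def take(N):
    return 10 ** N

def stay_one_step(N):
    return 0.5 * take(N + 1) + 0.5 * 0   # heads: the next option; tails: nothing

for N in range(1, 5):
    print(N, take(N), stay_one_step(N))  # staying is always worth 5 times taking
```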

It may be objected here that no casino could be in a position to offer this game, it could not set a high enough price for entry. That is what we usually say about the St. Petersburg game as well. But think about the real-life analogue. The person who sets out to make a million, and remains at every stage intent on making the next million, has no reason to think that the world cannot afford to offer this possibility. The price paid is the work involved in making money, which is gladly paid.

There was a writer who traded on his readers giving some credence to there being a source with unlimited means: Blaise Pascal.

Similarity to Pascal’s Wager

Here is how Pascal might have posed the problem of the Contra Costa game for the rational unbeliever:

Every day God says “Repent now, and you shall have eternal bliss!”

Every day the unbeliever responds in her/his heart “I can take one more day and repent tomorrow, that has a higher value!”

(a) The game ends when s/he dies, and s/he loses.

(b) s/he lives forever. With this policy, she has the fate of Barend Fokke, the Flying Dutchman.

Either way, s/he misses out on eternal bliss.

Is there help from long-term thinking?

The unfortunate player we have just been discussing has this flaw: s/he looks only one day ahead. That it is a day, does not matter: there is a fixed unit of time such that s/he looks ahead only that unit of time, in making her rational decision on the basis of expected value.

I did not eat a chocolate bar just now because it would ruin my appetite, and make dinner much less enjoyable. So I am not that naive Utilitarian agent who just looks one minute, or one hour, ahead. I forego the immediate gain for the sake of gain over a longer period.

But what is that longer period? If it is, say 2 hours, or 2 days, then I am after all that naive agent, with a different time scale, focused on gain in a finite future period. In the Contra Costa game this will not keep me from continuing forever: I will not take today’s prize. Suppose I reflect on the possibilities for the next two tosses of the coin, when the first N tosses have come up heads. The probability is 0.25 that I can get two more heads, and can then accept 10^(N + 2). There is also a 0.75 chance that I will gain zero. So the expected value of that scenario is 0.25(100)(10^N), or 25 times what I can get now. Thinking farther ahead increases the temptation, the incentive, to stay in the game.

What if I am an ideal long term thinker, who does not set a finite limit on his long term evaluations? The probability is zero that the coin will always come up heads. This is relevant for those ideal long term thinkers who take themselves to be immortal: they will rationally refuse to play. But these are either deceived (if all men are mortal) or at least negligibly rare.

Escape from the game: not by cost-benefit analysis

There may be easy advice to give to the player: Do look beyond the immediate future! Choose some N, and decide not to go farther, no matter what. That is, decide that you will take the money at some stage either before or when there are N heads in a row.

But how is this choice to be made? The expectation value when you make this choice to be N, is 0/2 + 10/4 + 100/8 + … + 10^N/2^(N+1). That is less than for the choice of N+1. So if you choose N, you are not choosing to maximize expected value. It goes against the principle of rational cost-benefit analysis.

The tree of possible futures in this game is a finitely branching tree which has many finite branches and an infinite branch. The latter, the fate of Bradley’s naive but immortal agent, is clearly to be avoided (we ought not to do that!). But a choice among the others on the basis of their value is not possible: for every value seen there, there is one with greater value.

There is no question that we must applaud those who at some point take their winnings and rest content. (“Take the money and run!”, Woody Allen — and how did that work for you?) We must applaud them although that choice is not ratifiable by value-preference based decision making. So if there is to be an escape, we have to tell the agent to bring with her some constraint of her own, which overrides the maxim to maximize expected value. What could that be?

Escape from the game: projects and goals

What follows is a suggestion, that I think deserves to be explored when developing deontic logic. It is not my suggestion, but one I heard long ago. (Perhaps only in conversation, I am not sure.)

Glenn Shafer proposed, at one time, that practical reasoning should be conceived of as in the first instance goal-directed. Shafer was pointing to a fact about the phenomenology of practical deliberation: it is not realistic to depict us as cost-and-benefitters, we set our goals and deliberate only within the span of possibilities left open by our goals.

(What about the goal-setting activity, we may ask? A specific goal may be set as the outcome of a deliberation, which took place within the limits set by previous goals. There is no beginning, we are thrown into a life in which we find basic goals already set. Il n’y a pas de dehors du … )

I am saving for a holiday in Hawaii next winter. To have that holiday is my goal. The actual value to me of this holiday, if it happens, will depend on many factors (weather, companions, …) but these factors do not figure in the identity of the goal. (They might have figured, in some way, in my goal-setting).

This goal then constrains what I ought to do meanwhile. Among choices I face along the way, it will delete options that conflict with my going to Hawaii next winter. And if my choices get to the point where only one more move is necessary (getting on the plane …) it will prevent me from self-sabotage. For it would be self-sabotage (given the goal I have) to be looking around at that point and considering alternatives, however amazingly attractive they may be just then.

And it is part of the concept of having a goal that there is a stopping point: when the goal is reached, it is not foregone in favor of a suddenly glimpsed pretty face or chance at a bit of easy money.

So in the Contra Costa game this could be the goal of gaining $1000 (perhaps exactly what I need to pay off my loans, or to buy my wife a necklace for her birthday). That implies that I will accept $1000 and end the game if the coin comes up heads three times in a row. I may lose of course, but what I will not do after three heads in a row is to take a chance on getting $10,000. True, I would have a 50% chance of getting 10 times more, but I have reached my goal, rest content, and do not start on a life of Sisyphus.

There are of course less strictly fashioned constraints: we could call this one a Shafer Goal, and agree that there are also defeasible goals that would need more sophisticated modeling.

What is crucially important is to recognize the necessary independence, autonomy, externality of the constraint (even though set by the actor herself). For if the choice of constraint itself has to be based on value-preference expectation reasoning, we have not escaped the game at all, we have just found ourselves in the same game on another level.

If the constraint takes the form of a goal I set myself, this must be modeled as an independent imperative or default rule, inserted at a different place. It must be, that is to say, another heart’s command not assimilable in value-preference based reasoning.

Coinductive definition, an example

A note about coinductively defined predicates

This is a reflection on some lectures by Andrei Popescu (2021).  I am intrigued by the idea of coinduction (which was new to me): a form of definition in terms of rules, which goes beyond inductive definition.  Popescu gives an example of a set which is defined by rules, does not admit of an inductive definition, but admits of a coinductive definition.  This is presented in the context of Isabelle, an interactive theorem prover.  But the idea of coinductive definitions is very intriguing in itself.

So I wanted to make up a simple example in normative reasoning or rule-governed behavior.  I will use the game of chess and the property of being a proper game, that is, a sequence of moves by two players which are entirely in accord with the rules of chess.

The universe U will be a set of countable sequences (finite or countably infinite) of elements of the set V of situations or stages, which are pairs <position, player>.  By a position I mean a complete specification of the pieces on the board and where they are.  A player is an item, White or Black, and for convenience I will use “*” this way:  White* = Black and Black* = White. In stage <p, X>, X is the player who has to make a move, that is, do something that transforms p into another position.

So how can we go about defining the set of proper games?  Let’s begin by aiming for a predicate, “legit” which will apply to such sequences, that is, elements of U.

Inductive definition of the predicate legit

There is one position that is special, the base position, which is the layout of the pieces at the start of the game.  A standard game begins with this.  But chess aficionados like to work on chess problems of many sorts, which begin with a specified initial position (and are meant to arrive at another position). So we will include among proper games any that are played with any chosen initial position.  The definition I propose as a first attempt has a basis clause and a generative clause.

[i] A one-member sequence b = <p, X> is legit 

[ii] if s is a legit sequence, and the last member of s is <p, X> then the result of adding <q, Y> to the end of s is also a legit sequence, provided Y = X* and q results from p by an admissible chess move by X.

[iii] that is all

This is an inductive definition of the set LI of legit sequences.  
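
For finite sequences this inductive definition unwinds into a simple check of consecutive stages.  Here is a Python sketch; the parameter admissible is a stand-in for the actual rules of chess, which I make no attempt to encode.

```python
def other(X):
    return "Black" if X == "White" else "White"

def legit(s, admissible):
    """s: a finite sequence of stages <position, player>.
    admissible(p, q, X): whether player X may transform position p into q."""
    if len(s) == 0:
        return False                     # clause [i] requires at least one stage
    for (p, X), (q, Y) in zip(s, s[1:]):
        if Y != other(X) or not admissible(p, q, X):
            return False                 # clause [ii] fails at this step
    return True

# With a dummy rule that admits any transformation:
anything = lambda p, q, X: True
print(legit([("p0", "White"), ("p1", "Black")], anything))   # True
print(legit([("p0", "White"), ("p1", "White")], anything))   # False: turns must alternate
```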

But it isn’t good enough: all members of LI are finite.  For the basis clause introduces only finite  sequences, and all the modes of generation preserve finitude.  So by induction, all members of LI are finite.  And the defining rules of the game of chess allow in principle for never-ending games.

Note about checkmate and blocking rules:  There is no position which can result by admissible chess rules from a position in which a king is missing.  Specifically, if s is a legit sequence and <p, X> is a member of s, and a king is missing in position p, then <p, X> is the last member of s. In tournaments there are also artificial rules, like the 50-move rule or the threefold-repetition rule, used to curtail the length of play, but they are not part of what the game of chess is.

Never ending games of chess

There are certainly infinite sequences that should count as proper games of chess, although they will never be played in reality (cf. Stewart 1995).

So what sort of definition would work to define the set — call it LG — of proper games, which includes all the right members of U, infinite as well as finite?

First of all, what [i] and [ii] say should remain true for LG, though not [iii], and the rule of induction should not be applicable to LG in the above fashion. 

Following Popescu’s lead for this simpler example, we can put it this way.  The set LI is the smallest set of sequences such that [i] and [ii] are true of it.  More precisely:

Definition. LI is the smallest subset W of U such that:

(a) all one-member sequences b = <p, X> are in W

(b) if s is in W, and the last member of s is <p, X> then the result of adding <q, Y> to the end of s is also in W, provided Y = X* and q results from p by an admissible chess move by X.

As Popescu says, it would not do to just change “smallest” to “largest” in this formulation, for it would let in every sequence which has no last member.

What is needed instead is to select the largest class of sequences that represent games played by the rules of chess — the ones in which every stage follows legitimately upon the one before it.  (Also of course that if member e has a successor then that follows upon e by an admissible chess move.   But that is redundant, for the above would apply to e’s successor.)

So we should propose:

Definition. LG is the largest subset T of U such that: 

(a’)   all one-member sequences b = <p, X> are in T

(b’) if s is in T then for any n, if the nth member of s is s(n) = <q, Y>, and s(n) is not the first member of s, then the (n-1)th member of s is a couple <p, X> such that Y = X* and q results from p by an admissible chess move by X.

Note that (b’) is, as it were, backward-looking:  unlike (b) it specifies where a position comes from rather than what it leads to. 

Does LG exist?  And if it does, does it include all and only proper chess games?

The set LG, thus defined

Here are the questions that need to be answered

  1. Does LG, thus defined, exist? To be precise: is there a largest set T such that (a’) and (b’) hold?
  2. Is LI part of  LG? 
  3. Does LG have any members other than the members of LI?

The answer to all three is positive.

For question 1, note that the set of all one-member sequences is a set T such that (a’) and (b’) hold, trivially.  Secondly, if S = {T(i): i in J} is a family  of sets such that (a’) and (b’) hold, then ∪S is also such a set, for those conditions apply to the members individually and not to any relations among members of U.   The answer to question 2 is somewhat less obvious.

Theorem 1. If s is a finite sequence in U then s is in LI if and only if it is in LG.

Argument:  

Suppose s in U has n members.  

If n = 1 then s belongs to both LI and LG by clauses (a) and (a’).

Suppose n > 1 and that Theorem 1 holds for all sequences in U of length less than n.  Let s’ = <s(1), …, s(n-1)>.  Then s belongs to LG if and only if s’ belongs to LG and s(n) = <q, Y> is such that s(n-1) = <p, Y*> and q results from p by an admissible chess move by Y*.  And s belongs to LI if and only if s’ belongs to LI and s(n) is related to s(n-1) in precisely the same way.  Since s’ has length n-1, the induction hypothesis tells us that s’ is in LI if and only if it is in LG; so s is in LI if and only if it is in LG.

Since there are examples of never-ending chess games it follows also that LG has members not in LI.

To prove a property in LG

To see how it is possible to prove things about the games in LG, going beyond what can be done for LI by mathematical induction, but in roughly the same way, I will use a specific example of a theorem about the game of chess.

At any given stage in a game, a player may or may not have a winning strategy.  X wins the game exactly if a stage <p, X> is followed by stage <q, X*> and there is a king missing in position q.  Since we ignore the practical, arbitrary stopping rules applied in tournaments, there are three possibilities:  X wins, X* wins, or neither wins, which is to say that the game does not end and the representing member of U is an infinite sequence.  This is called a draw.

A player may have a winning strategy, or a strategy that forces a draw. It was Zermelo who began the mathematical study of chess in the early days of game theory.  A very basic result, connected with Zermelo’s work, is the theorem that if a player has no winning strategy then his opponent has at least a draw-forcing strategy.

This is not a result that we can prove by mathematical induction for the set LI.  There all games end either in a check-mate or end arbitrarily, that is, end although they could be extended.  There is no way for either player to force such an abrupt ending (assuming they are not subject to the practical limitations such as are set for a tournament).  For there is no position, except at check-mate, when the player whose turn it is, has no possible moves.

The theorem pertains therefore not to LI but to LG.  If we look at the proof, we see that it nevertheless looks very much like a proof by induction.  The theorem was proved by Kalmar in 1928:

Theorem 2.  At stage s(n) = <p, X> in member s of LG, if player X  does not have a winning strategy then X* can at least enforce a draw.

Proof.  Suppose that at stage s(n) X does not have a winning strategy.  Then whatever move X makes to yield s(n+1) = <q, X*>, X still does not have a winning strategy.  

For suppose he does; call it strategy S.  Then whatever move X* makes to produce s(n+2) = <r, X>, there is a way for X to execute that strategy so as to force a win.  But in that case, X had a winning strategy at stage s(n), namely to pursue S after transforming position p into position q.

So, generalizing, if X does not have a winning strategy at stage s(n) then X does not have a winning strategy at s(n+1).  So at s(n+1), when it is X*’s turn to move, X* can choose a position where X still does not have a winning strategy.

This implies that at this stage X* has a strategy that will prevent X from winning, namely to make such a choice at each successive stage.  This will either lead to a win for X* or will prevent the game from ever ending.

Fine.  But now we have a puzzle:  this looks like a proof by mathematical induction, but LG is not an inductively defined set.  So what is the actual form of this proof?

Koenig: “About a method of proof from the finite to the infinite”

What we have to notice about LG is that the infinite is involved in one way, but not in another: the sequence representing a game may be infinite, but at each stage, there is only a finite number of possible next stages.

This should remind us of Koenig’s lemma:

Koenig’s Lemma. If a tree with the finite branching property has infinitely many nodes then it has an infinite branch.

The proof of Koenig’s lemma requires the Axiom of Choice.  And it is Koenig’s lemma that provides the implicit assumption about an infinite sequence of rule-and-aim governed choices in the above argument.

Looking back now to the argument in the previous section, suppose that in sequence s, at stage s(n), X does not have a winning strategy.  Now consider the tree generated at s(n) by generating as stage n+1 all the different possible results of legitimate moves by X, then at stage n+2 all the possible moves by X* and so forth.  The stages in the branches are called the nodes in the tree, and each node has a rank, the number of steps it is beyond s(n).

X* will pursue the above indicated strategy, while X may pursue any strategy he likes.  

This is a finitely branching tree, because at any point, in any game of chess, there are only finitely many possible moves for the player whose turn it is.  And some or all of the branches may have a node N which is a checkmate position (a king is missing).  The property of X’s not having a winning strategy is preserved by every move in every branch, so it is a property of X at every node.  (This follows by mathematical induction: each node has a finite rank.)  Therefore the immediate predecessor of N was a position in which X did not have a winning strategy; had X been able to move to checkmate there, X would have had a winning strategy after all.  Hence the checkmate is by X*.

If it is not the case that checkmate is reached in all branches, then at every rank there is, in some branch, a move to a node of the next rank.  Hence there are nodes of every rank 1, 2, 3, …, so the number of nodes is not finite.  Now Koenig’s lemma rules out the idea that this could be so merely because the tree has infinitely many finite branches of increasing length.  The lemma establishes that the tree has an infinite branch, and that counts as a draw in this (infinite) game of chess.

So here is the diagnosis of the sort of reasoning about LG in our example, which looks like mathematical induction but isn’t: 

it is a combination of arguments that includes a mathematical induction and also an application of Koenig’s lemma

(which is to say, it involves an appeal to the Axiom of Choice, guaranteeing the existence of a specific, required, series of elements selected by an infinite number of choices).

NOTES

I want to thank Ali Farjami for directing me to this fascinating subject.

1) Coinduction is informally described as the dual of induction.  An inductively defined set is the smallest set closed under its defining relation, or equivalently, under its defining rules. We can think of such a definition as a procedure starting with certain members of the set and then generating all the others by applying the rules.  A coinductively defined set is the largest set of items that are produced by the defining rules from others of its members. Equivalently, a coinductive definition specifies the greatest set R consistent with given rules: every element of R can be seen as arising by applying a rule to some element of R.

2) The general form in which Koenig stated his 1927 lemma was this:

Let E_1, E_2, E_3, … be a countably infinite sequence of finite nonempty sets and let R be a binary relation with the property that for each element x_(n+1) of E_(n+1) there exists at least one element x_n in E_n which stands to x_(n+1) in the relation R, expressed by x_n R x_(n+1).  Then we can determine in each set E_n an element a_n such that a_n R a_(n+1) always holds for the infinite sequence a_1, a_2, a_3, … (n = 1, 2, 3, …).

For the exact form in which I show its application, with its proof, see Formal Semantics and Logic pp. 16-18.

3) Having come this far I can now appreciate that the subject can be presented much more elegantly by recourse to the Knaster-Tarski fixed point theorem (as Popescu remarks). 

Suppose F is the function defined by the rules in question, so that F(X) is the set of results of applying the rules to elements of X.  Then the least fixed point of F is the set ∩{X: F(X) ⊆ X}, that is, the smallest set closed under application of those rules; this corresponds to an inductive definition of that set.  And the greatest fixed point of F is the set ∪{X: X ⊆ F(X)}, that is, the largest set all of whose members are obtained by the rules from its members; this corresponds to a coinductive definition of that set.
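
The two fixed points can be watched coming apart in a small computation.  The following is a minimal sketch in Python, on a finite universe with a single hypothetical rule (‘n is produced if its cyclic successor is already present’): iterating F upward from the empty set yields the least fixed point, iterating downward from the whole universe yields the greatest.

    U = frozenset(range(10))

    def F(X):
        """One application of the rule: n is produced if (n + 1) mod 10 is in X."""
        return frozenset(n for n in U if (n + 1) % 10 in X)

    def lfp(F):
        X = frozenset()             # iterate upward from the empty set
        while F(X) != X:
            X = F(X)
        return X

    def gfp(F):
        X = U                       # iterate downward from the whole universe
        while F(X) != X:
            X = F(X)
        return X

    print(sorted(lfp(F)))   # []: nothing is derivable from the ground up
    print(sorted(gfp(F)))   # [0, 1, ..., 9]: every item arises by the rule from another

The greatest fixed point here contains items that produce one another in a circle, with no ground-level derivation; that is the same contrast, in miniature, as between LI and the coinductively understood LG with its infinite games.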

But the process was not transparent to me till I had traced it out in terms more familiar to myself, at a more elementary level.

BIBLIOGRAPHY 

Christian Ewerhart  “Backward induction and the game-theoretic analysis of chess”.  Games and Economic Behavior 39 (2002): 206-214.

Dénes König, “Über eine Schlussweise aus dem Endlichen ins Unendliche”.  Acta Sci. Math. (Szeged) 3 (1927): 121-130.

Paul B. Larson  “Zermelo 1913”. In: Ebbinghaus HD., Fraser C., Kanamori A. (eds) Ernst Zermelo – Collected Works/Gesammelte Werke. Schriften der Mathematisch-naturwissenschaftlichen Klasse der Heidelberger Akademie der Wissenschaften, vol 21. Springer, Berlin, Heidelberg: 2010. https://doi.org/10.1007/978-3-540-79384-7_9

Lawrence C. Paulson “A fixedpoint approach to (co)inductive and (co)datatype definitions”. Cambridge computer laboratory technical report 1997.  https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-320.pdf

Andrei Popescu “Inductive and Coinductive Reasoning with Isabelle/HOL”.  Midlands Graduate School in the Foundations of Computing Science 2021.  12-16 April 2021.  https://staffwww.dcs.shef.ac.uk/people/G.Struth/mgs21.html

Ulrich Schwalbe and Paul Walker “Zermelo and the early history of game theory”.  Games and Economic Behavior 34 (2001): 123-137.

Ian Stewart “The Never Ending Chess Game”  Scientific American 273 (4): 182-183. Downloaded from ResearchGate.

Probabilities of Conditionals: (2) examples in finite set-ups

I will address examples of the sort that come up in philosophical discussion, including the recent example due to Paolo Santorio. 

This post involves the notions introduced in ‘Probabilities of Conditionals: (1)’, and continues the discussion of Set-Up 1 and Set-Up 2.

PROPOSITIONS TRUE IN 0, 1, OR 2 WORLDS

In the previous post I showed that in Set-Up 1 we can construct s(p, .) so that P(p → r) = P(r | p) for any proposition p that has three members.  As I also pointed out, this cannot be done for antecedents with four or five members: the conditional probability P(r | p) is then in general not a multiple of 1/6, while the probability of any proposition in Set-Up 1 must be.  Let us look at what remains.

The empty set Λ is a proposition.  We do not really need to define a selection function for it, but if we do, we can set s(Λ, x) = Λ for all worlds x.  The result is then that (Λ → u) = {x: s(Λ, x) is part of u} = S, the tautology.

Some propositions are true in one and only one world.  Consider now u = {(1)}.  It is clear that P(r | u) equals 1 iff u is part of r, and equals 0 otherwise.  So we set s(u, x) = {(1)} for all x.  Thus (u → {(1)}) = S, and (u → r) = Λ if (1) is not in r.

Finally, we still need to consider propositions true in precisely two worlds.  Let t = {(1), (2)}.  If x is in t we want P(t → {(x)}) to equal P({(x)} | t) = 1/2.  So we need there to be two worlds in ~t whose nearest world in t is (1), and two worlds in ~t whose nearest world in t is (2).  Since there are precisely four worlds in ~t, that is easily done.

But note well that here, in Set-Up 1, we cannot have conditionals with either ~u or ~t as antecedents, for those have respectively five and four members.  

Thus we have here canvassed all the propositions that can appear as antecedents of conditionals represented in Set-Up 1: those true in just no, one, two, or three worlds.  But even here we can find some interesting, even somewhat surprising, consequences.
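
These small cases can be checked mechanically.  Here is a minimal sketch of Set-Up 1 in Python for the two-world antecedent t; the particular assignment of nearest worlds to the four worlds outside t is my own illustrative choice, one way of realizing what the text says is ‘easily done’:

    S = frozenset(range(1, 7))           # the six worlds of Set-Up 1
    P = lambda prop: len(prop) / 6       # the uniform measure

    t = frozenset({1, 2})

    def s(p, x):
        """Set-valued selection for antecedent t (outside assignments assumed)."""
        if x in p:
            return frozenset({x})        # a p-world is its own nearest p-world
        return frozenset({1}) if x in (3, 5) else frozenset({2})

    def arrow(p, r):                     # (p -> r) = {x: s(p, x) is part of r}
        return frozenset(x for x in S if s(p, x) <= r)

    print(P(arrow(t, frozenset({1}))))   # 0.5, equal to P({(1)} | t)
    print(P(arrow(t, frozenset({2}))))   # 0.5, equal to P({(2)} | t)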

SOME INTERESTING CONSEQUENCES

As in the preceding post, 

p = {(1), (3), (5)},  “the outcome is odd”

q = {(1), (2), (3)}, “the outcome is low”

Previous results: (p → q) = {(1), (3), (2), (4)} and (~p → q) = {(2), (1)}.

[I] Some nesting of arrows already appears in Set-Up 1

Since (~p → q) = {(2), (1)}, the second-degree proposition (p → (~p → q)) is the proposition (p → {(2), (1)}). This has (1) in it since that is in the intersection of antecedent and consequent.  And it has (2) in it, since the nearest p-world of (2) is (1), which also belongs to {(2), (1)}.  And that is all, so [p → (~p → q)] is just (~p → q) itself.  

This already shows that Import-Export fails.  By Import-Export, [p → (~p → q)] would be the same proposition as [(p & ~p) → q].  But the latter is the whole of S, since s(p & ~p, x) is necessarily empty for any world x (and the empty set is part of q), while the former is just {(1), (2)}.
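
Both computations can be verified directly.  In the following Python sketch the selection function fills in the nearest worlds so as to reproduce the previous post’s results; the specific assignments are assumptions on my part where the text leaves them open.

    S = frozenset(range(1, 7))
    P = lambda prop: len(prop) / 6

    p  = frozenset({1, 3, 5})            # "the outcome is odd"
    q  = frozenset({1, 2, 3})            # "the outcome is low"
    np = S - p                           # ~p, "the outcome is even"

    NEAREST = {(p, 2): 1, (p, 4): 3, (p, 6): 5,      # nearest odd worlds (assumed)
               (np, 1): 2, (np, 3): 4, (np, 5): 6}   # nearest even worlds (assumed)

    def s(a, x):
        if not a:
            return frozenset()           # the impossible antecedent selects nothing
        if x in a:
            return frozenset({x})
        return frozenset({NEAREST[(a, x)]})

    def arrow(a, c):                     # (a -> c) = {x: s(a, x) is part of c}
        return frozenset(x for x in S if s(a, x) <= c)

    print(sorted(arrow(p, arrow(np, q))))    # [1, 2]: equal to (~p -> q) itself
    iex = arrow(p & np, q)                   # [(p & ~p) -> q]
    print(sorted(iex), P(iex))               # [1, 2, 3, 4, 5, 6] 1.0: all of S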

[II] An assumption in some ‘triviality’ discussions

Another example that appears in ‘triviality’ discussions is a proposition of the form [~q → (p → q)], and the assertion that P(p → q | ~q) = 0.

(See Khoo and Mandelkern, page 506, for its role in connection with Lewis’ triviality proof).  

In the previous post we saw that (p → q) = {(1), (3), (2), (4)}.  We can at once see the conditional probability P(p → q | ~q):  it is the proportion of {(1), (3), (2), (4)} in {(4), (5), (6)}, and that equals 1/3, not zero!  

Intuitively: since p → q and ~q have world (4) in common, that world (4) must be in [~q → (p → q)].  Therefore the probability of [~q → (p → q)] must be at least as high as the probability of {(4)}, so it cannot be zero.

So it is not the case that in general P(p → q | ~q) = 0.
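
The arithmetic is quick to confirm (the same six-world sets, written out):

    pq = frozenset({1, 2, 3, 4})     # (p -> q), from the previous post
    nq = frozenset({4, 5, 6})        # ~q
    print(len(pq & nq) / len(nq))    # 0.333..., not 0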

 [III] Deceptive intuitions about conjunction    

Another observation concerns conjunctions.  Intuitively we would read “The match will light if it is struck, and the match will light if it is not struck”  as meaning simply that the match will light. (We might add, “regardless of whether it is struck”).  

That intuition may have some value, or it may not, but it does not generalize.  In our example here we see that 

[(p → q) ∩ (~p → q)] = {(1), (2)}, which has probability 1/3

That is not the probability of q, which is 1/2.  

Nor is q the same proposition as [(p → q) & (~p → q)].
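
Again a short check suffices:

    pq  = frozenset({1, 2, 3, 4})    # (p -> q)
    npq = frozenset({1, 2})          # (~p -> q)
    q   = frozenset({1, 2, 3})
    print(len(pq & npq) / 6)         # 0.333...: the probability of the conjunction
    print(len(q) / 6)                # 0.5: the probability of q itself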

[IV] The example due to Paolo Santorio

Finally, we can actually treat the probabilities in the example due to Paolo Santorio, which I heard from Branden Fitelson.

A die is tossed and to check Import-Export we investigate whether the following two propositions have the same probability:

If the outcome is either odd or six then (if the outcome is even then it is six)

If the outcome is odd or six, and it is even, then it is six

The second proposition is necessarily true (given our assumptions about the situation), so it has probability 1.  But the first proposition has a nested “if … then” and so is, philosophically, up for grabs.  Must it have probability 1?  As we will see, the answer in CE is NO.

We can handle this example to some extent in Set-Up 1, although it involves a proposition true in four worlds.  

a = {(1), (3), (5), (6)} “the outcome is either odd or six”

b = {(2), (4), (6)}      “the outcome is even”

c = {(6)}                     “the outcome is six”

Since the meet of a and b is c, it is clear at once that [(a ∩ b) → c] = S.

Here in Set-Up 1 we can already deal with (b → c).  In fact b = ~p in our first example, so we can verify, with a quick look at the diagram, that (b → c) = {(5), (6)}.  

Thus P(b → c | a) = 1/2.

This will still be correct when we look at it in Set-Up 2: there (b → c) has twenty members, all of which belong to a, which has forty members.

So if [a → (b → c)] has a probability at all it must be, according to the Equation, 1/2 and not 1.
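
In numbers, using the Set-Up 1 result (b → c) = {(5), (6)}:

    a  = frozenset({1, 3, 5, 6})     # "odd or six"
    bc = frozenset({5, 6})           # (b -> c) in Set-Up 1
    print(len(bc & a) / len(a))      # 0.5: so the Equation demands 1/2, not 1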

[V]  What about Santorio’s conditional [a → (b → c)]?

In Set-Up 2, (b → c) is the set of worlds which is the union of [(5)] and [(6)], and similarly, proposition a is the union of [(1)], [(3)], [(5)], and [(6)].

We need to construct s(a, .) so that the conditional probabilities come out right, as before.  So P(a → [(1)]) = 1/4, etcetera.  This we do by dividing the union of [(2)] and [(4)] evenly:

s(a, <2,x>) = <1,x> for x = 1, …, 5

s(a, <2,x>) = <3,x> for x = 6, …, 10

s(a, <4,x>) = <5,x> for x = 1, …, 5

s(a, <4,x>) = <6,x> for x = 6, …, 10

So [a → (b → c)] is true first of all in all of [(5)] and [(6)], and secondly in all of {<4, x>: x = 1, …, 10}, a total of 30 worlds.  The probability of [a → (b → c)] is accordingly 30/60, that is, 1/2 as it should be.

This also shows the failure of Import-Export, now that we have represented the example completely in Set-Up 2.  For the ‘other’ conditional, [(a ∩ b) → c] = (c → c), is not the same proposition at all: it is a tautology and has probability 1.
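
The whole Set-Up 2 computation fits in a few lines of Python.  The selection function s(a, .) below is exactly the one laid out above, and the two prints confirm probability 1/2 for [a → (b → c)] against the tautologous [(a ∩ b) → c]:

    from itertools import product

    S2 = frozenset(product(range(1, 7), range(1, 11)))        # the sixty worlds <i, x>
    P2 = lambda prop: len(prop) / 60

    bloc = lambda i: frozenset((i, x) for x in range(1, 11))  # the set [(i)]
    a  = bloc(1) | bloc(3) | bloc(5) | bloc(6)                # "odd or six"
    b  = bloc(2) | bloc(4) | bloc(6)                          # "even"
    bc = bloc(5) | bloc(6)                                    # (b -> c), carried over

    def s_a(w):
        """Selection for antecedent a, as specified in the text."""
        i, x = w
        if w in a:
            return w                                # an a-world selects itself
        if i == 2:
            return (1, x) if x <= 5 else (3, x)
        return (5, x) if x <= 5 else (6, x)         # the remaining case, i == 4

    a_bc = frozenset(w for w in S2 if s_a(w) in bc)   # [a -> (b -> c)]
    print(P2(a_bc))                                   # 0.5
    print(a & b == bloc(6))                           # True: (a ∩ b) = c, so [(a ∩ b) -> c] = (c -> c) = S2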

REFERENCES

Justin Khoo and Matthew Mandelkern, “Triviality Results and the Relationship between Logical and Natural Languages”. Mind 128 (2019): 485-526.