A Moore Statement (one that instantiates Moore’s Paradox) is a statement that could be true, but could not be believed. For example, “It is raining but I don’t believe that it is raining”.
We find interesting new varieties of such statements when we replace the intuitive notion of belief with subjective probability. Then there are two kinds of Moore Statements to be distinguished:
An Ordinary Moore Statement is one that could be true, but cannot have probability one. A Strong Moore Statement is one that could have positive probability, but could not have probability one.
When we introduce statements about objective chance, there are new Moore Statements in our language. Consider first the following (not a Moore Statement), said when about to toss a die:
[1] The number six won’t come up, but the chance that six will come up is 1/6.
On this occasion both conjuncts can be true. The die is fair, so the second conjunct is true, and when we have tossed the die we may verify that our prediction (the first conjunct) was true as well.
Moreover, [1] can be believed, perhaps by a gambler who bet that the outcome will be odd and is feeling lucky. Or at least he could say, even with some warrant, that it seems likely (or at least a little likely) that [1] is the case. The gambler could even say (and who could disagree, if the die is known to be fair?) that the probability that [1] is true is 5/6!
The way I will symbolize that is: P(~Six & [ch(Six) = 1/6]) = 5/6.
In this sort of example we express two sorts of probability, one subjective and one objective. Are there some criteria to be met? Is there to be some harmony between the two?
Like so much else, there are some controversies about this. I propose what I take to be an absolutely minimal constraint:
Minimal Harmony. P(ch(A) > 0) = 1 implies P(A) > 0. If I am sure that there is some positive chance that A, then it seems to me at least a little likely that A.
I really cannot imagine someone seriously, and rationally, saying anything like
“I am certain that there is some chance that the six will come up, but I am also absolutely certain that it will not happen”.
Except a truly deluded gambler, with a gambling strategy sure to lead to eventual ruin?
To construct a Moore Statement we only need to modify [1] a little:
[2] The number six won’t come up, but the chance that six will come up is not zero.
~Six & ~[ch(Six) = 0]
That [2] could be true we can argue just like we did for [1]. But [2] is a Moore Statement for it could not have subjective probability 1, by the following argument.
Assume that P([2]) = 1. Then:
1. P(~Six) = 1
2. P(Six) = 0
3. P(~[ch(Six) = 0]) = 1
4. ~[ch(Six) = 0] is equivalent to [ch(Six) > 0]
5. P(ch(Six) > 0) = 1
Contradiction between 2. and 5.: a violation of the principle Minimal Harmony.
Here 1. and 3. follow from the assumption directly. For 4. note that the situation being modeled here is the tossing of a die with chance defined for the six possible outcomes of that toss.
Not closed under conditionalization
This means also that [2] is a statement on which you cannot conditionalize your subjective probability, in the sense that if you do, your posterior opinion will violate Minimal Harmony.
So we have here another case where the space of admissible probability functions is not closed under conditionalization.
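To see the failure concretely, here is a minimal sketch in Python (my own toy illustration, not from the post): two chance hypotheses for the die, a fair one and an invented "noSix" die whose chance of six is zero, with a 50/50 prior over them. Conditionalizing on [2] drives the posterior into a violation of Minimal Harmony.

```python
# Toy model (invented for illustration): worlds are pairs (bias, outcome).
worlds = [(bias, outcome) for bias in ("fair", "noSix") for outcome in range(1, 7)]

def prior(world):
    bias, outcome = world
    chance = (1/6) if bias == "fair" else (0 if outcome == 6 else 1/5)
    return 0.5 * chance            # 50/50 prior over the two chance hypotheses

def P(prop, measure=prior):
    return sum(measure(w) for w in worlds if prop(w))

six = lambda w: w[1] == 6
ch_six_positive = lambda w: w[0] == "fair"                   # worlds where ch(Six) > 0
statement2 = lambda w: (not six(w)) and ch_six_positive(w)   # [2]

p2 = P(statement2)                 # 5/12: [2] can have positive probability
posterior = lambda w: prior(w) * statement2(w) / p2

print(P(ch_six_positive, posterior))   # 1.0 : certain that ch(Six) > 0 ...
print(P(six, posterior))               # 0.0 : ... yet Six gets probability 0
```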
I will make all this precise in the Appendix.
REFERENCE
My previous post called ‘Stalnaker’s Thesis → Moore’s Paradox’
APPENDIX. Semantic analysis: language of subjective probability and assessment of chance
As an intuitive guiding example we can think of a model of a tossed die. There is a set of possible worlds, and in each there is a die (fair or loaded in some fashion) that is tossed and a number that is the outcome of the toss. To represent the die we need only the corresponding chance function, e.g. the function that assigns 1/6 to the set of worlds in which the outcome is x (for x = 1, 2, 3, 4, 5, 6). Then, a special feature of this sort of model, there is the set of probability functions on these worlds, representing the different subjective probabilities one might have for (a) what the outcome is, and (b) in what fashion the die is loaded.
Definition. A probability space M is a triple <K, F, PP> where K is a non-empty set, F is a Borel field of subsets of K, and PP is a family of probability measures with domain F.
The members of K we call “worlds” and the members of F, the ‘measurable sets’, we call propositions.
Definition. A subset PP* of PP in probability space M = <K, F, PP> is closed under conditionalization iff for all P in PP* and all elements A of F, P(. |A) is in PP* whenever P(A) > 0.
Definition. A probability space with chance M is a quadruple <K, ch, F, PP> where <K, F, PP> is a probability space and ch is a function that assigns to each world w in K a probability function ch(w) defined on F.
Definition. For world w in K, GoodProb(w) = {P in PP: for all A in F, if P(ch(w)(A) > 0) = 1 then P(A) > 0}.
Theorem. GoodProb(w) is not closed under conditionalization.
Proved informally the Moore Paradox way, in the body of this post.
The relevant language has as vocabulary a set of atomic sentences, connectives & and ~, propositional operators (subnectors, in Curry’s terminology) P and ch, relational symbols = and >, and a set of numerals including 0.
There is no iteration or nesting of P or ch, which form terms from sentences.
Simultaneous inductive definition of the set of terms and sentences:
An atomic sentence is a sentence
If A is a sentence then ~A is a sentence
If A, B are sentences then (A & B) is a sentence
If A is a sentence and no terms occur in A then ch(A) is a term
If A is a sentence and P does not occur in A then P(A) is a term
If t is a term and n is a numeral then (t = n) and (t > n) are sentences.
Truth conditions for sentences:
For M = <K, ch, F, PP> a probability space with chance, and P a member of PP, a P-admissible interpretation ||…|| of the language in M is a function that maps the sentences to propositions, and numerals to numbers (with 0 mapped to 0), subject to the conditions:
||A & B|| = ||A|| ∩ ||B||
||~A|| = K – ||A||
||ch(A) = n|| = {w in K: ch(w)(||A||) = ||n||}
||ch(A) > n|| = {w in K: ch(w)(||A||) > ||n||}
||P(A) = n|| = {w in K: P (||A||) = ||n||}
||P(A) > n|| = {w in K: P (||A||) > ||n||}
Note that ||P(A) = n|| is in each case either K or empty, and similarly for ||P(A) > n||.
We call a sentence A true in world w exactly if w is a member of ||A||.
For example, if A is an atomic sentence then there is no constraint on ||A|| except that it is a proposition. And then sentence P(A & ch(A) > 0) = n is true under this interpretation (in all worlds) exactly if P assigns probability ||n|| to the intersection of set ||A|| and the set of worlds w such that ch(w)(||A||) is greater than zero. And otherwise that sentence is not true in any world.
Thinking about odds brings new insights into how we deal with probabilities. Puzzles about confirmation, conditionalization, and Bayes’ Theorem, which I discussed informally in earlier posts, tend to take a helpfully simpler and more intuitive form when put in terms of odds. Now I’ll explore the ins and outs of odds in a natural mathematical setting. (With examples and exercises.)
Odds more general than probabilities
Odds are ratios of probabilities. For example if the odds of snow against no-snow are 3 to 5 then the probabilities of snow and no-snow are 3/8 and 5/8 respectively. And vice versa.
But that example is special: it allows the deduction of the probabilities from the odds. Sometimes we know the odds but not the probabilities. Suppose four horses are running: I might say that I don’t know how likely it is that Table Hands will win, but he is twice as likely to win as True Marvel. The odds for Table Hands against True Marvel are two to one.
So odds provide a more general framework for reasoning and deliberation.
To move smoothly from our intuitions to more precise notions, let’s begin with a finite probability space, and redescribe it in terms of odds.
M = <K, F, PP> is a probability space iff K is a non-empty set, F is a field of subsets of K, and PP is a non-empty set of probability functions with a domain that includes F. M is a simple probability space if PP has only one member.
The elements of K are variously called points, outcomes, events, possibilities, or – as I will do here – worlds. For example these worlds could be the outcomes of tossing a die, or the points at which a team can wipe out in the World Cup. The elements of F are sometimes called events too, or – as I will do here – propositions.
A field of sets is a Boolean algebra of sets, with ⊆, ∩, ∪, and ~.
In this post, to keep things simple, I will take K to be the finite set {x1, …, xn}, F the family of all subsets of K, and PP to be the set of all probability functions defined on F.
A probability vector p is a function which assigns a number to each world, these numbers being non-negative and summing to 1. We write it in vector notation: p = < p1, p2, …, pn>. A probability function P defined on the propositions is determined entirely by a certain probability vector p: P(A) = Σ{p(x): x is in A}, or equivalently, p(x) = P({x}), for each world x. So we can stick to just the probability vectors in examples.
Let’s spell out odds in the same way. An odds vector is like a probability vector: the values it assigns to worlds are non-negative real numbers. But the sum of the assigned numbers need not be 1. For example if x1, …, xn are the outcomes of the toss of a fair die, that is represented by odds vector <1,1,1,1,1,1>.
Odds vector v satisfies the statement that the odds of A to B are a : b exactly if a : b = Σ{v(x): x is in A} : Σ{v(x): x is in B}. (I’ll make this precise below.)
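As a sketch of that clause for finite K (the helper name odds is my own):

```python
# Odds of A to B under odds vector v: sum of v over A against sum of v over B.
def odds(v, A, B):
    return sum(v[x] for x in A), sum(v[x] for x in B)

v = [1, 1, 1, 1, 1, 1]                 # fair die, worlds indexed 0..5 for outcomes 1..6
print(odds(v, {0, 2, 4}, {1, 3, 5}))   # (3, 3): odd to even is 3 : 3, even odds

# The snow example above: odds 3 : 5 determine probabilities 3/8 and 5/8.
a, b = 3, 5
print(a / (a + b), b / (a + b))        # 0.375 0.625
```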
Note 1, the null vector. There is no practical use for a locution like “odds of 0 to 0”, and so it would be reasonable to exclude the null vector, which assigns 0 to each world. It certainly does not correspond to any probability vector. But sometimes it simplifies equations or calculations to let it in, so I will call it an odds vector too, for convenience, and by courtesy.
Note 2, certainty. That proposition A is certain means that the odds of A to ~A are 1:0, that is, infinite. This makes sense in the extended real number system, with ∞ the symbol for (positive) infinity. When A is certain its odds against anything incompatible with A are infinite.
It may be convenient to put it in a negative way. The odds of (It both won’t and will rain tomorrow) to (It either will or will not rain tomorrow) are 0 : 1. That is a well-defined ratio, and means that the first proposition is certainly not true (ranked as such by the odds vector in question). Equivalently, of course, its negation (the second proposition) is certain.
Probability vectors are odds vectors. But now we have a redundancy. Two odds vectors are equivalent if one is a positive multiple of the other. For example, the odds of 4:2 are the same as the odds of 2:1.
If P and P’ are probability functions defined on F then so is P* = xP + (1-x)P’, provided x is in [0,1]. P* is a mixture (convex combination) of P and P’. [1] It is an important feature of the model that the set PP of probability functions defined on F is closed under the formation of mixtures.
Mutatis mutandis for odds: here the method of combination is not convex but linear. The restriction on the coefficients in the mixing equation for probability is not needed.
Definition. A mixture of odds vectors is a linear combination: v* = av + bv’, provided a, b are non-negative real numbers.
Note well that you cannot just replace an odds vector by an equivalent vector anywhere. Scalar multiplication distributes over addition: k(v + v’) = kv + kv’, so v + v’ is equivalent to kv + kv’. But even though v’ is equivalent to kv’, v + v’ is not in general equivalent to v + kv’.
EXAMPLE 1. K is the set {1, 2, 3, 4, 5, 6} of outcomes of the toss of a die. This die is one of two dice. One of them is fair, the other is loaded in such a way that the higher numbers 4, 5, 6 come up twice as often as the lower ones. And the die that is tossed is equally likely to be the one or the other.
We have vector v = <1,1,1,1,1,1> to give the odds on the assumption that the die is fair. Similarly, vector v’= <1,1,1, 2,2,2> represents the case of a loaded die with the higher numbers twice as likely to come up as the lower ones. We are unsure whether our die is fair or loaded in that particular way, with no preference for either, so for our betting we adopt an equal mixture:
v* = v + v’ = <2,2,2, 3, 3, 3>
and now our odds for any outcome – e.g. that the outcome is an odd number – will be halfway in between. For example, the odds of the outcome being ‘high’ to its being ‘low’ are half-way between what they are for the two dice, that is, one and a half (i.e. 3/2, as you can easily see).
EXERCISE 1. A hiker is at point A and would like to get to point X. Being ignorant of the trail system he chooses at random at each juncture. From A there go 6 trails: one goes to B1, two go to B2, and three go to B3. From each of these B points, there go 3 trails. From B1, one goes to X, from B2 two go to X, and from B3 all three go to X. What are the odds that the hiker reaches X (and what is the probability that he does so)? Reason with odds.
Answer. For the model, K = {X, ~X}. At the B points the odds vectors are <1, 2>, <2, 1>, <3, 0> respectively. At A, there are different odds to reach the B points, 1:2:3. So the correct odds vector for this situation is the mixture:
1<1,2> + 2<2,1> + 3<3,0> = <14, 4>.
The odds are 14: 4, or 7:2, the probability of reaching X is 14/18 or 7/9. (Check this by also doing the exercise reasoning with probabilities.)
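A quick check of both routes (my own verification, not part of the post):

```python
# Reasoning with odds: mix the B-point odds vectors with weights 1, 2, 3.
vectors = [(1, 2), (2, 1), (3, 0)]     # odds of X to ~X at B1, B2, B3
weights = [1, 2, 3]                    # odds of reaching B1 : B2 : B3
mix = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(2)]
print(mix)                             # [14, 4] -> odds 14 : 4 = 7 : 2

# Reasoning with probabilities: P(X) = sum over B of P(B) * P(X|B).
P_B = [1/6, 2/6, 3/6]
P_X_given_B = [1/3, 2/3, 3/3]
print(sum(p * q for p, q in zip(P_B, P_X_given_B)))    # 0.777... = 7/9
```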
Truth conditions for odds statements[2]
A proposition is true in a world in model M (the world satisfies that proposition) iff that world is a member of that proposition. Notation:
M, w╞A iff w is in A
We will similarly say that an odds vector satisfies an odds statement under the appropriate conditions.
Our language has a classical sentential syntax with &, v, ~ and one special sentential operator O, numerals, and atomic sentences. I will use the same capital letters for sentences as for propositions, and K for the tautology. The sentence formed by applying connective O to A, B, in that order, and the numerals x and y, I will write, to be reader-friendly, as O(A : B) = x : y. It is read as “the odds of A to B are x : y”, and it is called a (simple) odds statement.
I’ll use the same symbol for satisfaction, and write “v╞ E” for “ odds vector v satisfies odds statement E”. The truth conditions for simple odds statements are then, as you would expect:
M, v╞ O(A : B) = a : b if and only if a : b = Σ{v(x): x is in A} : Σ{v(x): x is in B}
EXAMPLE 2. In EXAMPLE 1, v* = <2,2,2,3,3,3>. If A = (outcome is odd) and B = (outcome is even) then
M, v* ╞ O(A : B) = 7 : 8.
For Σ{v*(x): x = 1, 3, 5 } : Σ{v*(x): x = 2, 4, 6 }
= (2 + 2 + 3) : (2 + 3 + 3) = 7 : 8
To complete this part we have to look at the more general concept of odds relative to a ‘given’.
As I discussed in an earlier post, conditionalizing an odds vector on a proposition A consists just in assigning 0 to the elements not in A. We can make it precise the following way.
Let JA be the indicator function of A, that is, the function that gives value 1 to elements of A and value 0 to all other elements. For example, with x and y in K, if A = {x, y}, then JA(x) = 1 = JA(y), and if z is neither x nor y then JA(z) = 0.
Definition. (IAv)(x) =def JA(x)v(x), where v is any vector with real number components and x is a world in K.
It may be puzzling why I give this definition for vectors with negative components at the same time, although they are not odds vectors. But it will simplify things later to do that.
So IA is an operator, it operates on vectors, and it is a linear operator. The most visual way to display that, in the case of finite vectors, is by representing the operator by a matrix. Suppose K = {x1, x2, x3, x4}, that v = <2, 3, 4, 5> and A = {x1, x3}. Then IA is represented by the diagonal matrix with diagonal entries <1, 0, 1, 0>, and its action on v yields IAv = <2, 0, 4, 0>.
(Briefly, the rows in the matrix are vectors too. To get the first component of the new vector multiply the top row of the matrix and the old vector. And so forth. The multiplication is the inner product of the two vectors, which I will discuss properly below.)
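In code the picture is immediate (a sketch; numpy is my choice of tool):

```python
import numpy as np

v = np.array([2, 3, 4, 5])
I_A = np.diag([1, 0, 1, 0])     # A = {x1, x3}: 1s at the A-worlds, 0s elsewhere

print(I_A @ v)                  # [2 0 4 0]: v conditionalized on A
```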
Truth conditions for conditional odds statements
Let’s extend the language. The sentence formed by applying connective O to A, B, C in that order, and the numerals x and y, I will write, to be reader-friendly, as O(A : B|C) = x : y. It is read as “given C, the odds of A to B are x : y”, and it is called a conditional odds statement.
A conditional odds statement has the following truth conditions:
M, v╞ O(A : B|C) = x : y if and only if x : y = Σ{v(w): w is in A ∩ C} : Σ{v(w): w is in B ∩ C}
which is equivalent to
M, v╞ O(A : B|C) = x : y if and only if x : y = Σ{ICv(w): w is in A} : Σ{ICv(w): w is in B}
and to the more intuitive
M, v╞ O(A : B|C) = x : y if and only if ICv╞ O(A : B) = x: y
It is easily seen now that we can define the binary O, in simple odds statements, in terms of the ternary O:
Definition. ‘O(A : B) = x : y’ for ‘O(A : B|K) = x : y’.
Here the ‘given’ is a tautology, so imposes no constraint.
EXAMPLE 3. We have our die loaded so as to favor the higher numbers, represented by the odds vector v = <1,1,1, 2, 2, 2>. What are the odds of throwing a 5, conditional on the outcome being an odd number?
Here C is {1, 3, 5} so ICv = <1, 0, 1, 0, 2, 0> while A and B are {5} and {1, 2, 3, 4, 6}. The odds in question are therefore 2: (1 +1) = 2:2, i.e. fifty-fifty, as they say.
EXERCISE 2. Olivia and Norman play a game: they have an urn with 35 black balls and 1 red ball. They take turns drawing without replacement, and the one to get the red ball wins. But they are interested only in getting a Superwin: get the red ball on your first draw. Norman chivalrously offers Olivia the choice to go first if she wishes. She thinks she could end the game at once with a Superwin (chance 1 in 36). But if she doesn’t then Norman will have an advantage: 1 out of 35 to get a Superwin. Would Olivia going first be to Norman’s advantage?
Answer. There are three worlds: OW (Olivia wins), NW (Norman wins), NL (Both lose). Suppose Olivia chooses to go first. About the correct odds vector v = <v(OW), v(NW), v(NL)> for this situation, we know its conditionalizations on OW and on not-OW:
IOWv = <1, 0, 0> (taking v(OW) = 1). Since Olivia’s chance of a Superwin is 1 in 36, the odds of OW to ~OW are 1 : 35, so the components of I~OWv must sum to 35; and given ~OW, Norman draws one of the remaining 35 balls, of which 34 are black, so the odds of NW to NL are 1 : 34. Hence I~OWv = <0, 1, 34>.
v = IOWv + I~OWv
= <1, 0, 0> + <0, 1, 34>
= <1, 1, 34>
From this we can see that, even if Olivia goes first, the odds of Norman winning are the same as for her, namely 1 : (1 + 34) = 1 : 35, i.e. probability 1/36.
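As a sanity check (my own, with plain probabilities):

```python
p_OW = 1/36                 # Olivia superwins on the first draw
p_NW = (35/36) * (1/35)     # Olivia draws black, then Norman draws the red ball
p_NL = (35/36) * (34/35)    # both first draws are black

print(p_OW, p_NW, p_NL)     # 1/36, 1/36, 34/36 -- matching the odds vector <1, 1, 34>
```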
The example that follows was in an earlier post, with a discussion about how reasoning by Bayes’ Theorem amounts to finding the Bayes Factor, which is the number by which the prior odds are multiplied to yield the final odds. I’ll repeat a small part here to illustrate how we now see conditionalization on evidence, taken from tests with known error probabilities.
EXERCISE 3. There is a virus on your college campus, and the medical team announces that 1 in 500 students have this virus. There is a test; it has a 1% false negative rate (1% of those who have the virus test negative) and a 1% false positive rate (1% of those who do not have the virus test positive). You are one of the students, with normal behavior, and reckon that your odds of having the virus are accordingly 1 : 499. You take the test and the result is positive. What are your new odds of having the virus?
Answer. Let’s say that all told there are 50,000 students and they are all tested. There are 100 students who have the virus. 99 of them test positive and 1 tests negative. There are 49,900 students who do not have the virus, and 499 of them test positive anyway. So you are one of 99 + 499 = 598 students who test positive, and only 99 of those have the virus while 499 do not. So the odds for you to have the virus are 99 : 499. Your odds for having the virus have been multiplied by 99.
That was the intuitive frequency argument, told in terms of odds. But what exactly was the manipulation of odds vectors that was involved?
There are four worlds: x1 (positive & virus), x2 (positive & no virus), x3 (negative & virus), x4 (negative & no virus). The prior odds vector, which we can read off from the narrative, is
v = <99, 499, 1, [50,000 − 99 − 499 − 1]> = <99, 499, 1, 49401>
But you tested positive, so let’s conditionalize on that: with A = {x1, x2}, IAv = <99, 499, 0, 0>, and the posterior odds of virus to no virus are 99 : 499.
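In code, the manipulation is just multiplication by the indicator of testing positive (a sketch, with the world ordering above):

```python
v = [99, 499, 1, 49401]     # prior odds vector over x1, x2, x3, x4
positive = [1, 1, 0, 0]     # indicator of A = {x1, x2}, "tested positive"

posterior = [j * x for j, x in zip(positive, v)]
print(posterior)            # [99, 499, 0, 0]: odds of virus to no virus are 99 : 499
# The Bayes Factor: the prior odds 1 : 499 have been multiplied by 99.
```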
Putting mixtures and conditionalization together we can define Jeffrey Conditionalization. I call a Jeffrey shift the following operation on a probability function designed to change the value of a given proposition A to a specified number x:
(A → x)P = xP(. |A) + (1 – x)P( . |~A), where 0 ≤ x ≤ 1 and P(A) > 0 < P(~A)
Informally: while A gets the new probability x, the ratios of the probabilities of subsets of A to each other remain the same as they were, and similarly for the ratios of the probabilities of subsets of ~A.
I’ll use a similar notation (A → x : y) for the corresponding operator on odds vectors, which changes the odds of A to ~A to x : y.
Definition. (A → x : y)v = xIAv + yI~Av, with x, y non-negative
(If x = y = 0 this is a Jeffrey shift in odds only by courtesy notation.)
When we use the matrix representation it is clear how Jeffrey Conditionalization is a straightforward generalization of ordinary conditionalization.
EXAMPLE 4. Suppose K = {x1, x2, x3, x4}, that v = <2, 3, 4, 5> and A = {x1, x3}. The current odds of A to ~A are 6 : 8 or 3 : 4 or to make the generalization more obvious, 1 : (4/3). Now if you want to double the odds for ~A, instead of multiplying by 0 for the ~A worlds, multiply by 2!
The result is (A → 1 : 2)v = IAv + 2I~Av = <2, 6, 4, 10>, and the new odds for A against ~A are 6 : 16 or 3 : 8 or 1 : (8/3).
EXAMPLE 5. We thought we had a fair die, and so adopted odds vector v = <1,1,1,1,1,1>. Then we learned that the even outcomes A = {2, 4, 6} are twice as likely to come up as the odd numbers. So we update to the odds vector <1, 2, 1, 2, 1, 2>. What was that? It was the Jeffrey shift (A → 2 : 1)v = 2IAv + I~Av = <1, 2, 1, 2, 1, 2>.
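A small sketch verifying both examples (the helper name jeffrey_shift is mine):

```python
def jeffrey_shift(v, A, x, y):
    """(A -> x : y)v = x*I_A(v) + y*I_~A(v), for non-negative x and y."""
    return [x * vi if i in A else y * vi for i, vi in enumerate(v)]

# EXAMPLE 4: double the odds for ~A, with A = {x1, x3} at indices 0 and 2.
print(jeffrey_shift([2, 3, 4, 5], {0, 2}, 1, 2))           # [2, 6, 4, 10] -> 6 : 16

# EXAMPLE 5: the even outcomes (indices 1, 3, 5) become twice as likely.
print(jeffrey_shift([1, 1, 1, 1, 1, 1], {1, 3, 5}, 2, 1))  # [1, 2, 1, 2, 1, 2]
```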
A partition in model M = <K, F, PP> is a set of mutually disjoint propositions which is exhaustive, that is, its union is K. If S = {B1, …, Bm} and P is a probability function then the law of total probability says:
P = P(B1)P(. |B1) + … + P(Bm)P(. |Bm)
The components P(. |Bj), j = 1, …, m are mutually orthogonal, by the following definition:
Definition. If P and P’ are probability functions defined on the same algebra F then P is orthogonal to P’ if and only if there is a proposition A in F such that P(A) = 1 and P’(A) = 0.
Notation: P ⊥ P’. This relation is symmetric and irreflexive.
The corresponding definition for odds vectors is:
Definition. If v and v’ are odds vectors defined on the same set K then v is orthogonal to v’ if and only if, for each member x of K, either v(x) = 0 or v’(x) = 0 or both.
Clearly two probability vectors are orthogonal iff the probability functions which they determine are orthogonal.
Using the same symbol for this relation, we note that in mathematics there is, for vectors in general, a standard definition of orthogonality:
v ⊥ v’ iff Σ {v(x)v’(x): x in K} = 0
Since the numbers in odds vectors are all non-negative, this sum equaling 0 is the case if and only if for any x in K, at least one of v(x) and v’(x) equals zero. So suppose that E ={x: v(x) = 0}. Then for v, E is certainly not true, while for v’, E is certainly true (by the definition in Note 2 above). So this corresponds exactly to the condition of orthogonality for probability functions. We can also put it a third way:
v and v’ are orthogonal exactly if there is a proposition A such that v = IAv and v’ = I~Av’
Now we also have a neater way to give, parallel to the law of total probability, the law of total odds:
v = IBv + I~Bv
If T is a partition then v = Σ{ IBv: B a member of T}
and this is an orthogonal decomposition of odds vector v.
Odds’ natural habitat in mathematics
The odds vectors are part of a finite-dimensional vector space. A vector space over the real numbers is a set of items (‘vectors’) closed under addition and scalar multiplication by real numbers. When the vectors are sequences of numbers (as they are in our context) the odds vectors are singled out by having no negative number components.
The dimensions correspond to the worlds – the worlds are the dimensions, you might say. With the worlds numbered as above, world x1 is represented by the unit vector v(x1) = <1, 0, 0, …, 0>, v(x2) = <0, 1, 0, …, 0> and so forth. The unit vector that corresponds to world x is the one which ranks world x – or more precisely, the proposition {x} – as certain. These unit vectors are mutually orthogonal and span the space in this sense: each vector in that space is a linear combination of these unit vectors.
Propositions correspond to subspaces. If A = {x1, x2, x3} then A corresponds to the subspace [A] spanned by {v(x1), v(x2), v(x3)}. Proposition A is ranked as certain by precisely those vectors which are in [A].[3]
The operator IA is a projection operator, it is the projection on subspace [A]. If v is any vector then IAv is the vector that is exactly like v except for having 0s for worlds not in A.
So let’s make it official. The finite probability space M = <K, F, PP> has an associated vector space V(M). Most of its description is already there in the discussion above.
The Boolean algebra of propositions F has as counterpart in V(M) a Boolean algebra of subspaces of V(M). (Note well: that is not the algebra of all subspaces of V(M), which is not Boolean – I will illustrate below.)
Each proposition A in F is a set of worlds {xj, …, xk}
Definition. [A] = the subspace spanned by the vectors v(y): y in A.
Call [A] the image of A in V(M).
Notation: if X is any set of vectors, [X] is the least subspace that contains X, and we say that X spans that subspace. In the case of a unit set {v}, I’ll abbreviate [{v}] to [v].
Define the algebra of subspaces [F] to be the set of images of members of F, with the following operations:
meet: [A] ∧ [B] = [A ∩ B]
join: [A] ⊗ [B] = the least subspace that contains both [A] and [B]
orthocomplement: [A]⊥ = {v in V(M): v ⊥ v’ for all v’ in [A]}
order: [A] ≤ [B] iff A ⊆ B
First of all, the order is just set inclusion: [A] ≤ [B] iff [A] ⊆ [B]. Secondly, the meet is just set intersection: [A] ∧ [B] = [A] ∩ [B], for the largest subspace contained in two subspaces is their intersection.
The other two operations are less obvious. [A] ⊗ [B] does not just contain [A] and [B] but also the linear combinations of vectors in [A] and vectors in [B].
Clearly [A] ⊗ [A]⊥ = [K], but the vectors that belong to neither [A] nor [A]⊥ are not to be ignored.
That [F] is isomorphic to F, though the algebra of subspaces is not Boolean
The point is, first, that [F] is indeed Boolean, isomorphic to F, but second that there are subspaces that are not images of propositions, and because of these, there are violations of the Boolean law of distributivity.
To take the second point first, let v = av(x1) + bv(x2), with both a and b positive. Since v(x1) = <1, 0, …> and v(x2) = <0, 1, 0, …> we see that v = <a, b, …>; suppressing the extra zeroes, we can picture v as the point <a, b> in the plane spanned by v(x1) and v(x2).
[v] is not an image of any proposition. The least subspace that contains v is [v] = {kv: k a real number}, the ray (one-dimensional subspace) spanned by v. Note that v is a mixture of those two unit vectors, so [v] is part of ([{x1}] ⊗ [{x2}]). Denoting the null space (the subspace that contains only the null vector) as f, we have a failure of distributivity: [v] ∧ ([{x1}] ⊗ [{x2}]) = [v], while ([v] ∧ [{x1}]) ⊗ ([v] ∧ [{x2}]) = f ⊗ f = f.
In other terminology: the lattice of (all) subspaces of a vector space is a non-distributive lattice.
Why is the smaller algebra of subspaces [F] nevertheless Boolean, and isomorphic to F? The reason is that the unit vectors corresponding to worlds are all mutually orthogonal. That makes them images of propositions mutually compatible in the sense in which this word is used in quantum mechanics.[4] We need only verify:
[C] ≤ [A] ⊗ [B] iff C ⊆ A ∪ B
That is so because the right hand side is equivalent to [C] ⊆ [A ∪ B] = [A] ⊗ [B], and the order in [F] is set inclusion.
[C] ≤ [A]⊥ iff C ⊆ ~A
That is so because ~ A contains precisely those worlds x such that v(x) is orthogonal to all vectors v(y) such that y is in A.
The General Reflection Principle demands that your current opinion (represented by a probability or expectation function) is within the range (convex closure) of the future opinions you foresee as possible. How does that idea look with odds?
The simplest case is obvious. Suppose the worlds are the possible outcomes of an experiment (e.g. toss of a die) and you are sure that the outcome will be one of the first three. Then your current opinion must assign 0 to the other dimensions, i.e. be in the subspace spanned by those first three corresponding unit vectors v(x1), v(x2), v(x3).
EXAMPLE 6. We are conducting an experiment with set of possible outcomes being the partition T = {B1, …, Bm}. Our current opinion for the outcome is vector v, so we know our possible posterior opinion will be one of the vectors in the orthogonal decomposition {IB1v, …, IBmv}. This corresponds to conditionalization in the case of probability – that the outcome of an experiment is a projection on a subspace is called the Projection Postulate in discussions of quantum mechanics.
It is a bit more complicated when you have a more arbitrary set of foreseen possible posteriors, say a set X of odds vectors of some sort. Then the principle should demand that your current opinion is represented by an odds vector that lies within the least subspace that contains X. What is that?
The answer appeals to ‘double negation’. First take the set of all vectors that are orthogonal to all members of X, which is the orthocomplement X⊥ of X. Those are the opinions certainly ruled out by the principle. Then take the orthocomplement of that: X⊥⊥.
It is a theorem that, whatever X is, X⊥⊥ is a subspace, and it is the least subspace that contains X.
The Reflection Principle then demands that your current opinion is constrained to be an odds vector that lies in that subspace.
What are called quantities elsewhere, statisticians call random variables. A random variable on the model is any function that assigns a real number to each world in K. For example, K might be the days this week and function r assigns each day its precipitation amount. So a random variable r, in this case, is representable by a vector r = <r(x1), r(x2), …, r(x7)>.
Definition. The expectation value Ep(r) of r for p is Σ{ p(xj)r(xj) : j = 1, …, 7}, provided p is a probability vector.
But that is exactly the definition of the inner product (also, called scalar product) on our vector space:
Definition. The inner product of vectors v and v’ is the number (v, v’) = Σ{v(x)v’(x) : x in K}.
So the expectation value of quantity r for probability p is the inner product of their representing vectors.[5]
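For instance (a sketch; the precipitation numbers are invented):

```python
import numpy as np

p = np.array([1/7] * 7)                             # uniform probability vector over the week
r = np.array([0.0, 0.2, 0.0, 1.1, 0.4, 0.0, 0.3])   # precipitation per day, in inches

print(np.dot(p, r))                                 # E_p(r) = (p, r) = 0.2857...
```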
Since the values of the random variable, e.g. the amounts of precipitation, are absolute values on a chosen scale (e.g. inches), the expectation value is not something comparative, and there is no advantage in adapting the concept of expectation value from probability to odds.
But when this subject was originally created in the 17th century, before the technical concepts had solidified in our culture, we can read the texts as discussing the matter in terms of odds, quite naturally. (Translations tend to do so in terms of probabilities and expectation values, that is, in terms of the concepts we mainly employ today, but I suggest that this may be anachronistic.)
For example, here is Huygens’s Proposition III:
If I have p chances to get a, and the number of chances I have to get b is q, then (assuming always that each chance can occur equally easily): that is worth (pa + qb)/(p + q) to me. (My translation from the Dutch.)
Here p and q can be any natural numbers, say 17 and 51. The division by their sum points us to reading his text as, in effect, ‘If the probability to get a equals p/(p + q) …”. I am not saying that is wrong, I agree that if values to me are described in absolute rather than comparative terms, that is natural as well.
But think of this in a larger context:
I have 17 chances to get a, 51 chances to get b, 101 chances to get c, …
You want to buy from me the chances to get a and to get b
How much do you owe me?
Three remarks:
the first line is most easily read as displaying two vectors, namely an odds vector <17, 51, 101, …> and a random variable vector <a, b, c, …>;
to calculate the fair price, reference to all the other contracts or lottery tickets that I have can be omitted,
the price must be an appropriate fraction of (a + b), with proportions of a and of b in the ratio 17 : 51, that is, 1 : 3.
So this is a way of reading the text, I think very naturally, in terms of odds thinking.
Admittedly these three remarks do not yet, taken together, yield Huygens’ result. The gap is filled by his symmetry argument about games in the proof of his Proposition III. (See my post “Huygens’ probability theory: a love of symmetry” of April 2023.)
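Still, the pricing rule itself is easy to exhibit in odds terms (a sketch; the prize values are invented):

```python
# Huygens's Proposition III as an inner product of an odds vector with a
# value vector, normalized by the total number of chances.
def fair_price(odds_vector, values):
    return sum(o * v for o, v in zip(odds_vector, values)) / sum(odds_vector)

a, b = 100, 60                          # hypothetical values of the two prizes
print(fair_price([17, 51], [a, b]))     # (17*100 + 51*60) / 68 = 70.0
```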
ENDNOTES
[1] The term “mixture” is common in physics, not in mathematics, but I like it because it is a term that aids visual imagery.
[2] I’m going to fuzz the use/mention distinction a bit from here on. As my friend Bob Meyer used to say, we are following the conventions of Principia Mathematica.
[3] Think about quantum logic. As introduced by von Neumann: subspaces are identified as the propositions for that logic. Various intuitive motivations have been offered for this.
[4] The unit vectors that correspond to worlds, in the way indicated, and which form a basis for the space, are the eigenvectors of a single observable. Propositions correspond to statements to the effect that the eigenvalues of this observable are within a certain range.
[5] Geometrically, the inner product measures the angle between the two vectors, and the inner product of a vector with itself measures its magnitude. Notation:
||v|| = square root of (v,v)
𝜙 is the angle v^v’ between vectors v and v’ iff the cosine of 𝜙 = (v, v’)/(||v||·||v’||).
Equivalently, (v,v’) = ||v||.||v’||cos(v^v’).
Note that, for angles between 0 and π, the cosine decreases as the angle increases.
Abstract. A recent paper (Ding and Holliday 2020) exhibits a fairly intuitive principle (SPLIT) and shows that it cannot be satisfied if the algebra of propositions is atomic. That is correct, but a slight liberalization of neighborhood possible worlds semantics will accommodate it. This is another interesting point about infinities.
The SPLIT principle can be stated for an arbitrary modal operator Q, but it is illustrated with the modality ‘it is queried whether’ (but see the Postscript about different examples, like belief):
SPLIT. If A is true then [it is possible that A is true and is queried, and it is possible that A is true and not queried]
A → [♦(A & QA) & ♦(A & ~QA)]
The argument about atomicity is simple. An atom is a proposition which is entailed only by the impossible proposition and itself. (Using the same notation for propositions as for sentences:) Therefore, if A is an atom then either (A & QA) or (A & ~QA) is just A, and the other is the impossible proposition.
In Ding and Holliday’s standard presentation of neighborhood possible world semantics, every set of worlds is a proposition. So there will definitely be atoms: if w is a world then {w} is a proposition, and includes no propositions other than itself and the empty set.
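The atomicity argument can be checked mechanically in a tiny powerset model (my own illustration):

```python
# For an atom A = {w}, no candidate proposition QA leaves both
# (A & QA) and (A & ~QA) possible: one of them is always empty.
from itertools import chain, combinations

W = {1, 2, 3}
subsets = [set(c) for c in chain.from_iterable(combinations(W, r) for r in range(len(W) + 1))]

A = {1}                                   # an atom of the powerset algebra
for QA in subsets:
    assert not (A & QA) or not (A - QA)   # at least one conjunct is empty
print("SPLIT fails at the atom", A)
```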
In my two previous posts on the logic of belief I presented the neighborhood semantics in a more liberal form:
A model is an n-tuple <W, F, N, …>. The worlds form a set, W. The propositions are a Boolean algebra F of subsets of W, and F includes W. The function N assigns to each world w a set of propositions N(w), called the neighborhood of w.
Can this generalization to propositional algebras (still algebras of sets) accommodate SPLIT? As Ding and Holliday show, in effect, it can do so if atom-less Boolean algebras can be fields of sets.
And that is the case by:
Stone’s Representation Theorem. Every Boolean algebra is isomorphic to a field of sets.
Note. Stone’s representation theorem is what made possible world semantics possible in the first place. Reference for a reader-friendly presentation: Theorem 6-15 (page 201) of James C. Abbott (1969) Sets, Lattices, and Boolean Algebras. Boston: Allyn and Bacon.
POSTSCRIPT
Belief would not yield a plausible instance of SPLIT, since it is generally a principle that it is not possible not to believe a tautology. [That is so in the quite classical logic of belief I presented in the previous two posts.] So if Q is ‘It is believed that’ and A is a tautology then SPLIT is violated no matter what.
The argument about atomicity does not refer to any features of the modal operator. But the assertion that SPLIT can be accommodated in an atom-less algebra of propositions, for a given modal operator, does depend on what features that operator has.
The argument about atomicity also goes through for the weaker WEAK SPLIT principle
WEAK SPLIT. If A is true and ~A is possible, then [it is possible that A is true and QA, and it is possible that A is true and not QA]
( A & ♦~A) → [♦(A & QA) & ♦(A & ~QA)]
That weaker principle is more aptly illustrated with is believed as modal operator, with possibility epistemic. For the argument about atomicity: we might be in world w but not know that we are in w. Then if A = {w} then A and ♦~A are both true in w. But then WEAK SPLIT is violated.
In the past I have mainly thought of the logic of belief (if not connected with probability) as one of the family of normal modal logics. Now I realize that there is a problem which that cannot accommodate, while it seems that neighborhood semantics can.
Think of the agent who believes, for each natural number n, that there are at least n stars, but also believes that the number of stars is finite. Such examples take the simple form: something is F, 1 is not F, 2 is not F, ….
We get a geometric example with intervals: I dropped a point-particle on the interval [0, 1] on the real line. My first belief is that it fell in the half-open interval [1/2, 1). My other relevant beliefs are that it fell in each of the intervals (1/2, 1], (3/4, 1], …, (1 − 1/2^n, 1], …. My first belief has a non-empty intersection with each of the other beliefs. But the intersection of those other beliefs is {1}, which is excluded by my first belief.
From any finite subset of such a family of beliefs it is not possible to deduce a contradiction but the family as a whole is not satisfiable. Goedel called this omega-inconsistency.
These examples are, to be reader friendly, generated by simple recipes, and hence amenable to an argument by mathematical induction, leading to a straight contradiction. There are examples not of that sort, I just can’t write them down. In any case, the agent may not have mastered mathematical induction.
The Problem. In the normal modal logic approach, the agent in world w believes that A if and only if A is true in all the worlds w’ which bear a certain relation R to w. If propositions are sets of worlds, and [A] is the proposition that A expresses, with B the ‘the agent believes that’ connective, this amounts to:
BA is true in w iff {w’: wRw’} ⊆ [A].
But in the case of the omega-inconsistent believer {w’: wRw’} would then have to be part of every proposition in his set of beliefs. And there is no world in which all of those are true. Thus, in that case, {w’: wRw’} is empty. But that is no different from a believer who believes that A & ~A.
So there is, in normal modal semantics, no way to distinguish the omega-inconsistent believer from the believer who believes a simple self-contradiction.
The Solution. Here I will rely on my previous post, about logic of belief and neighborhood semantics.
Given simply that world w has neighborhood N(w), and that p is believed in w iff p is a member of N(w), that distinction between ‘ordinary’ and omega-inconsistency can be respected. For N(w) is a filter on the algebra of propositions, merely closed under finite intersections and superset formation.
So suppose the agent in w believes p = (something is F), q(1) = (1 is not F), q(2) = (2 is not F), …, and those propositions generate filter N(w). In that case ~p is not in N(w), so ~B~p is true in w. At the same time, there is no world in which all of N(w) is true, so the agent’s beliefs, taken altogether, imply all propositions, including ~p.
This is an adequate representation of omega-inconsistent belief, with the empty set not in N(w). That shows that condition
(cd) Λ is not in N(w)
must not be read as ‘N(w) is consistent’ but only as ‘Each finite subset of N(w) is consistent’.
The Upshot
What this means is that you can be entirely wrong about what the world is like, with ideas that are not realistic under any possible conditions, and still live a useful, productive, and happy life. And your family and friends might never find out.
1. Initial set-up: an agent’s belief as neighborhood
2. About true belief
3. Supposition: of two sorts
4. Belief under supposition that something is the case
5. Modeling belief under supposition that something is the case
6. Supposition and a conditional operator
7. Models with conditional operator
8. Logic of simple belief
9. Self-transparency (?)
10. Logic of conditional belief
11. Ubiquity of Moore’s Paradox
The logic of belief and the logic of belief change, as developed by various authors, have for the most part had the following focus:
the reaction of an agent who has opinions about the external world, and changes these opinions upon receipt of new information.
If logical constraints on belief change are spelled out in some way, it should also be possible to consider answers to “what I would believe if” kinds of questions. Or, to put it differently, possible to entertain a conception of conditional (relative, suppositional) belief, already present in the agent’s initial resources.
NB. I am not assuming that rational belief change must consist in ‘mobilizing’ those conditional beliefs, or that such change must be like ‘conditionalization’. That I regard as a separate question, likely to have its own complexities.
These ‘what would I believe’ questions are of two sorts, as I will explain. To accommodate both sorts, it seems best to go to neighborhood semantics.
1. Initial set-up: an agent’s belief as neighborhood
Class LB-0 of models. A model is an n-tuple <W, F, N, …>. The worlds form a set, W. The propositions are a Boolean algebra F of subsets of W, and F includes W. The function N assigns to each world w a set of propositions N(w), called the neighborhood of w. The dots indicate that the model may include special operators on propositions, or relations between worlds, or the like.
In the intended interpretation for the logic of belief, a world is ‘centered’ on an agent, and the agent’s full beliefs in world w form ‘neighborhood’ N(w) of propositions.
While there are many options here for exploring more and less minimal logics, I will for now go with all of the following four conditions (from Chellas, who has different interpretations in mind):
(cm) If p ∩ q is in N(w), so are p, q
(cc) If p, q are in N(w), so is p ∩ q
(cn) W is in N(w)
(cd) Λ is not in N(w)
It follows that N(w) is a proper filter on the family of propositions. A filter is a family closed under finite intersection and superset formation. It is proper if it does not contain the empty set.[1]
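In a finite model such a neighborhood can be generated from a base of beliefs by exactly these closure operations; here is a sketch (all names are mine):

```python
from itertools import chain, combinations

W = frozenset({1, 2, 3, 4})
propositions = [frozenset(c) for c in
                chain.from_iterable(combinations(W, r) for r in range(len(W) + 1))]

def generated_filter(base):
    """Close a base of beliefs under finite intersection and superset formation."""
    core = frozenset.intersection(*base) if base else W
    return {p for p in propositions if core <= p}

N_w = generated_filter([frozenset({1, 2}), frozenset({1, 3})])
print(frozenset({1}) in N_w)    # True: {1,2} and {1,3} together yield belief in {1}
print(W in N_w)                 # True: (cn)
print(frozenset() in N_w)       # False: (cd), the filter is proper
```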
Thus the agent is assumed to be consistent (but see the qualification below, about infinity).
I’ll assume, to begin, the syntax of sentential logic, and a standard way of assigning propositions to those sentences. The syntax may include other elements not specified as yet. If A is a sentence then [A] is the proposition that A expresses.
For propositions I use lower case letters and for sentences upper case letters.
2. About true belief
To what extent is this agent right about what things are like? His world w may be outside some propositions in N(w), and in others. The set RIGHT(w) = {p in N(w): w in p}. RIGHT(w) is not empty, since W is in it.
I think we can’t assume that the agent has such a superior intellect that N(w) is closed under infinite intersection, and so RIGHT(w) is not either. But we may observe that the infinite intersection of the members of RIGHT(w) cannot be empty, since w is a member of each member of RIGHT(w).
However, the infinite intersection of N(w) may be empty. The agent does not believe any one self-contradictory proposition, but the infinite totality of his beliefs might not be satisfiable.
For example he might believe of each natural number n that there are at least n stars, and also believe that the number of stars is finite. Finite intersections of members of N(w) will all be satisfiable, but it is not possible for all members of N(w) to be true together (in any world) . In this case RIGHT(w) will clearly lack some of those trouble-making beliefs: some must be false.
3. Supposition: of two sorts
The agent can reason under supposition, and we should make a distinction:
Q. What would I believe if I believed that A?
R. What would I believe if A were true?
To illustrate the distinction (though in connection with the logic of conditionals) Rich Thomason gave the example
[*] [My wife Sally is so clever that] if Sally were unfaithful I would not believe it
Given this, there is clearly a difference between what Thomason would believe if Sally were unfaithful, and what he would believe if he believed that Sally was unfaithful.
Discussion of question Q typically starts with this principle:
If [A] is consistent with N(w) then the answer is {[A] ∩ p: p is in N(w)}
Difficulties to be solved then concern what the answer should be if the agent believed ~ A and now receives the information that A.
Let’s set this much discussed issue aside, and consider question R.
Thomason’s example is about what beliefs he would not have, under certain conditions. It is easy enough to make up examples of the same sort, of a positive kind. Example: when Thomas turned 18 his legal parents told him finally that he had been adopted. He says
[**] If my parents had lied to me about my having been adopted, I would still believe that Herman is my father.
It is clearly not the case that if Thomas learned that his parents had lied to him about having been adopted, he would then still have the belief that his legal parents were his biological parents. But [**] could be a correct expression of a current (conditional) belief.
4. Belief under supposition that something is the case
Whether or not A is consistent with previous beliefs, the answer to “what would I believe if A” will depend on various other, relevant beliefs the agent has. And there will be a selection involved.
For example, in Thomason’s example, if A = (Sally is deceiving me), the relevant belief he cites is that Sally is so clever. That is what he is taking into account when framing his answer. But he could instead have selected other beliefs, perhaps ones that entail that Sally would not be able to live with the deceit. Then his answer would have been substantially different.
Additional selection is involved after the initial selection of (what is to the agent) the most salient belief already in place. If asked to elaborate, Thomason could say that he would then believe that Sally had developed new interests that account for her novel behavior. Or he might say, realizing his own limitations, that he would be just like Ford Madox Ford’s good soldier, oblivious of any tensions calling for explanation. Or he might finally take into account other aspects, and add that eventually, he would come to believe it, for example because Sally would eventually tell him.
We could introduce a relationship into the models, between worlds and worlds, or worlds and propositions, that will once and for all settle how those selections are made. That could be like the ‘nearness’ relation imposed by Stalnaker and Lewis in their logics of conditionals. I think that approach resulted in an oversimplification that harms the subject, so will not take that direction.
So for now, all I want to say is that the neighborhood of w relative to A, or conditional on A (in the sense of question R), call it N(w, [A]), is another filter on the family of propositions. It is the agent’s doxastic neighborhood conditional on the supposition that A is the case.
5. Modeling belief under supposition that something is the case
Class LB-1 of models. A model is an n-tuple <W, F, N1, N2, …> in which <W, F, N1, …> is a model in Class LB-0. The function N1 assigns to each world w a set of propositions N1(w), called the neighborhood of w. The function N2 assigns to each world w and each proposition p a set of propositions N2(w, p), called the neighborhood of w relative to p. Moreover:
Constraints on the models in Class LB-1:
(c0) N1(w) = N2(w, W)
It guarantees that what I would believe on the supposition of a tautology is just what I believe. There are various ways in which this could be strengthened, but I’ll leave that to think about later.
The practical effect is that it will now cause no confusion if we suppress those superscripts and just write N(w), N(w, p).
(cm, cc, cn) N(w, p) is a filter on F
But not (cd): we must allow that N(w, A) is not proper. For this point I recall an example I heard from Bob Stalnaker. “If Hitler had won the war, we would be taught German in high school” seems quite reasonable. But, Bob said, if he were to learn, and came to believe, that Hitler had won the war, he would conclude that he was insane. For there would be no rationalizing how he had been living in an alternative world this long.
I think we can be less extreme, and just allow that for certain statements A, N(w, A) is an improper filter: it contains the proposition that is false in every world. Below we’ll have an example where this would be clearly indicated.
For a further example, imagine that Thomas believes both that he will catch the 9am train, and that if he catches the 9am train he will get to work on time. It must surely follow that he believes that he will get to work on time. So we must add the constraint:
(cx) If p is in N(w) then N(w, p) ⊆ N(w)
This guarantees a sort of Modus Ponens, or detachment: if the agent believes that p, and also believes q on the supposition that p, then he believes that q. This point concerns answers to questions of form Q, the special case.
All the constraints listed in this section are part of the definition of Class LB-1 of models.
6. Supposition and a conditional operator
We can consider the possibility that those conditionals, such as “If Sally were unfaithful then I would (would not) believe that …” are already present, explicitly formulated, in the original set of full beliefs.
What would that look like? To formalize a bit, let’s use the expression “>b>” homonymously for a sentential connective and the operator it expresses.
“A >b> B” is to be read as “If A, I would believe that B”
Again taking the cue from Chellas,
w is in [A] >b> [B] iff [B] is in N(w, [A])
That is at odds with the intuitive idea of arrows as symbolizing “if … then” since Modus Ponens will not be a principle governing this. For if w is in [A], it does not at all follow, as we have seen with Thomason’s example, that [A] will be in N(w,[A]). For the supposition that A is true does not imply that A is believed under those circumstances.
What about the sentence “if A, I would not believe that B”?
That will be true in w exactly if [B] is not in N(w, [A]). But that is just the negation of the truth condition for “if A, I would believe that B”. So we just symbolize it as “~[A >b> B]”. Of course that does not entail that I would believe that ~B.
7. Models with conditional operator
Class LB-2 of models. A model is an n-tuple <W, F, N1, N2, …> in Class LB-1 in which, for all propositions p and q in F,
(p >b> q) = {w: q is in N(w, p)}
is in F.
In the syntax we now take it that there is a non-Boolean connective, homonymously written as “>b>”, such that A >b> B is a sentence whenever A and B are sentences. In the interpretation, [A >b> B] = [A] >b> [B].
8. Logic of simple belief
Given that the neighborhood is in effect the agent’s set of full beliefs in that ‘world’ (situation, set-up), we can introduce the ‘believes that’ propositional operator.
BA is true in w exactly if [A] is in N(w)
Using B equally for the connective and for the operator it stands for, that operator B is defined by: Bp = {w: p is in N(w)}
But B is definable because N(w) = N(w, W). So:
Definition. BA = W >b> A
Some familiar principles hold, simply because N(w) is a proper filter:
If A is valid, so is BA (for W is in N(w))
If A implies C then BA implies BC
BA, BC implies B(A & C)
BA implies ~B~A (because N(w) is a proper filter)
9. Self-transparency (?)
Believers are wrong about much, they have many false beliefs. But elsewhere I’ve taken up a classical (normal) modal logic of belief, for believers who are self-transparent. Whatever else they may be wrong about, they are exactly right about what beliefs they have.
This is a special case, a special class of models, a subclass of Class LB-2. I will not limit the discussion to this subclass, but make some remarks about it.
To represent transparency of belief let’s begin with a single case: in specific world w, the agent believes that he believes that A if and only if he believes that A. And this holds also for his conditional beliefs: the agent believes, on the condition that A, that he believes E on the condition that A, if and only if he believes E on the condition that A.
First: for all propositions p and q, if p is in N(w, q) then Bp is in N(w, q). That guarantees that
V. (BA ⊃ BBA) is valid in this class of models
In this class of models, if an agent believes something then he correctly believes that he believes it.
Secondly, if the agent believes that he believes that A, then that is correct, he does indeed believe that A. So, if Bp is in N(w, q) then so is p. This guarantees that
VI. (BBA ⊃ BA) is valid in this class of models.
V. and VI. hold also for the connective “A >b>”, for any sentence A, in the place of connective “B“.
10. Logic of conditional belief
That neighborhoods are filters immediately gives us:
VII.A >b> (B & C) implies [(A >b> B) and (A >b> C)]
VIII. [(A >b> B) and (A >b> C)] implies A >b> (B & C)
IX. if {E1, …, En} implies E then {A >b> E1, …, A >b> En} implies A >b> E
But this notion of conditionality lacks two salient features:
X*. A >b> A is not a valid principle
XI*. Modus Ponens, {A, A >b> B} implies B, is not a valid principle.
But with belief of the antecedent we do have a form of detachment:
X. {BA, A >b> E} implies BE
For suppose BA, A >b> E are true in world w. Then [A] is in N(w). By condition (cx) above, N(w, [A]) is part of N(w). That means that for any proposition q, if q is in N(w, [A]) then q is in N(w). Since A >b> E is true in w, [E] is in N(w, [A]). Hence [E] is in N(w), and therefore BE is true in w.
11. Ubiquity of Moore’s Paradox
Since Modus Ponens is not valid for >b>, the following triad is consistent:
(***) [Sally is not able to keep a secret] If Sally were deceiving me, I would believe that. I do not believe that Sally is deceiving me. Sally is deceiving me.
The three statements could all be true together, for the first and second just describe my beliefs, while the third describes a fact independent of anything I believe or disbelieve.
It is a virtue of having >b> in the language that we are able to represent this situation.
But of course, (***) is not something that could express a coherent agent’s belief. Principle X guarantees that, if the third statement is replaced by “I believe that Sally is deceiving me”, the result is inconsistent.
In the class of models as set up so far, an agent’s beliefs are consistent (setting aside problems with infinity). Therefore these agents are pretty safe from paradox. But what if we ask about conditional belief, with the condition being a Moorean statement?
There is no problem with N(w, [A & ~BA]) since nothing guarantees that [A & ~BA] is in N(w, [A & ~BA]). I would naturally take that neighborhood to exclude A, for the agent in a world where [A & ~BA] is true would not know or believe that A is true, and might or might not know or believe that he does not believe that A.
But what about N(w, [BA & B~BA])? Again, not a problem with the representation in general, but this harks back to the above remarks about transparency.
The transparency conditions imply that both [A] and [~BA] will then be in N(w, [BA & B~BA]), and [~BA] = W – [BA].
Now, in this special case where the agent is self-transparent, [BA] will also be in there, because [A] is; and hence so will the empty set: the filter will be improper.
So here we have a case where the agent’s question “What would I believe if …?” forces the answer, in effect, that under those conditions his beliefs would be incoherent.
[1] If p is in N(w) then the filter {q: p ⊆ q} generated by p is part of N(w), and is proper. Those filters themselves form a proper filter as well, in the family of subsets of N(w).
Stalnaker’s Thesis, that the probability of a conditional is the conditional probability of the consequent given the antecedent, ran quickly into serious trouble, in the first instance (famously) through David Lewis’s triviality results.
When I took issue with David Lewis’s triviality results, Robert Stalnaker wrote me a letter in 1974 (Stalnaker 1976). Stalnaker showed that my critique of Lewis did not save his Thesis when applied to his (Stalnaker’s) own logic of conditionals (logic C2).
Stalnaker proved, without relying on Lewis’ special assumptions:
If the logic of conditionals is C2, and for all statements A and B, P(A → B) = P(B|A) when defined, then there are at most two disjoint propositions with probability > 0.
At first blush this proof must raise a problem for a result I had presented, namely:
Theorem. Any antecedently given probability measure on a countable field of sets can be extended into a model structure with probability, in which Stalnaker’s Thesis holds, while the field of sets is extended into a probability algebra.
This theorem does not hold for a language whose logic is Stalnaker’s C2. Rather, it can be presented equivalently as a result for a language that has the same syntax as C2 but a weaker logic, which I called CE.
While Stalnaker acknowledged that his proof was specifically for C2, and did not claim that it applied to CE, neither he nor I showed then just how the difference between the two logics resolves the apparent tension.
Here I will show just how Stalnaker’s triviality argument does not hold for CE, with a simple counterexample.
2. Stalnaker’s Lemma
Stalnaker’s argument relies on C2 at the following point, stated without proof, which I will call his Lemma.
Definition. C = A v (~A & (A → ~B))
Lemma. ~C entails C → ~(A & ~B)
We may note in passing that these formulas can be simplified using principles that hold in both C2 and CE, for sentences A and B that are neither tautologies nor contradictions. Although I won’t rely on this below, let’s just note that C is then equivalent to [A v (A → ~B)] and ~C to [~A & (A → B)].
3. The CE counter-example to the Lemma
I will show that this Lemma has a counter-example in the finite partial model of CE that I constructed in the post “Probabilities of Conditionals: (1) Finite Set-ups” (March 29, 2021).
The propositions are sets of possible outcomes of a tossed fair die, named just by the numbers of spots that are on the upper face. To begin we take propositions
p = {1, 3, 5} “the outcome is odd”
q = {1, 2, 3} “the outcome is low”
The probability of (p → q) will be P(q|p) = P({1, 3})/P({1, 3, 5}) = 2/3. That is the clue to the construction of the selection function s(x, p) for worlds x = 1, 2, 3, 4, 5, 6.
In this model the choices are these. First of all if x is in p then s(x, p) = x. For the other three worlds we choose:
s(2, p) = 1, s(4, p) = 3, s(6, p) = 5
Thus (p → q) is true in 1 and 3, which belong to (p ∩ q), and also in 2 and 4, but not in 5 or 6.
Hence (p → q) = {1, 3, 2, 4}, “if the outcome is odd then it is low”, which has probability 2/3 as required.
Similarly we see that (p → ~q) = {5, 6}.
To test Stalnaker’s Lemma we define:
c = p ∪ (~p ∩ (p → ~q))
= {1, 3, 5} ∪ ({2, 4, 6} ∩ {5, 6})
= {1, 3, 5} ∪ {6}
= {1, 3, 5, 6} “the outcome is odd or 6”, or “the outcome is neither 2 nor 4”
~c = {2, 4} “the outcome is 2 or 4” (the premise of the Lemma)
Now proposition c has four members, and that means that in the construction of the model we need to go to Stage 2. There the original six-world model is embedded in a 60-world model, with each possible outcome x replaced by ten worlds x(1), …, x(10). These are the same as x, except that the selection function can be extended so as to evaluate new conditionals. The previously determined choices for the selection function carry over. For example, s(4(i), p) = 3(i), so (p → q) is true in each world 4(i), for i = 1, …, 10.
We refer to the set {x(1), …, x(10)} as [x]. So in this stage,
c = [1] ∪ [3] ∪ [5] ∪ [6]
The conclusion of the Lemma is:
c → ~(p ∩ ~q) = c → ~[([1] ∪ [3] ∪ [5]) ∩ ([4] ∪ [5] ∪ [6])]
= c → ~[5] “If the outcome is either odd or 6 then it is not 5”
What must s(x, c) be? The way to determine that is to realize again that each member of c must have probability ¼ conditional on c. Probability ¼ equals 15/60, so for example (c → [1]) must have 15 members.
Since [1] is part of c, we must set s(1(1), c) = 1(1), and so forth, through s(1(10), c) = 1(10). Similarly for the other members of c.
To finish the construction we need to get up to 15, so we must choose five worlds y not in [1] such that s(y, c) is in [1]. Similarly for the rest. To do so is fairly straightforward, because we can divide up the members of [2] and [4] into four bunches of five worlds each:
s(2(i), c) = 1(i) for i = 1, …, 5
s(2(j), c) = 3(j) for j = 6, …, 10
s(4(i), c) = 5(i) for i = 1, …, 5
s(4(j), c) = 6(j) for j = 6, …, 10
Now each conditional c → [x] is defined for each of the 60 worlds, and has probability ¼ for x = 1, 3, 5, 6.
The Lemma now amounts to this, in this model:
~c implies c → ~[5]
or, explicitly,
[2] ∪ [4] ⊆ [[1] ∪ [3] ∪ [5] ∪ [6]] → ~[5]
For a counter-example we look at a specific world in which ~c is true, namely world 4(1). Above we see that s(4(1), c) = 5(1). Therefore in that world the conditional c → {5(1)} is true, and hence also c → [5], which is contrary to the conclusion of the Lemma.
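Since the model is finite, the construction can be checked mechanically. The following is a minimal sketch in Python (my own verification, not part of the original post) encoding the sixty worlds, the selection function as chosen above, and the failure of the Lemma at world 4(1):

```python
from fractions import Fraction

# Worlds are pairs (x, i): die outcome x, copy i (Stage 2 of the model).
worlds = [(x, i) for x in range(1, 7) for i in range(1, 11)]
P = {w: Fraction(1, 60) for w in worlds}                 # uniform measure

p = {w for w in worlds if w[0] in (1, 3, 5)}             # "odd"
q = {w for w in worlds if w[0] in (1, 2, 3)}             # "low"
c = {w for w in worlds if w[0] in (1, 3, 5, 6)}          # "odd or 6"

def s(w, cond):
    """Selection function: carried over from Stage 1 for p,
    and chosen for c exactly as in the text above."""
    x, i = w
    if w in cond:
        return w
    if cond == p:                    # s(2, p) = 1, s(4, p) = 3, s(6, p) = 5
        return {2: (1, i), 4: (3, i), 6: (5, i)}[x]
    if cond == c:                    # the four bunches of five worlds each
        if x == 2:
            return (1, i) if i <= 5 else (3, i)
        if x == 4:
            return (5, i) if i <= 5 else (6, i)
    raise ValueError("condition not handled")

def arrow(cond, cons):
    """Truth set of the conditional (cond -> cons)."""
    return {w for w in worlds if s(w, cond) in cons}

prob = lambda E: sum(P[w] for w in E)

assert prob(arrow(p, q)) == Fraction(2, 3)               # P(p -> q) = P(q|p)
for x in (1, 3, 5, 6):                                   # each c -> [x] gets 1/4
    assert prob(arrow(c, {w for w in worlds if w[0] == x})) == Fraction(1, 4)

# The Lemma demands that every world in ~c make (c -> ~[5]) true.
five = {w for w in worlds if w[0] == 5}
holds_at = arrow(c, set(worlds) - five)
print((4, 1) not in c, (4, 1) in holds_at)               # True False: counter-example
```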
4. Conclusion
To recap: in this finite partial model of CE the examined instance of Stalnaker’s Lemma amounts to:
Premise. The outcome is either 2 or 4
Conclusion. If the outcome is neither 2 nor 4 then it is not 5 either
And the counter-example is that in this tossed die model there is a certain world in which the outcome is 4, but the relevant true conditional there is that if the outcome is neither 2 nor 4 then it is 5.
Of course, given that the Lemma holds in C2, this partial model of CE is not a counter-example to Stalnaker’s argument as it applies to his logic C2 or its extensions. It just removes the apparent threat to CE.
REFERENCES
Stalnaker, Robert (1976) “Stalnaker to van Fraassen”. Pp. 302-306 in W. L. Harper and C. A. Hooker (eds.) Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science. Dordrecht: Reidel.
This puzzle was devised by Roger White (2010: 175ff.) in support of an argument against the very idea of vague probability judgements. (See e.g. Topey 2012 for discussion.)
To begin I will take up the puzzle itself, in its general form, as it applies also to precise opinion, and the fallacy it tends to evoke. Then I’ll discuss decisions under vague uncertainty, and end with an Appendix for open problems of a technical sort.
1. A Version Of The Puzzle
Time 0. Jake has a coin that you know to be fair. There is a certain proposition p about which you are uncertain (in one way or another), but you know that Jake knows whether p. Jake paints the coin so that you can’t see which side is Heads and which side is Tails, then writes ‘p’ on one side and ‘~p’ on the other. Jake tells you that he has placed whichever is true on the Heads side, and its contradictory on the Tails side. Jake will toss the coin so that you can see how it lands.
Time 1. Jake tosses the coin, and you see that it has landed with the side marked ‘p’ facing up.
What does this do to your opinion about how likely it is that p is true?
Now we may be inclined to reason as follows:
[ARG] “This coin is fair, so the probability is 0.5 that it landed Heads up. But given that p is showing, p is true iff the coin landed Heads up. Therefore the probability that p is true is 0.5.”
Notice that it does not matter what p is, except that you are uncertain about it. Also note that your prior probability for p (whether precise or vague) makes no difference to what your posterior probability becomes.
Notice also that if you had seen that the coin had landed showing ~p, you would have come to the posterior probability 0.5 for ~p, and hence also for p, by an exactly similar argument. Therefore it is predictable beforehand that your posterior probability for p will be 0.5, regardless of which proposition p is, and regardless of your prior probability for it. As soon as Jake has told you what he is going to do, if you believe you will look at the coin when it has landed, you know how likely you will take p to be at the end.
White dismisses this argument, with the words “But this can’t be right. If you really know this in advance of the toss, why should you wait for the toss in order to set your credence in p to 1/2?”.
And dismiss it he should! For the form of argument [ARG] quickly leads to total incoherence.
EXAMPLE
Jake has three cousins, called Jack, Jim, and Jules. They approach Mark, offering the same procedure as Jake’s, but for specific propositions. They have looked at a tossed die, and have recorded which face is up. Jack tells Mark he has a fair coin, and will write “The face up was either 1 or 2” on the Heads side if that was true, and on the Tails side otherwise, with the negation on the other side. Then he will toss that coin and Mark can see the result.
Mark remembers the entire discussion following Jake’s procedure, he accepts [ARG] as the proper reasoning, and so concludes that after seeing the result, whatever it is, he will have probability 0.5 that the outcome was either 1 or 2.
Jim now gets into the act, in precisely the same way, with the proposition “The outcome was 3 or 4”. Then Jules, of course, for the proposition “The outcome was 5 or 6”. Each is referring to the same recorded die outcome as Jack.
After they are done Mark has probability 0.5 that the outcome was either 1 or 2, and 0.5 that it was 3 or 4, and 0.5 that it was 5 or 6. So his probability that the outcome was 1, 2, 3, 4, 5, or 6 is now 1.5. His opinion is now completely incoherent.
To press the point home: a little theorem
It takes just a couple of lines to prove that for any probability function P and any propositions p and q in its domain, P(p) is between P(p|q) and P(p|~q), when defined.
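Here are the couple of lines, assuming that P(q) and P(~q) are both positive so that both conditional probabilities are defined:

P(p) = P(p|q)P(q) + P(p|~q)P(~q), by total probability.

Since P(q) + P(~q) = 1, this exhibits P(p) as a weighted average of P(p|q) and P(p|~q), and a weighted average of two numbers always lies between them.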
So applications of [ARG] will be invalid whenever P(p) is a (sharp) probability other than 0.5.
For example, suppose p = (BvF won the lottery), and for me p has a probability of less than one in a million (as it does). Then there does not exist any proposition q such that I must assign 0.5 to p, by conditionalization, both when my evidence is q and when it is ~q.
2. Diagnosis
Argument [ARG] is spurious.
When Jake sets his procedure in motion, the question I must ask myself is this:
when Jake goes to place p on one side of the coin, how likely is he to place it on the Heads side?
Well, he will do so only if p is true. And how likely is that?
Suppose I bought one ticket in the lottery and Jake has checked whether it was the winning ticket. For p he selects You have won a million dollars.
Well, how likely is it that p is true?
For me, it has probability less than one in a million. So if I see that sentence on top, I say: this coin landed Heads up on this particular occasion only if Jake wrote p on the Heads side. And this he did only if I turned out to have had the winning ticket. So the probability that the coin landed on the Heads side, on this particular occasion, is the probability that I won a million dollars, which is less than one in a million.
Landed Heads implies I won a million. So Prob(Landed Heads) ≤ Prob(I won a million)
This does not deny for a moment that the coin is fair, and that it certainly was the case that the probability was 0.5 that the coin would land Heads up on that particular toss. But now that the coin is lying there, we have to go with what we know about Jake’s procedure.
3. What About Vague Probabilities?
Let’s first discuss vague probability taken as a general subject, setting aside for now any questions about the ‘orthodox’ representation of vague opinion (which is by means of families of probability functions).
Suppose then that I have no precise opinion at all, let alone sharp probabilities, for proposition p. In that case, when I see p displayed on top of the coin, I can’t reason with myself about how likely Jake was to place p on the Heads side of the coin. Thus what Jake has told me about how he would proceed, depending on whether p is true, has given me no usable information at all. There is nothing for me to process.
So I am at a loss, in the extreme case where I have no opinion at all. But what that sort of case can be is not easy to grasp, and I will give a concrete example below.
Vague opinion is not usually so totally vague as all that. In a more practical case, e.g. where the weatherman has forecast that it will rain tomorrow, I do have some opinion. For example, I may say that this is at least as likely as not. That is, my probability is at least 0.5, or equivalently (if we want to put it that way) the interval [0.5, 1].
What if I am offered a bet on this, with prize 1 utile? There is one highly conservative policy I could follow: if buying the bet, pay no more than 0.5; if selling, take no less than 1. As to any other offer, just say no.
Well, that is fine with such a cozy bet on an innocuous subject, but what if a great deal depends on it? What if, in William James’ terms, the choice is forced, so that not betting is itself a choice with possibly awful consequences? To jump the chasm may cost you your life or it may save you, but if you do not jump you are almost certain to suffer debilitating exposure.
The other, highly permissive policy is to say: if you want, buy the bet at any price between 0.5 and 1, inclusive. None of these choices has anything to favor it over the others, but each has the merit that you may prefer it to inaction, although you cannot calculate a higher expectation value.
THE GAMBLE = AN OPINION UPDATE?
Suppose that in the above illustration I am offered a bet on it will rain tomorrow, with payoff 1 if true (and 0 if false), for 0.6 utiles. Suppose I buy the bet. Am I irrational?
If that is irrational, then we are all irrational all the time, when we go into stores and buy things.
What did I do?
(I) Taking into account all the information I have, and judging it at least as likely as not that it will rain tomorrow, though not above nine times as likely as not, I know that I take a risk by buying the bet for 0.6, a risk that I cannot quantify.
Now, there is a longstanding idea that my opinion is whatever it is that is exhibited in my willingness to bet. If we apply that idea here, directly and uncritically, we arrive at:
(II) The act of betting 0.6 for a 1/0 option on rain tomorrow, at that point, shows that I have just updated my probability for rain tomorrow to the sharp probability 0.6.
Certainly plausible, in view of the tradition concerning credence or subjective probability that we are all part of. But (II) contradicts (I). For if (II) is correct, then the agent, namely me, has quantified the risk.
(I) says in effect that I am not changing my opinion about rain tomorrow at all. Rather, my opinion does not suffice to determine my decision. Note that there was clearly no opinion updating going on, for between the formulation of my opinion and the offer of the bet there was no new information to update on!
To show what my opinion is: I will continue to counsel anyone who asks that I can say no better than that the probability of rain tomorrow is at least 0.5. Then they can decide for themselves whether to take a risk with bets that cost more than 0.5, or not.
To me this is common sense.
A concrete example of ‘no opinion at all’
Roger White has an objection to (I), arguing that the permissive policy would lead to financial ruin. The policy would permit you to bet the same 0.6 each time, which would ignore all that is left open by that vague opinion. Although we do not know this, the chance might in each case be 0.5, while the agent keeps buying the bet for 0.6.
But this just ignores learning. Even an uneducated but reasonable gambler will keep lowering his bets if he is consistently losing. To be more concrete, since such a repetition of chances of rain is not plausible, suppose that the Jake puzzle example has been set up with proposition p identified so as to make sure of our total ignorance.
An experiment has been set up with a coin of unknown bias, it is tossed, and p is the proposition that it landed Heads up. Then Jake, who knows the result, continues the process with his fair coin, as in the puzzle.
What does it mean that the first coin is a coin with unknown bias? The probability that this coin lands Heads up is x, and x could equally be any number in [0,1]. Well, what is “equally”? What is it for x to be a random selection from [0,1]? There are different answers we could give here, but let’s take this one: for any two sub-intervals of [0,1] that are of equal length, the probability that x belongs to them is the same.
Then Jake’s procedure is in effect a two-coin process: the chance that both coins land Heads up is x·(1/2), which is an unknown bias in the smaller interval [0, 0.5].
On the liberal policy if I am now asked to bet on whether both coins landed Heads up on a specific occasion, I could for example choose to buy the bet for 0.2. White’s argument implies that this liberal policy permits me to make that same choice each time if the experiment is endlessly repeated, and that this strategy would lead to financial ruin with certainty.
Is that so?
If the experiment is repeated, there are two possibilities that will merit the “unknown bias” label. First, it may be repeated each time with the same coin (or coin with the same bias). Second, the choice of bias in the tossed coin may be randomized as well.
In the first case, if the real bias is below 0.2 then I will be losing money on average. White ignores the information gained from this: in fact the results will allow me to learn, to modify my betting behavior so as to converge on the real bias, whereafter I will not be consistently losing. If on the other hand the real bias is above 0.2, then I am making money! More power to me.
The second case is not so different, for to make this precise we must again specify what the randomness, in the successive choices of coins, amounts to. And depending on what it is, there will typically be in effect an average bias. The gambler can learn from the results, and depending on the gains or losses may be consistently lowering his bets, or else, be happily raking in the money!
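To make the learning point vivid, here is a toy simulation (my own construction; the numbers and the Laplace estimator are invented for illustration, and I assume the outcomes are observable each round) of a gambler who stops buying once the estimated chance falls below the asking price:

```python
import random

random.seed(0)
true_chance = 0.1          # the real (unknown) chance of winning -- invented
price = 0.2                # asking price for the 1/0 bet
wins = trials = 0
bankroll = 0.0

for _ in range(10_000):
    outcome = 1 if random.random() < true_chance else 0   # observed each round
    estimate = (wins + 1) / (trials + 2)                  # Laplace estimate
    if estimate >= price:                                 # buy only while it looks fair
        bankroll += outcome - price
    wins += outcome
    trials += 1

print(f"estimate {(wins + 1) / (trials + 2):.3f}, bankroll {bankroll:+.1f}")
# The gambler soon stops buying, so his losses are bounded:
# repetition brings learning, not certain ruin.
```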
But we still have a question. What can updating vague opinion be like, in a case where there is genuine new information? Nothing in the above discussion touches that question as yet.
There is more than one answer in the literature; I will mention some in the NOTES. White targets the ‘orthodox’ probabilistic representation of vague opinion (“mushy credence”), so let us look at that. But since the phenomena are all, and theories are creatures of the imagination only, I am isolating the technical questions from the general discussion.
4. APPENDIX. The ‘Orthodox’ Representation Of Vague Opinion
Take it that the agent’s opinion is stored as a coherent set S of judgments of the following forms:
P(p) ≤ x, P(p) ≥ y
with p belonging to a specific Boolean algebra, the domain of P. That will in effect include P(p) = x, when S includes both P(p) ≤ x and P(p) ≥ x.
The agent’s representor is the set of all probability functions on that algebra which satisfy all members of S. So for example, if the agent’s opinion is that rain is as likely as not, then all the members of the representor assign 0.5 to rain.
As an example to illustrate the main difficulty, suppose that p and q are logically independent propositions, and that the agent judges that each of them is as likely as not.
For example, p = it will rain tomorrow and q = I mislaid my hat.
Now the agent gets evidence q.
The orthodox recipe for updating is this: replace each member by its conditionalization on q if that is well-defined, and eliminate that member if not.
What is the result? Well, for each number y in [0, 1] there is a function Q belonging to this representor such that Q(p|q) = y. So after this updating there is, for each number y in [0, 1], a function in the posterior representor which assigns y to p. So after updating, the opinion about rain tomorrow, which was entirely irrelevant to my mislaying my hat, is now totally vague.
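A minimal computation displays the result (my own sketch; the one-parameter family is forced by the two constraints, since P(p) = P(q) = 1/2 determines all four atoms once a = P(p & q) is fixed):

```python
from fractions import Fraction

half = Fraction(1, 2)

def member(a):
    """The member of the representor with P(p & q) = a; the constraints
    P(p) = P(q) = 1/2 force the other three atoms."""
    return {"pq": a, "p~q": half - a, "~pq": half - a, "~p~q": a}

def p_given_q(P):
    return P["pq"] / (P["pq"] + P["~pq"])                # conditionalize on q

for k in range(0, 6):                                    # a sweeps over [0, 1/2]
    a = Fraction(k, 10)
    P = member(a)
    assert P["pq"] + P["p~q"] == half                    # prior P(p) = 1/2
    print(f"a = {a}:  posterior P(p|q) = {p_given_q(P)}")
# The posterior sweeps from 0 to 1: a sharp prior opinion about p
# dilates into total vagueness upon learning the irrelevant q.
```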
Updating in this way is debilitatingly destructive.
Two options
The above result, with examples, and surrounded by both informal and technical discussions, was in the literature well before Roger White’s paper.
The first idea we can try out is that we could prevent this disaster by putting constraints on the representor, by additions to the state of opinion S. We can add judgments of expectation value, rather than just probability, and these allow us to add judgments of conditional probability. But the problem recurs at that level: any fix that stays within linear constraints does not suffice. We would have to add non-linear constraints in some way, for independence and correlation are not expressible in any other form.
Anyone have suggestions? Constructive attempts to find a better representation of vague opinion?
The second idea is that it is conditionalization that is at fault, and that indeed the fault lies with the idea that the representor is to be updated point-wise. Updating the representor needs to be a holistic action, an action that preserves certain important structure of the representor as a whole.
How can we think about this? The representor is a convex structure: if P(p) ≤ x and P′(p) ≤ x, then any convex combination of P and P′ satisfies the same constraint, since (λP + (1−λ)P′)(p) = λP(p) + (1−λ)P′(p) ≤ x. (Similarly for expectation value constraints.)
That suggests looking at the theory of convex structures taken as wholes. Anyone have suggestions?
NOTES
Originally I subscribed to the ‘orthodox’ representation of vague probability, with conditionalization as updating method. But looking at the dilation effect (cf. Seidenfeld and Wasserman 1993) I found that it ran into the trouble with conditionalization that I described above (see my papers listed below).
I mentioned that we could look into non-linear constraints on the representor. Difficult probably, but there is a study by Halpern, Fagin, and Megiddo (1990) that could be a resource for this idea.
As I said above, there are different answers in the literature, for questions about how to represent vague probability. One that is quite different from the ‘orthodox’ way is by Fagin and Halpern (1991).
For the weakness of arguments for updating pointwise by conditionalization, and the possibility of alternatives, the place to begin is Grove and Halpern (1998).
As to the different policies for decision making under vague uncertainty, an important technical discussion is by Teddy Seidenfeld (2004). Isaac Levi’s concept of E-admissibility is a candidate for the precise form of the liberal policy. Levi himself is easier to read. The quickest introduction though is section 3 of Seidenfeld’s retrospective on Levi’s work.
REFERENCES
Fagin, R., J. Y. Halpern, and N. Megiddo (1990) “A Logic for Reasoning about Probabilities”. Information and Computation 87.1-2: 78-128.
Fagin, R. and J. Y. Halpern (1991) “Uncertainty, belief, and probability”. Computational Intelligence 7: 160-173.
Seidenfeld, T. (2004) “A contrast between two decision rules for use with (convex) sets of probabilities”. Synthese 140: 69-88.
Seidenfeld, T., and Wasserman, L. (1993). “Dilation for Sets of Probabilities”. Annals of Statistics 21: 1139-54.
Topey, Brett (2012) “Coin flips, credences, and the Reflection Principle”. Analysis 72: 478-488.
van Fraassen, Bas C. (2005) “Conditionalizing on violated Bell’s inequalities”. Analysis 65.1: 27-32.
van Fraassen, Bas C. (2006) “Vague Expectation Loss”. Philosophical Studies 127: 483-491.
White, Roger (2010) “Evidential symmetry and mushy credence”. In Oxford Studies in Epistemology, Vol 3. Ed. T. S. Gendler and J. Hawthorne, 161-188. New York: Oxford U Press.
1. Sellars’ fundamental project and apocalyptic vision
2. Sellars’ Dialectic: Two Stages
3. The Thing-Kind Framework, for the Representation Of Nature
4. Sellars’ turn to pragmatics
5. Sellars: induction as rule formation
6. Analyticity and Inductive Risk
Wilfrid Sellars was our first, and perhaps so far only, master of dialectical writing since Bradley. His many students and admirers have written along lines learned from his writings, each in their own way. I’ll do so here, without apology or attempt to preempt objections, in the hope that this can have a place in the continuum of ways of understanding Sellars.
I am not a disciple, but I took his courses and seminars in Pittsburgh in the sixties and have never ceased to be fascinated by him.
Sellars’ ultimate concern consists of a vision and a project. Behind all he writes there is his vision of the end of days, of the perfection of science, at the end of the Peircean long run. And his project is to reconcile that vision with the way we in fact conceive of ourselves.
When Peirce writes about truth he requires a concordance of a statement “with the ideal limit towards which endless investigation would tend to bring scientific belief”. Sellars is acutely conscious of how language changes in tandem with changes in the accepted scientific theories. Our statements will not be statements in the language of a future, let alone final, science. Indeed (as we will see better below) the most important aspect of an ideally improving science is its endlessly improving language. But as with Peirce, we can best portray the view of how it all should be, by thinking about that ideal limit, however unrealistic.
No Reduction. Sellars was wont to say that it was he, not Feyerabend, who was the revolutionary. [Note 1] Unlike Feyerabend, Sellars would say, he was not a gradualist. Our language in use is deficient, and it does change in response to new developments in the sciences. But we cannot replace or reduce our language in a piecemeal fashion. There is no reduction of language about thought and feeling to language about the body, for example. A completely adequate language could not be the end result of a series of reductions of common sense notions, not even with common sense evolving in the process.
One of my colleagues in Princeton, Mark Johnston, once remarked “The great lesson of 20th century philosophy was that nothing reduces to anything else!” On this point Sellars was prophetic, he saw that from the beginning.
Final Descriptive Language. But the ideal end of our rational progress will nevertheless be that our descriptive language is the factual language of the final science of everything. This language will be extensional, and entirely devoid of modalities, or of any elements of what Strawson had called person-language.
How can these two seemingly conflicting views be placed in harmony?
The Twin ‘Shall’ Language. This descriptive language will not be all there is to rational discourse! Indeed, we could not do with only that, neither in science nor in life. The ideal descriptive language has a twin language, so to speak, for discourse in which we express our individual and communal intentions. When we say “The desert will bloom” we are making a prediction, in factual descriptive language. But something different happens when we say “The desert shall bloom!” (with or without exclamation mark). With this form of words we express our intention to act, we voice a decision. This twin language will and must always remain integral to scientific practice as well. [Note 2]
What About Our Current Discourse? Meanwhile, living in these pre-apocalyptic times, Sellars’ project is to do justice to our actual discourse about the universe and ourselves. Our current everyday language, both practical and scientific, is shot through and through with modalities; it is intensional, even hyper-intensional, and it does not admit of a faithful translation into extensional descriptive language.
His project is to explain our practice in a way that will not make it a counterexample to his vision. It would be diminishing, and grossly unfair, to just mark this with Berkeley’s “think with the learned and speak with the vulgar” [Note 3]. Sellars’ project is on a grand scale: we are to understand ourselves and our own place in the world by seeing it from the perspective of that ideal end of days, from the imagined perspective of that envisioned ideal end. And step one will be to appreciate the radical disparity, and irreconcilability, of the envisioned us of then with the actual us of now.
Two ‘Mistakes’. There are two great mistakes which Sellars wants to persuade us not to make. The first is to think that, because we will always need the “shall” language to express our intentions, the ultimate description of what we do will inevitably involve such concepts as intention. No, in Sellars’ vision, the description of what we are doing, when we express intentions, will in the ideal end be purely and entirely ‘physicalist’, just like the description of anything else. The second mistake would be to infer that the factual descriptive language will need to have in it a translation or reduction of the intensional discourse of our current practical and scientific life. No, it will replace that discourse entirely.
Our versus Their description of ‘Shall’. Our description today of what will happen then is that those persons at the end of time will be expressing intentions when they engage in the “shall” language. Yes, indeed (speaking today’s language, I say) that is what they will be doing. But that is not what those persons will say when they describe what they are doing. “Intend” is a word in person-language, and they will not have any such word available for description. We should also understand that what follows the “shall”, when they speak, will not have the content of what we would say today. They will never say something that would translate into our “Romeo shall marry Juliet”, only things like “Romeo shall at all times stay in close spatial proximity to Juliet”, or the like.
Rorty And Churchland. Some, in those days, shared Sellars’ view. It was the young Richard Rorty who defended this position as eliminative materialism (1970) and the young Paul Churchland who sought to make it palatable:
“It is important for us to appreciate, if only dimly, the extent of the perceptual transformation here envisaged. These people do not sit on the beach and listen to the steady roar of the pounding surf. They sit on the beach and listen to the aperiodic atmospheric compression waves produced as the coherent energy of the ocean waves is audibly redistributed in the chaotic turbulence of the shallows.” (Churchland 1979: 29)
Specifically, Churchland argued that in person-language there is a sentence structure, used especially for the description of thought, volition, and feeling, that will disappear. It is the pattern “So and so X-es that p”, a pattern that is a hallmark of intensional language. His argument for this is that the basic sentence form in modern scientific theories is “Quantity Q, pertaining to So-and-so, has value x” which does not have the form to express a propositional attitude.
That is not a strong argument. First, the sentence form is not a great obstacle in itself, since propositions can be represented by two-valued quantities. In any case we cannot remove from science such statements as: the probability that a given measurement will have outcome y equals x. Even if this can be put in a different form, for example ‘the quantity p(m, y) = x’, it is still the case that probability is a modality. But even if Churchland’s arguments are not strong, his conclusion depicts Sellars’ vision of the end of days.
I am not endorsing Sellars’ vision, and so, not his project.
But there is something interesting, tantalizing even, about how Sellars went about that project, of ‘doing justice’ to what he thought would ultimately be eliminated.
In philosophy of science this included a certain take on causal discourse, natural and physical modalities, laws of nature, and the counterfactuals that laws support. This comes in two stages: the first stage is a realist account and in the second stage the first is aufgehoben in a way that Sellars saw as the completion of Kant’s ‘Copernican revolution’. The realist account is presented in modal discourse, and the retrenchment in the second stage shows how realism about modalities disappears upon analysis.
Philosophy of Science in a new key. Sellars wrote “Counterfactuals, Dispositions, and the Causal Modalities” in reaction against the puzzles about counterfactual conditionals that had been made salient by Nelson Goodman (1947) and Roderick Chisholm (1953). The topic was not new; the puzzles were related to issues that Reichenbach and Carnap had raised in philosophy of science. Sellars wanted to show that those puzzles could be dissolved or dismissed as far as science was concerned, though they pointed at more important, deeper issues there.
In a different post I have presented an analysis of Sellars’ theory of conditionals in that paper. So I will here mention only the minimum; what is important here is how the second stage, the ‘Copernican revolution’, is implemented.
A conditional is an “if … then” statement. Sellars favored the old-fashioned form “If this match be struck …”, where “be” is meant to be neutral between “is” and “were”. Traditionally, conditionals are closely linked to arguments, as exemplified in the Great Law of Implication: the conditional “if A then B” is correct precisely if A implies B.
But now it had become clear that in natural language, conditionals do not behave accordingly.
Sellars echoed the typical example:
the assertion that if the match be struck it will light does not commit us to the assertion that if it be struck and made wet, it will light
Symbolize the disputed argument as “If S → L then (S & W) → L”. If the Great Law of Implication held for this sort of example, this argument would be valid:
Suppose S → L, so then S implies L. But if S implies L, so does (S & W). Therefore (S & W) implies L, and thus, (S & W) → L.
Thing-Kind Language. At first sight, Sellars grants that these puzzles about matches and light bulbs affect the language of science in practice. They share their form with statements of importance there. They belong to a class of conditionals that appear within “the conceptual framework in terms of which we speak of what things do when acted upon in certain ways in certain kinds of circumstance” (1958: 225).
And if it were 𝜙-ed and did 𝜓, and we were asked “Why did it 𝜓?” we would answer, “Because it was 𝜙-ed”; and if we were then asked, “Why did it 𝜓 when 𝜙-ed?” we would answer “Because it is a K.” If it were then pointed out that Ks don’t always 𝜓 when 𝜙-ed, we should counter with “They do if they are in C, as this one was.” (ibid., 248.)
But the crucial point is that the antecedent is an input (action or interaction) statement, the consequent an output statement, and neither input nor output statements describe circumstances (standing conditions). As long as we are dealing realistically with conditionals as they appear naturally in this ‘thing-kind’ discourse, we do not run into the logical puzzles Chisholm and Goodman raised.
Fine, but this thing-kind discourse is intensional. It has the modal word “causes” and allows us to make assertions not just to state the facts but to say what would have been the case if other possibilities had been realized. This is the discourse of realism about modalities in nature.
Looking at it from the perspective of the ideal end of days, where the language of science is extensional and that is the only form of description, how is this thing-kind discourse to be understood?
This is Sellars’ version of natural philosophy. It is at the same time the counterpart in Sellars of Kant’s Inaugural Dissertation, before his Critical Period. Like Kant, Sellars means to go on from that stage in philosophy to a point of view where it is not truths about nature but truths about our representation of nature that are at stake. Before that move can have some content, however, we have to have a clear idea of the earlier ‘realist’ stage.
General Form: Representation of a thing-kind K. There is a logical space (phase space, state-space) H, and a family Q of quantities that characterize the states. Conditions (boundary conditions, circumstances) are something separate. There is a family G of possible actions. If 𝜋 is an action, it will (under given conditions) send the current state into a new state, or into a set of states with different probabilities. Hence there is a set R of transition functions: for each action 𝜋 there is a function that takes any state (plus condition) into a state or set of states, possibly with an assigned probability.
This formulation allows for an indeterministic thing-kind. For the deterministic kind, the transition functions are from state (+ condition) to state.
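Purely as an illustration, the general form can be put in code. This is my own sketch, not Sellars’; the names (ThingKind, strike) and the match example are invented:

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = str
Condition = str
# An (indeterministic) transition: (state, condition) -> distribution over states.
Transition = Callable[[State, Condition], Dict[State, float]]

@dataclass
class ThingKind:
    states: set                          # the logical space H
    quantities: dict                     # the family Q: name -> function of state
    transitions: Dict[str, Transition]   # the set R: one function per action in G

def strike(state: State, cond: Condition) -> Dict[State, float]:
    """Deterministic special case: every distribution is concentrated
    on a single state."""
    if state == "dry match" and cond == "oxygen present":
        return {"lit match": 1.0}
    return {state: 1.0}

match_kind = ThingKind(
    states={"dry match", "wet match", "lit match"},
    quantities={"lit": lambda s: s == "lit match"},
    transitions={"strike": strike},
)
print(match_kind.transitions["strike"]("dry match", "oxygen present"))
```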
Note that this representation is from the God’s eye point of view. We may know a great deal about thing-kind K without having all the details about the transition functions.
In his earlier paper, “Concepts as involving laws and inconceivable without them” (1948) Sellars goes further in his explication of this form of representation of nature.
A given thing x of kind K will have a history, which is a trajectory in its state-space. What is characteristic of kind K is not only that specific state-space, but a selection from the set of possible histories: the family of histories such a thing can possibly have, the thing’s possible histories.
In speaking of a family of possible worlds, what are we to understand by a “world”? Let us begin with the following: A world is a spatio-temporal structure of atomic states of affairs which exhibits uniformities of the sort we have in mind when we speak of the laws of nature. (1948: 293)
This passage he immediately follows with the admonition to abandon the term “world”, and to speak of possible histories instead:
Our basic framework is thus a family of possible histories, one of which is the actual … history. (ibid.)
Turning then to the language in which systems of kind K are described, Sellars enters the assumption that it makes sense to speak of truth and falsity with respect to any given possible history. And so it will make sense also to say of a given statement about things x of kind K that “x will 𝜓, if it is 𝜙-ed” is true in all possible histories.
Laws. And this statement, though it can be a universal truth about things of kind K, and is not just about what happens in the actual world, is still not a logical truth. The reason it is not, is precisely that kind K is characterized by a restricted family of possible histories: the histories which alone are possible for things of kind K. And this is the sense that Sellars can give to the idea of necessity in nature or of a law of nature:
A natural law is a universal proposition, implicative in form, which holds of all histories of a family of possible histories; as such it is distinguished from ‘accidental’ formal implications which hold of one or more of possible histories of the family, but do not hold of all. (1948: 309, italics in original)
These laws in conditional form are not the subjunctive or counterfactual conditionals discussed at the outset. They are universal material conditionals about the possible histories. But what holds in all possible cases is precisely what we call necessary. So we are in the land of C. I. Lewis’ strict conditional, where “if A then B” can be read as “Necessarily, either not A or B”. However, though not themselves in the subjunctive mood, these laws form the background structure which provides warrant for such assertions as “x, of kind K, will (would) 𝜓 if it be 𝜙-ed”.
it is in pure pragmatics … that the lingering ghost of naïve realism (as a philosophical perspective) is finally exorcized, and Kant’s Copernican revolution receives its non-psychologistic fruition. (“Pure Pragmatics and Epistemology”, 1947: 185)
Sellars makes a sort-of-Kantian move, by which he can shift, apparently effortlessly, from the representation of nature to pragmatics, the representation of our discourse in terms of use, by us, in practical situations. That is a shift in the philosophical discussion from ontology to methodology, to use a phrase he assimilated from Carnap. It was a theme for Sellars from the beginning: it was in his 1947 paper “Pure Pragmatics and Epistemology”, Sellars’ first published paper, that he described this sort of move as the way to complete Kant’s ‘Copernican revolution’.
Looking back at the main passage about conditionals that I quoted above, notice that Sellars is not concerned there with questions about the truth or falsity of the conditionals. Instead it is all about what we would, or would not assert, what we would answer if we were asked certain questions in given circumstances. And those circumstances are described in terms of what we would have reasons to believe.
Thus presented, this subject of counterfactual conditionals is placed within practical reasoning, the subject that in the end will be addressed by the “Shall” language rather than the descriptive language. And indeed, Sellars indicates, with a clearly waved red flag, so to speak, that this will be his line. For he proposes a resolution of the traditional controversies in which
“the core truth of Hume’s philosophy of causation” is combined with the “ungrudging recognition of those features of causal discourse as a mode of rational discourse on which the ‘metaphysical rationalists’ laid such stress but also mis-assimilated to describing” (1958: 285, my italics).
The use of the descriptive language is to state facts, while the use of the twin “Shall” language is to express intentions. The subjunctive grammar functions as an indirect conveyance of the fact that the conditional
“is accepted on inductive grounds … The statement, in short, sticks its neck out. It is this neck-sticking-out-ness … which finds its expression in the subjunctive mood.” (1958: 268-69)
Acceptance is a pragmatic, not a semantic concept. To stick your neck out, that means to willingly and intentionally do something that is risky, to place yourself at risk. Induction, accepting something on inductive grounds, is not an inference but a decision.
So “the core truth of Hume’s philosophy of causation” will appear when we appreciate that causal discourse and its associated counterfactual conditionals form a mode of rational discourse that is mis-assimilated to describing. This introduces an ineliminable pragmatic element into the understanding of that mode of discourse.
In recent literature on conditionals the line pursued by Justin Khoo is that they encode inferential dispositions. That may be a way to say what Sellars meant. The term “disposition” is then not apt, however. Given Sellars’ vision of the combination of factual descriptions solely with the “shall” language, in which intentions are expressed, we should say that we form, not inferential dispositions, but inferential commitments.
Laws Between Mere Fact And Analytic Truth. What Sellars calls the rationalist understanding of this discourse is that it is a description of modalities, and specifically of entailment, in nature. This is much clearer in that earlier paper, “Concepts as involving laws and inconceivable without them” (1948). He begins with a dilemma posed by C. I. Lewis.
Lewis’ Dilemma. Consider a putative law like that lead melts at 621°F. We cannot construe its being a law as simply a universal material conditional, saying only that all actual or real lead samples which are heated to at least 621°F, and only those, melt. But, says Lewis, the logically stronger statement that all possible, or thinkable, lead samples are thus can only be true if it is analytic. And the law that lead melts at 621°F is not analytic, it is not a logical truth.
Today one might well retort that statements about the possible can be true without being analytic, and some might even say that it is a contingent matter which possible worlds there are at all. That would have surprised C. I. Lewis, and Sellars as well. Both hold that what is true depends, and depends only, on what there is in the actual, real world.
C. I. Lewis’ solution is that there must therefore be a real modal connection in nature, an implication which is neither material nor logical. And we arrive at knowledge of these connections by some form of induction (more or less what we would now call inference to the best explanation).
From Induction To Conceptual Truth. Sellars counters this with a different understanding of induction, or ampliative inference in general. In fact, these terms are already prejudicial, for they come with the picture of something that is like, even if lacking certainty, logical inference, from given information to concluding statements. On Sellars’ view the process of going beyond the data is nothing like that, it is not that we arrive at new statements of fact but rather that we talk ourselves into new rules to reason by.
(Nota bene: this turn of phrase, “talking ourselves into”, signals at once that there is no question of recipes or rules that could constitute an inductive logic, in the sense of a method leading from premises to rationally compelled conclusions. Talking ourselves into something involves taking up options which have alternatives that we need to choose between, and hence, none of which are rationally compelled.)
Briefly and crudely put, we say of a piece of lead that it would melt if it were heated to 621°F, to convey that we believe the factual truth that all pieces of lead either melt or are at a temperature below that, and that we SHALL follow a corresponding default inference rule, ready to apply to any lead sample we may ever come across. Our assertion “Lead melts at 621°F” has a dual character: it is both description and expression.
It is precisely here that scientific practice involves the twin language of “shall” versus “will”, the language used to express intentions and decisions. That we are going to follow such a default rule is a matter of decision, it is introduced by the emphatic “I shall …” or “we shall”.
Sellars’ title conveys the main point: the concepts that we have and apply in our descriptive language cannot be understood separately from these rules by which we reason with them. Here we have come to an aspect of his view that will not have been apparent at the beginning. The descriptive language, even if it is the language of the final science at the end of days, cannot stand alone. For to be a language is not simply to be a symbol system, it is something defined by its use, and this use can only be by beings who cannot use it without having something more involved, something more that is not served by description.
To repeat, “inferential disposition” is not the apt term here, it must be “intention” or “commitment”. For when we talk ourselves into adopting such a default rule, we are amending our policies for engagement with whatever will come our way, in the full awareness that in doing so we are sticking our neck out, accepting the inductive risk.
Now, however, Sellars would seem definitely to be in a quandary. The input to factual information processing must be factual information, it must be a proposition. But our modalized pronouncements, such as those subjunctive conditionals about vases and lead samples, while apparently carrying factual descriptive information, also encode inferential commitments. And the subjunctive mood signals those commitments, with our consciously taken inductive risk.
So when the input involves conditionals, what’s the story?
The first part of the answer is already in the title “Concepts as involving laws and inconceivable without them.” And the second part is the connection between the theory of concepts and laws there presented, on the one hand, and the pragmatics of causal and modal discourse on the other. Sellars’ spelling out of this answer is long, complex, and subject to different readings, as well as to finicky objections. [Note 4] It is, I think, one of the ways we can glimpse Wittgenstein in Sellars:
It is clear that our empirical propositions do not all have the same status, since one can lay down such a proposition and turn it from an empirical proposition into a norm of description. (Wittgenstein 1969, para. 167)
I will explain it with an example. Why doesn’t water burn? Methyl alcohol looks like water, but it burns. So even that water does not burn may have been a general belief formed ampliatively, ‘by induction’.
That process of ampliative belief forming is one of talking ourselves into new rules to reason by (cf. 1958: 287-8, 291, 293): new inferential commitments. But there is a second stage, when these concepts are revised: those rules now change into criteria that constitute the concepts in question. The rule turns into a law, involved in the concept of water, and water becomes inconceivable without it. The concepts evolved into our current concept of burning as oxidation and of water as H2O, a fully oxidized substance. At that point it is no longer an empirical generality: it is now true ex vi terminorum that water does not burn.
Similarly, it is today no longer an empirical claim that lead melts when heated to 621°F; it has become an analytic statement. After a certain time not long ago, anything that turns out not to melt when heated to 621°F is something that does not instantiate the (evolved) concept of lead.
Where Is The Inductive Risk Then? Analytic statements are not risky. So, what has happened to the element of inductive risk, the way or extent to which we stuck our necks out to begin with? That has not disappeared; it has been transplanted to two new locations. (1) We have no guarantee that our conceptual framework will not break down in the face of new and previously unimagined phenomena. (2) There is equally ineliminable risk in the judgement that this or that liquid sample is water, that this concrete thing instantiates the concept of water. We have tests for water, and we have the general belief that any bit of liquid which passes those tests is water. That general belief is no foundation for the specific instance judgment, for all we have are the tests so far. We are lucky that our environment has so many stable regularities, and passing a test once is typically enough – but that is not a logical truth. That general empirical assertion is equally at the mercy of nature’s continuing compliance.
Vindication And Disappointed Expectations. Taking an inductive risk is not something that can ultimately be justified; it can only be vindicated in fulfilled expectations or, of course, fall prey to disappointed expectations. (Hence the title of a later article, “Induction as vindication”.) We are entirely in the realm of practical reasoning, but as we see, that practical reasoning leads to changes in the mode of description, in the evolving descriptive language.
NOTES
Note 1. “Feyerabend arrives at the ontological truth that the world is in principle what scientific theory says it is, he does so by chopping the structure of science with a cleaver rather than carving it at its conceptual and methodological joints. As I see it, only someone who is unaware of the subtle interdependence of the various dimensions of the framework of empirical knowledge, would speak cavalierly of the piecemeal replacement of part by scientifically better part.” (Sellars 1965: 187)
Note 2. This had in effect been argued by Hans Reichenbach in 1952. See Reichenbach (1959: 198) and the clarification by Maria Reichenbach (ibid.: 193). Just as a side note, these multiple functions of language, often not distinguished syntactically in ordinary language, had been a theme in the Significs movement, but I don’t know of any links to connect Significs with either Reichenbach or Sellars.
Note 3. Berkeley’s Works: Principles (W2, 51) and Alciphron (W3, I, 12, p. 53).
Note 4. As always with Sellars, there is a lot more to be said about the idea of induction. Here I have been attending to his dispute with C.I. Lewis in (Sellars 1948) as well as the 1958 article. The complete answer should have been in his later “Induction as vindication” (1964), and there is a lot there, but it is quite difficult to disentangle the strands in its tangled skein.
Rorty, Richard (1970). “In Defence of Eliminative Materialism” in The Review of Metaphysics XXIV. Reprinted Rosenthal, D.M. (ed.) Materialism and the Mind-Body Problem (Englewood Cliffs: Prentice Hall 1971).
Sellars, Wilfrid (1947) “Pure Pragmatics and Epistemology”. Philosophy of Science 14: 181-202.
Sellars, Wilfrid (1948) “Concepts as involving laws and inconceivable without them”. Philosophy of Science 15: 287-315.
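Sellars, Wilfrid (1958) “Counterfactuals, Dispositions, and the Causal Modalities”. Pp. 225-308 in H. Feigl, M. Scriven, and G. Maxwell (eds.) Minnesota Studies in the Philosophy of Science, Volume II. Minneapolis: University of Minnesota Press.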
Sellars, Wilfrid (1964) “Induction as vindication”. Philosophy of Science 31: 197-231.
Sellars, Wilfrid (1965) “Scientific Realism or Irenic Instrumentalism”. Pp. 171-204 in R. S. Cohen and M. W. Wartofsky (eds.) Boston Studies in the Philosophy of Science, Volume Two. New York: Humanities Press.
[This is a reflection on Johanna Wolff, “The Philosophical Significance of the Representational Theory of Measurement – RTM as Semantic Foundations” (2023).]
The outcome of a measurement is a representation of the item measured. As an example take an echocardiogram: the monitor display is an image of the heart which shows how the blood flows through the heart and heart valves. An image is a representation, but so is the mechanical display, in which a needle points to 17, on an old-fashioned tire pressure gauge. And so is the list of numbers a carpenter writes down to record his measurements before he initiates his repairs.
This simple point, though it suffices to subsume measurement as a topic under the heading of representation, is not nearly enough to answer questions about what measurement is, let alone to constitute a theory of measurement.
A foundational account?
As Wolff points out, when Krantz, Suppes, and Luce introduced the Representational Theory of Measurement (RTM) there was every sign that it was meant to provide a foundational account of measurement. But what foundations does it provide; indeed, how are we to understand just what can be meant by foundations of measurement?
A look at what Krantz et al. do suggests an analogy to foundations of mathematics, as in Principia Mathematica. But it also suggests something like a pun: their main achievement is a series of representation theorems, in the sense that term has in mathematics. The representation theorem most familiar to philosophers is that every Boolean algebra can be represented as an algebra of sets. “Can be represented as” amounts here to just “is isomorphic to”. In other cases the representation meets a still lower bar. Suppose we rank players on a football team with the relation is at least as good as (a ‘weak ordering’). Then we can represent this relationship numerically, with ≤ representing that relationship. But two players may then have to be assigned the same number: a homomorphism rather than an isomorphism. (Every finite weak ordering is homomorphic to a numerical weak ordering.)
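As an illustration of such a homomorphism, here is a minimal sketch (my own; the players and their ranking are invented): score each player by the number of players he is strictly better than, so that tied players receive the same number.

```python
players = ["ann", "bo", "cy", "di"]
# at_least_as_good[x] = the set of players that x is at least as good as.
at_least_as_good = {
    "ann": {"ann", "bo", "cy", "di"},
    "bo":  {"bo", "cy", "di"},
    "cy":  {"cy", "bo", "di"},       # bo and cy are tied
    "di":  {"di"},
}

def score(x):
    """Number of players strictly below x."""
    return sum(1 for y in players
               if y in at_least_as_good[x] and x not in at_least_as_good[y])

f = {x: score(x) for x in players}
print(f)    # {'ann': 3, 'bo': 1, 'cy': 1, 'di': 0}: bo and cy get the same number

# f is a homomorphism: "x is at least as good as y" iff f(x) >= f(y).
assert all((f[x] >= f[y]) == (y in at_least_as_good[x])
           for x in players for y in players)
```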
But is there something more to it? If we have ranked the players, with that weak ordering, we have already measured them: our judgments of the form “X is at least as good as Y” are our initial measurement results. We can take the further step to represent the team numerically, changing our one-step measurement procedure into a two-step procedure. The added second step is a ‘paper and pencil’ step. Any truly foundational question about measurement must pertain already to the initial ranking.
Wolff explains the predicament in this way:
Given a suitable axiomatisation of the empirical relational structure, RTM shows in a formally rigorous way that a numerical representation of that empirical structure is possible. […] So, once the axiomatic characterisation of the empirical structure in question is in place, RTM provides us with solid, formal tools for finding suitable numerical representations for the relational structure in question. But here is the one-million-dollar question: what justifies a particular axiomatization of the empirical relational structure? (p. 86)
Failure of operationalism
There is barely any guidance for this in Krantz et al., and what guidance there is consists mainly in nods in the direction of operationalism. One demand, they write, “is for the axioms to have a direct and easily understood meaning in terms of empirical operations” (Krantz vol. 1: 25). One example: a collection of rods, with two procedures, ordering by length and concatenation. If the former is executed repeatedly, until it results in no further change, then the collection becomes weakly ordered. Read backward: the weak ordering postulated in the axioms has as its meaning that if a ≤ b then the ordering procedure would, if it were executed, place a below b.
The second part of this guidance imposes a demand for descriptive adequacy. That implies in this example that if a, b, c are submitted to the ordering procedure pairwise, the result will be such that if a is placed below b, and b is placed below c, then an independent such operation would place a below c.
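Both demands can be made concrete in a toy simulation (my sketch, not from Krantz et al.; the rod lengths are invented): the ordering procedure is run until no further change results, and the transitivity of its results is checked:

```python
# Toy sketch: the "ordering procedure" as repeated pairwise comparison of
# rods until a pass produces no further change, i.e. the equilibrium at
# which the collection is weakly ordered. Lengths are invented.

rods = {"a": 10.0, "b": 7.5, "c": 10.0, "d": 3.2}

def at_least_as_long(x, y):
    # The empirical operation: lay rod x alongside rod y and compare.
    return rods[x] >= rods[y]

def ordering_procedure(items):
    items = list(items)
    changed = True
    while changed:
        changed = False
        for i in range(len(items) - 1):
            # Swap whenever the lower-placed rod is strictly longer.
            if not at_least_as_long(items[i + 1], items[i]):
                items[i], items[i + 1] = items[i + 1], items[i]
                changed = True
    return items   # shortest first; the tied rods a, c may land in either order

print(ordering_procedure(rods))   # ['d', 'b', 'a', 'c']

# Descriptive adequacy in this example: the procedure's results are transitive.
for x in rods:
    for y in rods:
        for z in rods:
            if at_least_as_long(x, y) and at_least_as_long(y, z):
                assert at_least_as_long(x, z)
```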
Note well that these operational explanations are rife with counterfactuals. The facts are only that certain rods have actually been arranged in certain relative positions. The axiom says something much more far-reaching, namely that the relation of being at least as long as is a weak ordering.
Meaning? There is certainly a connection between what the theoretical terms mean and the operations that count as measurement procedures. But it is not the connection between definiendum and definiens.
Coordination, empirical grounding
Operationalists wanted to write the story as beginning with a game with rods which arranges and re-arranges them until equilibrium is reached, segueing into an axiomatic description of the result.
We should read it backward: the story begins with an axiomatic ‘rational reconstruction’ of the intuitive notion of length, and that axiomatic description is ‘empirically grounded’ by specifying how certain procedures, which are executable under certain conditions, establish whether or not a ≤ b.
Perhaps this is clear enough for such unrealistically simple examples about rods and balances. The important further point is that this view of the relation between measurement procedures and the resultant numerical structure of the measurement outcomes extends smoothly to more advanced theories and their theoretical quantities.
To take the theoreticity of the quantities involved just one step further, the above goes for the axioms of mechanics about mass and force, empirically grounded in the experimental procedures described by Ernst Mach as his “definition” of mass.
Mach introduced his approach in his 1868 article “Über die Definition der Masse”, and elaborated it in his 1883 book The Science of Mechanics. The direct measurement of acceleration is taken as given, and Mach does not hesitate to use the term “definition” for the combination of two principles:
All those bodies are of equal mass which, mutually acting on each other, produce in each other equal and opposite accelerations.
If we take A as our unit, we assign to that body the mass m which imparts to A the acceleration that A in the reaction imparts to it. (1974 Dover print: 266)
This is not left abstract: Mach gives examples of simple machines that can perform the required experiments – or, as he later also said, measurements of mass.
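As a toy illustration of the second principle (my sketch; the numbers are invented, and only their ratio matters):

```python
# Sketch of Mach's mass assignment, with invented numbers. Body A is the
# unit. In a mutual interaction, A is measured to accelerate at a_A and B
# at a_B (in opposite directions); B's mass is the ratio of accelerations.

a_A = 6.0    # acceleration imparted to the unit body A (m/s^2)
a_B = 2.0    # acceleration imparted to B in the reaction (m/s^2)

m_B = a_A / a_B    # mass of B, in units of A's mass
print(m_B)         # 3.0: B is three times as massive as the unit body
```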
It was Suppes who elsewhere insisted strongly on the point that Mach’s ‘definition’ of mass was not a definition, in our current sense of the term. (The same complaint applies to Reichenbach’s notion of a coordinative definition.) We have the criterion that a defined term is one that can everywhere be replaced without loss by its definiens.
Mach’s principles enable no such innocuous rewriting of mechanics. The argument is simple: if a body were to be unaccelerated (throughout its existence), because the (total) force on it is zero, then the laws of mechanics tell us only that its mass multiplied by zero equals zero. So mass cannot be defined in terms of actual behavior. Counterfactuals about behavior, on the other hand, can be derived only from the theory itself, and are not independently ascertainable.
Nevertheless, it is those mechanical, implementable procedures, in which bodies are made to interact so as to induce mutual accelerations, that give empirical significance to the theory of masses and forces. The term “mass” is not defined, the quantity is not derivative, but in the theory the role of mass is such that its value can be determined, in principle, in specified realizable conditions (from data plus equations provided by the theory).
Some quantities are directly measurable: Krantz et al.’s examples, such as the arrangement of rods by length, are of this kind. In contrast, a theoretical quantity is one that can be measured only by procedures which the theory in question, itself, counts as measurements of that quantity. Mass and force are cases in point.
But those procedures, which can be implemented by us, and are therefore describable also in non-theoretical language, are precisely what anchors the theory to what the theory is about. In the time of Schlick and the early Reichenbach, the philosophical term for this connection was “coordination”. Currently the more favored term is “theory-mediated measurement”. I have spelled it out as “empirical grounding”. Whatever the terminology, seeing matters in this light we are neither prey to a naïve foundationalism nor bereft of links between our theories and the world we live in.
Wolff’s proposal: RTM as semantics of measurement
Since, as Wolff points out, Krantz et al. give only hints about the way the axiomatic characterization of the empirical structure gets in place, does their work have value except as a library of mathematical theorems?
Wolff argues that it does, and offers the original proposal that RTM provides semantic foundations for measurement.
In the more usual narrow sense, the topic of semantics is the relation between a language and what that language is about. But language is only our main means of communication, by representing things to each other: we paint word pictures. So in the broader sense, the topic of semantics is meaning überhaupt: the relation between any representation and what it represents.
“Representation” in the sense of representation theorems in mathematics refers to certain straightforward types of mappings: isomorphism, homomorphism, and the like. Focusing there, Krantz et al. say, somewhat myopically, that measurement is

the construction of homomorphisms (scales) from empirical structures of interest into numerical relational structures that are useful. (p. 9, quoted Wolff p. 96)

Wolff quotes this, and points out that it is at least incomplete as a statement of what measurement is, omitting any reference to the experimental or observational procedures that lead to a characterization of those empirical structures of interest. It is incomplete also in omitting any attention to the theory-mediated character of all but direct measurement: that the pointer is at 17 signifies, according to (or relative to) the theory that dictated the design of the manometer, that the tire’s pressure is 17 psi. Outcomes of measurement are not simply physical events; they have meaning, in the way that words and pictures do.
The achievement and value of RTM, with respect to the aspect of measurement on which it focuses, Wolff argues, is that it brings us a structuralist view which goes far beyond the naïve idea that measurement is an assignment of numbers, let alone the reading of a book of nature written in arithmetic.
Take again the example of the football team weakly ordered by our judgments of how good the players are. This can be represented numerically. An assignment of numbers which reflects that ordering is in effect a homomorphism of the team, thus viewed, into the real number continuum. And any such assignment of numbers is adequate to that task: the ‘scale’ is highly non-unique. So such a measurement has outcomes with a lot of surplus structure: to understand it properly we need to know under what transformations any such adequate numerical representation remains adequate. And what we know, if we know that, is the invariant structure common to all such adequate assignments of numbers: that is the real outcome!
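A quick way to display the non-uniqueness (my sketch; the toy scores are invented): any strictly increasing rescaling of an adequate assignment is again adequate, so only the ordering itself, the invariant structure, is the real outcome:

```python
# Sketch: for a merely ordinal scale, any strictly increasing transform of
# an adequate numerical representation is again adequate. Scores invented.
import math

score = {"ana": 3.0, "ben": 2.0, "carl": 2.0, "dee": 1.0}

def same_ordering(s, t):
    # Do the two assignments represent the same weak ordering?
    return all((s[x] >= s[y]) == (t[x] >= t[y]) for x in s for y in s)

rescaled = {p: math.exp(v) for p, v in score.items()}   # monotone rescaling
assert same_ordering(score, rescaled)                    # still adequate
```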
My example is too simple to show how crucially this matters. To see better, think of how temperature measurement outcomes differ from height measurement outcomes. In the case of height, we can meaningfully say that Paul is twice as tall as his son Peter. Whether we use inches or centimeters as our scale, the result is the same, for here the relevant transformations are simply multiplication by a positive constant. But if we compare the temperatures of two cups of tea we cannot do the same: the transformation of the Celsius or Kelvin scale to Fahrenheit will not agree on “twice as hot”. The relevant transformations of temperature scales are affine, and under those only the ratios of intervals are invariant.
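A quick numerical check (the tea temperatures are invented): “twice as hot” is not preserved by the Celsius-to-Fahrenheit conversion, but ratios of intervals are:

```python
# Sketch: "twice as hot" fails under the affine C -> F conversion, while
# ratios of temperature intervals are preserved. Numbers are invented.

def c_to_f(c):
    return 1.8 * c + 32.0

t1, t2, t3 = 20.0, 40.0, 30.0           # three cups of tea, in Celsius

print(t2 / t1)                           # 2.0: "twice as hot" in Celsius
print(c_to_f(t2) / c_to_f(t1))           # about 1.53: not preserved in Fahrenheit

ratio_c = (t2 - t1) / (t3 - t1)                                   # 2.0
ratio_f = (c_to_f(t2) - c_to_f(t1)) / (c_to_f(t3) - c_to_f(t1))   # 2.0
assert abs(ratio_c - ratio_f) < 1e-9     # interval ratios are invariant
```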
Elicitation of the invariant structure is the subject of the uniqueness theorems. A representation theorem needs to be followed by a uniqueness theorem: the adequate numerical representation is unique up to transformations of type such and such.
Wolff makes the point in strong terms:
RTM shows us, what numerical presentations, and measurement representations in particular, tell us about the phenomena: they tell us something about the structure of phenomena, and nothing else. (Wolff 2023: 102)
The semantic foundations provided by RTM, then, are not primarily about reference … but rather about inference – how do we ensure our indirect reasoning about the phenomenon using a numerical representation is warranted. (ibid., 102/103)
If I report that today’s high and low temperatures are respectively 70°F and 35°F, how much information have I given you? Not the non-invariant judgment that the high temperature is twice the low. The point applies, in different ways, to non-numerically presented outcomes. A photo of a dog is two-dimensional, flatter than a pancake, but the dog is not to be inferred to be anything like as flat as a pancake. We understand a measurement outcome properly only if we infer from it only what we could infer from it after any of the relevant transformations: that is to say, only if we grasp the structure it reveals.
REFERENCES
Krantz, David H., R. Duncan Luce, Patrick Suppes, and Amos Tversky (1971) Foundations of Measurement, Vol. I: Additive and Polynomial Representations. New York: Academic Press.
Mach, Ernst. (1883/1974) The Science of Mechanics: A Critical and Historical Account of its Development. New York: Dover.
Wolff, J. E. (2023) “The Philosophical Significance of the Representational Theory of Measurement – RTM as semantic foundations”. Crítica 55: 81–107.
At first blush these two topics may seem entirely unrelated. Stalnaker’s Thesis is that the probability of (If A then B) is the conditional probability of B given A. A Moore Statement (one that instantiates Moore’s Paradox) is a statement that could be true, but could not be believed.
But the two get closer when we replace the intuitive notion of belief with subjective probability. Then there are two kinds of Moore Statements to be distinguished: An Ordinary Moore Statement is one that could be true, but cannot have probability one. A Strong Moore Statement is one that could have positive probability, but could not have probability one.
When we introduce conditionals with Stalnaker’s Thesis there are Moore Statements in our language. I will give examples of both sorts, and indicate why they are important.
Example 1. Imagine the following situation:
1. The match is not struck
2. The match is wet
3. It is not the case that if the match is struck, it will burn.
On the basis of lines 1 and 2 we can give several warrants for line 3. Following Stalnaker, we could assert that if the match is struck, it will not burn, because it is wet. And then, equivalently, it is not the case that if the match is struck then it will burn. Obviously Lewis might reject this reasoning. However, Lewis would then say that the match might or might not burn if struck. But that also implies that it is not the case that the match will burn if struck.
Now define:
X = (the match is not struck, and it is not the case that if the match is struck, it will burn)
The above imaginary scenario shows that X could be true. But X could not have probability one.
Let’s use P for probability as usual. If a conjunction has probability 1, so do its conjuncts. Thus if P(X) = 1 then P(the match is not struck) = 1, and P(the match is struck) = 0.
So then P(if the match is struck it will burn) = P(the match burns | the match is struck) is either 1 or undefined. (This depends just on the convention adopted for probability conditional on a proposition with probability 0.) Hence P(It is not the case that if the match is struck, it will burn) either equals 0 or is not defined. And accordingly, that is so for X as well: P(X) = 0 or P(X) is undefined.
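The step just taken can be displayed in a few lines (a sketch of mine; Stalnaker’s Thesis is simply stipulated, and both conventions for conditioning on a probability-zero proposition are shown):

```python
# Sketch: with P(struck) = 0, the Thesis makes P(if struck then burn)
# either undefined or 1, depending on the convention adopted.
from fractions import Fraction

def prob_conditional(p_joint, p_antecedent, zero_convention=None):
    # zero_convention: value returned when conditioning on probability 0
    # (None for "undefined", or Fraction(1) on the other convention).
    if p_antecedent == 0:
        return zero_convention
    return p_joint / p_antecedent

# P(X) = 1 forces P(struck) = 0, hence:
print(prob_conditional(Fraction(0), Fraction(0)))                # None: undefined
print(prob_conditional(Fraction(0), Fraction(0), Fraction(1)))   # 1
```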
Therefore X is an Ordinary Moore Statement.
To display an example of a Strong Moore Statement, we need to show that such a statement can have positive probability. For this we can use a numerical example.
Example 2. Tosses with a fair die.
The basic statements involved are just about the outcome of a toss, and each outcome has probability 1/6. Define:
A = the outcome is either two or six. True in possibilities {2, 6}
~ A = the outcome is neither two nor six = the outcome is either odd or 4. True in possibilities {1, 3, 4, 5}
B = the outcome is six. True in possibilities {6}
Y = (~ A and it is not the case that if A then B)
The probability of ~A is 4/6.
By Stalnaker’s Thesis, the probability of the conditional (if A then B) equals P(B | A) = 1/2 = 3/6. So the negation of that conditional also has probability three out of six: P(~(if A then B)) = 3/6.
The probability of the disjunction of ~A and ~(if A then B) is the sum of the probabilities of the disjuncts minus P(Y), the probability of their conjunction. This cannot be greater than 1. So (3/6) + (4/6) − P(Y) is less than or equal to 1.
Since that sum cannot be greater than 1, it follows that the conjunction of the two conjuncts (that is, Y itself) has probability greater than or equal to (3/6) + (4/6) − 1 = 1/6.
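The arithmetic can be checked mechanically (a sketch; the conditional’s probability is stipulated via Stalnaker’s Thesis rather than computed from a semantics for ‘if’):

```python
# Numerical check of Example 2. The fair die gives each outcome weight 1/6;
# Stalnaker's Thesis is taken as a stipulation for P(if A then B).
from fractions import Fraction

outcomes = range(1, 7)
def P(event):
    return Fraction(sum(1 for w in outcomes if event(w)), 6)

A = lambda w: w in (2, 6)
B = lambda w: w == 6

p_not_A = 1 - P(A)                                  # 4/6
p_conditional = P(lambda w: A(w) and B(w)) / P(A)   # P(B | A) = 1/2
p_not_conditional = 1 - p_conditional               # 3/6

# P(~A or ~(if A then B)) <= 1 forces the lower bound on P(Y):
lower_bound_on_Y = p_not_A + p_not_conditional - 1
print(lower_bound_on_Y)                             # 1/6
```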
By the same argument as in Example 1, mutatis mutandis, Y cannot have probability 1.
Therefore, Y is a Strong Moore Statement.
Remark 1, re belief.
In the scenario for Example 1 it is a natural reaction to say that we can believe X. That seems right, and if so, it shows that the intuitive notion of belief does not imply subjective probability 1. There are other reasons to suggest that belief requires only a “sufficiently high” subjective probability (cf. Eva, Shear, and Fitelson 2022). This does have the drawback that no single number is high enough for all examples (the lottery paradox), so that belief must be context-sensitive.
Remark 2, re closure under conditionalization.
In an earlier post (Conditionals, Probabilities, and ‘Or to If’ 12/07/2022) I presented the argument that,
for any given domain, the set of probability functions that satisfy Stalnaker’s Thesis is not closed under conditionalization.
The argument was rather abstract, and what it lacked were good, concrete examples. Examples 1. and 2. above fill that gap.
There was a similar situation with the Reflection Principle for subjective probability. A probabilistic Moore Statement is one that is not self-contradictory, one that can even have a positive probability, but you cannot conditionalize on it, because it cannot have probability 1. That there are such statements entails that the set of probability functions which satisfy the principle, on a given domain, is not closed under conditionalization.
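For concreteness, here is conditionalization itself over a toy die space (the standard definition; the evidence set chosen is arbitrary). The posterior gives the evidence probability 1, which is exactly what a Moore Statement cannot have:

```python
# Sketch of conditionalization on an event E over a toy outcome space:
# P'(w) = P(w) / P(E) for w in E, and 0 otherwise.
from fractions import Fraction

def conditionalize(prior, E):
    p_E = sum(p for w, p in prior.items() if w in E)
    return {w: (p / p_E if w in E else Fraction(0)) for w, p in prior.items()}

prior = {w: Fraction(1, 6) for w in range(1, 7)}     # fair die
posterior = conditionalize(prior, {2, 6})

print(posterior[6])                                  # 1/2
print(sum(posterior[w] for w in (2, 6)))             # 1: the evidence now has probability 1
```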
For a discussion of how Moore’s Paradox is related to closure under conditionalization see also my post “A Brief Note on the Logic of Subjective Probability” (07/24/2019).
Remark 3, about triviality results
Lewis’ famous triviality result for Stalnaker’s Thesis assumed that the set of admissible probability functions on a model is closed under conditionalization, and indeed, that it should be. The above examples show that this assumption of Lewis’ precludes Stalnaker’s Thesis from the outset.
Similarly for the other triviality results that I have seen.
REFERENCES
Eva, Benjamin, Ted Shear, and Branden Fitelson (2022) “Four Approaches to Supposition”. philsci-archive.pitt.edu/18412/7/fats.pdf