Odds are more intuitive than probabilities (3): the math

Thinking about odds brings new insights into how we deal with probabilities.  Puzzles about confirmation, conditionalization, and Bayes’ Theorem, as I illustrated informally in the earlier posts, tend to take a helpfully simpler and more intuitive form when they are put in terms of odds.  Now I’ll explore the ins and outs of odds in a natural mathematical setting.  (With examples and exercises.)

Odds more general than probabilities

Odds are ratios of probabilities. For example if the odds of snow against no-snow are 3 to 5 then the probabilities of snow and no-snow are 3/8 and 5/8 respectively.  And vice versa.  

But that example is special: it allows the deduction of the probabilities from the odds.  Sometimes we know the odds but not the probabilities.  Suppose four horses are running:  I might say that I don’t know how likely it is that Table Hands will win, but he is twice as likely to win as True Marvel.  The odds for Table Hands against True Marvel are two to one. 

So odds provide a more general framework for reasoning and deliberation.

To move smoothly from our intuitions to more precise notions, let’s begin with a finite probability space, and redescribe it in terms of odds.  

A finite probability space

M = <K, F, PP> is a probability space iff K is a non-empty set, F is a field of subsets of K, and PP is a non-empty set of probability functions with a domain that includes F.  M is a simple probability space if PP has only one member.

The elements of K are variously called points, outcomes, events, possibilities, or – as I will do here – worlds.  For example, these worlds could be the outcomes of tossing a die, or the points at which a team can wipe out in the World Cup.  The elements of F are sometimes called events too, or – as I will do here – propositions.

A field of sets is a Boolean algebra of sets, with ⊆, ∩, ∪, and ~.  

In this post, to keep things simple, I will take K to be the finite set {x1, …, xn}, F the family of all subsets of K, and PP to be the set of all probability functions defined on F.

From probability to odds

A probability vector p is a function which assigns a number to each world, these numbers being non-negative and summing to 1.  We write it in vector notation:  p = <p1, p2, …, pn>.  A probability function P defined on the propositions is determined entirely by a certain probability vector p:  P(A) = Σ{p(x): x is in A}, or equivalently, p(x) = P({x}), for each world x.  So we can stick to just the probability vectors in examples.
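For concreteness, here is a minimal sketch of that correspondence in Python, with the worlds taken to be the indices 0, …, n−1; the name P is mine, just for illustration:

```python
# A probability vector as a plain list; worlds are the indices 0, ..., n-1.
p = [1/6] * 6                        # a fair die, faces 1..6 at indices 0..5

def P(A, p):
    """Probability of proposition A (a set of world indices) under vector p."""
    return sum(p[x] for x in A)

print(P({0, 2, 4}, p))               # odd faces 1, 3, 5 -> 0.5 (up to rounding)
```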

Let’s spell out odds in the same way.  An odds vector is like a probability vector: the values it assigns to worlds are non-negative real numbers.  But the sum of the assigned numbers need not be 1.  For example, if x1, …, x6 are the outcomes of the toss of a fair die, that is represented by the odds vector <1, 1, 1, 1, 1, 1>.

Odds vector v satisfies the statement that the odds of A to B are a : b exactly if a : b =  Σ{v(x): x is  in A} : Σ{v(x): x is  in B}.  (I’ll make this precise below.)

Note 1, the null vector. There is no practical use for a locution like “odds of 0 to 0”, and so it would be reasonable to exclude the null vector, which assigns 0 to each world.  It certainly does not correspond to any probability vector.  But sometimes it simplifies equations or calculations to let it in, so I will call it an odds vector too, for convenience, and by courtesy.

Note 2, certainty.  That proposition A is certain means that the odds of A to ~A are 1:0, that is, infinite.  This makes sense in the extended real number system, with ∞ the symbol for (positive) infinity.  When A is certain its odds against anything incompatible with A are infinite.

It may be convenient to put it in a negative way.  The odds of (It both won’t and will rain tomorrow) to (It either will or will not rain tomorrow) are 0 : 1.  That is a well-defined ratio, and means that the first proposition is certainly not true (ranked as such by the odds vector in question).  Equivalently, of course, its negation (the second proposition) is certain.

Probability vectors are odds vectors.  But now we have a redundancy.  Two odds vectors are equivalent if one is a positive multiple of the other.  For example, the odds of 4:2 are the same as the odds of 2:1.  
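Here is a small Python sketch of these two points: testing whether two (non-null) odds vectors are equivalent, and recovering the probability vector that an odds vector with positive total determines.  The helper names are mine:

```python
def equivalent(v, w):
    """v and w (non-null odds vectors) are equivalent iff one is a positive
    multiple of the other, i.e. all cross-products v[i]*w[j] and v[j]*w[i] agree."""
    n = len(v)
    return all(v[i] * w[j] == v[j] * w[i] for i in range(n) for j in range(n))

def to_probability(v):
    """Normalize a non-null odds vector to the probability vector it determines."""
    total = sum(v)
    return [x / total for x in v]

print(equivalent([4, 2], [2, 1]))     # True: odds of 4:2 are the odds of 2:1
print(to_probability([3, 5]))         # [0.375, 0.625]: snow 3/8, no-snow 5/8
```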

Mixtures

If P and P’ are probability functions defined on F then so is P* = xP + (1-x)P’, provided x is in [0,1].  P* is a mixture (convex combination) of P and P’. [1] It is an important feature of the model that the set PP of probability functions defined on F is closed under the formation of mixtures.

Mutatis mutandis for odds: here the method of combination is not convex but linear.  The restriction on the coefficients in the mixing equation for probability is not needed.

Definition.  A mixture of odds vectors is a linear combination:  v* = av + bv’, provided a, b are non-negative real numbers.

Note well that you cannot just replace an odds vector by an equivalent vector anywhere.  Scalar multiplication distributes over addition: k(v + v’) = kv + kv’, so v + v’ is equivalent to kv + kv’.  But even though v’ is equivalent to kv’, v + v’ is not in general equivalent to v + kv’.

EXAMPLE 1.  K is the set {1, 2, 3, 4, 5, 6} of outcomes of the toss of a die.  This die is one of two dice.  One of them is fair, the other is loaded in such a way that the higher numbers 4, 5, 6 come up twice as often as the lower ones.  And the die that is tossed is equally likely to be the one or the other.

We have vector v = <1,1,1,1,1,1> to give the odds on the assumption that the die is fair.  Similarly, vector v’= <1,1,1, 2,2,2> represents the case of a loaded die with the higher numbers twice as likely to come up as the lower ones.  We are unsure whether our die is fair or loaded in that particular way, with no preference for either, so for our betting we adopt an equal mixture:

            v* = v + v’  = <2,2,2, 3, 3, 3>

and now our odds for any outcome – e.g. that the outcome is an odd number – will lie in between.  For example, the odds of the outcome being ‘high’ to its being ‘low’ are half-way between what they are for the two dice, that is, one and a half (i.e. 3/2, as you can easily see).
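The same calculation as a Python sketch, with indices 0–5 standing for the faces 1–6:

```python
v_fair   = [1, 1, 1, 1, 1, 1]
v_loaded = [1, 1, 1, 2, 2, 2]

# Equal mixture, with coefficients 1 and 1 (any equal pair would be equivalent).
v_mix = [a + b for a, b in zip(v_fair, v_loaded)]    # [2, 2, 2, 3, 3, 3]

high, low = {3, 4, 5}, {0, 1, 2}     # faces 4, 5, 6 and faces 1, 2, 3
odds = (sum(v_mix[x] for x in high), sum(v_mix[x] for x in low))
print(odds)                          # (9, 6), i.e. 3 : 2, halfway between 1 and 2
```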

EXERCISE 1.  A hiker is at point A and would like to get to point X.  Being ignorant of the trail system he chooses at random at each juncture. From A there go 6 trails: one goes to B1, two go to B2, and three go to B3.  From each of these B points, there go 3 trails.  From B1, one goes to X, from B2 two go to X, and from B3 all three go to X.  What are the odds that the hiker reaches X (and what is the probability that he does so)?  Reason with odds.

Answer. For the model, K = {X, ~X}.  At the B points the odds vectors are <1, 2>, <2, 1>, <3, 0> respectively.  At A, there are different odds to reach the B points, 1:2:3.  So the correct odds vector for this situation is the mixture:

            1<1,2> + 2<2,1> + 3<3,0> = <14, 4>.

The odds are 14 : 4, or 7 : 2; the probability of reaching X is 14/18, or 7/9.  (Check this by also doing the exercise reasoning with probabilities.)
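If you want to check the arithmetic mechanically, here is a sketch in Python (the variable names are mine):

```python
# Odds vectors over K = {X, ~X} at the three B points:
b_vectors = [(1, 2), (2, 1), (3, 0)]
weights = [1, 2, 3]                  # odds of reaching B1 : B2 : B3 from A

v = [sum(w * vec[i] for w, vec in zip(weights, b_vectors)) for i in range(2)]
print(v)                             # [14, 4], i.e. odds 7 : 2 of reaching X
print(v[0] / sum(v))                 # 0.777..., that is 7/9
```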

Truth conditions for odds statements[2]

A proposition is true in a world in model M (the world satisfies that proposition) iff that world is a member of that proposition.  Notation:

            M, w╞A     iff w is in A       

We will similarly say that an odds vector satisfies an odds statement under the appropriate conditions.

Our language has a classical sentential syntax with &, v, ~ and one special sentential operator O, plus numerals and atomic sentences. I will use the same capital letters for sentences as for propositions, and K for the tautology. The sentence formed by applying connective O to A, B, in that order, and the numerals x and y, I will write, to be reader-friendly, as O(A : B) = x : y.  It is read as “the odds of A to B are x : y”, and it is called a (simple) odds statement.

I’ll use the same symbol for satisfaction, and write “v╞ E” for “ odds vector v satisfies odds statement E”.  The truth conditions for simple odds statements are then, as you would expect:

M, v╞ O(A : B) = a : b if and only if  a : b =  Σ{v(x): x is  in A} : Σ{v(x): x is  in B}

EXAMPLE 2. In EXAMPLE 1, v* = <2,2,2,3,3,3>.  If A = (outcome is odd) and B = (outcome is even) then 

M, v*╞ O(A : B) = 7 : 8.

For Σ{v*(x): x = 1, 3, 5 } : Σ{v*(x): x = 2, 4, 6 } 

                     = (2 + 2 + 3) : (2 + 3 + 3) = 7 : 8
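The truth condition is easy to turn into a little Python check.  The function name satisfies is mine, and the ratio equality is tested by cross-multiplication to avoid division by zero:

```python
def satisfies(v, A, B, a, b):
    """Does odds vector v satisfy O(A : B) = a : b ?
    True iff a : b equals (sum of v over A) : (sum of v over B)."""
    sA = sum(v[x] for x in A)
    sB = sum(v[x] for x in B)
    return a * sB == b * sA

v_star = [2, 2, 2, 3, 3, 3]
odd, even = {0, 2, 4}, {1, 3, 5}      # faces 1, 3, 5 and faces 2, 4, 6
print(satisfies(v_star, odd, even, 7, 8))   # True
print(satisfies(v_star, odd, even, 1, 1))   # False
```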

To complete this part we have to look at the more general concept of odds relative to a ‘given’.

Conditionalization: a linear operator

As I discussed in an earlier post, conditionalizing an odds vector on a proposition A consists just in assigning 0 to the elements not in A.  We can make it precise the following way.  

Let JA be the indicator function of A, that is, the function that gives value 1 to elements of A and value 0 to all other elements. For example, with x and y in K, if A = {x, y},  JA(x) = 1 = JA(y), and if z is neither x nor y then JA(z) = 0

Definition.   (IAv)(x) =def JA(x)v(x), where v is any vector with real-number components and x is a world in K.

It may be puzzling why I give this definition for vectors with negative components at the same time, although they are not odds vectors.  But it will simplify things later to do that.

So IA is an operator: it operates on vectors, and it is a linear operator.  The most visual way to display that, in the case of finite vectors, is by representing the operator by a matrix.  Suppose K = {x1, x2, x3, x4}, that v = <2, 3, 4, 5>  and A = {x1, x3}.  Then the matrix representation of the action of IA on v is this:

            |1  0  0  0|   |2|     |2|
            |0  0  0  0|   |3|  =  |0|
            |0  0  1  0|   |4|     |4|
            |0  0  0  0|   |5|     |0|

(Briefly, the rows in the matrix are vectors too. To get the first component of the new vector multiply the top row of the matrix and the old vector.  And so forth.  The multiplication is the inner product of the two vectors, which I will discuss properly below.)
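A sketch of the same matrix picture in Python with numpy, assuming the worlds are indexed 0–3:

```python
import numpy as np

K = [0, 1, 2, 3]                      # worlds x1, ..., x4 as indices
v = np.array([2, 3, 4, 5])
A = {0, 2}                            # A = {x1, x3}

# I_A as a diagonal matrix: 1 on the A-worlds, 0 elsewhere.
I_A = np.diag([1 if x in A else 0 for x in K])
print(I_A @ v)                        # [2 0 4 0]
```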

Truth conditions for conditional odds statements

Let’s extend the language. The sentence formed by applying connective O to A, B, C in that order and the numerals x and y, I will write, to be reader-friendly as O(A : B|C) = x : y.  It is read as “given C, the odds of A to B are x : y”, and it is called a conditional odds statement.

A conditional odds statement has the following truth conditions:

M, v╞ O(A : B|C) = x : y if and only if  x : y =  Σ{v(w): w is in A ∩ C} : Σ{v(w): w is in B ∩ C}

which is equivalent to 

M, v╞ O(A : B|C) = x : y if and only if  x : y =  Σ{ICv(w): w is in A} : Σ{ICv(w): w is in B}

and to the more intuitive

M, v╞ O(A : B|C) = x : y if and only if  ICv╞ O(A : B) = x: y

It is easily seen now that we can define the binary O, in simple odds statements, in terms of the ternary O:

Definition.  ‘O(A : B) = x : y’ for ‘O(A : B|K) = x : y’.

Here the ‘given’ is a tautology, so imposes no constraint.

EXAMPLE 3.  We have our die loaded so as to favor the higher numbers, represented by the odds vector v = <1,1,1, 2, 2, 2>.  What are the odds of throwing a 5, conditional on the outcome being an odd number?

Here C is {1, 3, 5} so ICv = <1, 0, 1, 0, 2, 0>  while A and B are {5} and {1, 2, 3, 4, 6}.  The odds in question are therefore 2: (1 +1)  = 2:2, i.e. fifty-fifty, as they say.
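A minimal Python version of this conditional-odds calculation, with sets of indices standing for the propositions:

```python
def conditional_odds(v, A, B, C):
    """Odds of A to B given C: sum of v over A∩C against sum of v over B∩C."""
    return (sum(v[x] for x in A & C), sum(v[x] for x in B & C))

v = [1, 1, 1, 2, 2, 2]                # loaded die; faces 1..6 at indices 0..5
A, B, C = {4}, {0, 1, 2, 3, 5}, {0, 2, 4}   # 'five', 'not five', 'odd'
print(conditional_odds(v, A, B, C))   # (2, 2), i.e. fifty-fifty
```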

EXERCISE 2.  Olivia and Norman play a game:  they have an urn with 35 black balls and 1 red ball. They take turns drawing without replacement, and the one to get the red ball wins.  But they are interested only in getting a Superwin: getting the red ball on your first draw.  Norman chivalrously offers Olivia the choice to go first if she wishes.  She thinks she could end the game at once with a Superwin (chance 1 in 36).  But if she doesn’t, then Norman will have an advantage: a chance of 1 out of 35 to get a Superwin.  Would Olivia going first be to Norman’s advantage?

Answer.  There are three worlds: OW (Olivia Superwins), NW (Norman Superwins), NL (neither does).  Suppose Olivia chooses to go first.  About the correct odds vector v = <v(OW), v(NW), v(NL)> for this situation, we know its conditionalizations on OW and on not-OW.  The odds of OW to ~OW are 1 : 35, so scale v so that IOWv = <1, 0, 0>.  Given ~OW, Norman draws from 35 balls of which one is red, so within ~OW the odds of NW to NL are 1 : 34, and the components of I~OWv sum to 35.  Hence:

            IOWv = <1, 0, 0>                       I~OWv = <0, 1, 34>

v = IOWv + I~OWv

= <1, 0, 0> + <0, 1, 34>

= <1, 1, 34>

From this we can see that, even if Olivia goes first, the odds of Norman getting a Superwin are the same as for her, namely 1 : (1 + 34) = 1 : 35, that is, a chance of 1 in 36.  So going first gives neither of them an advantage.

Tests and error probabilities

The example that follows was in an earlier post, with a discussion of how reasoning by Bayes’ Theorem amounts to finding the Bayes Factor, which is the number by which the prior odds are multiplied to yield the final odds.  I’ll repeat a small part here to illustrate how we now see conditionalization on evidence, taken from tests with known error probabilities.

EXERCISE 3. There is a virus on your college campus, and the medical team announces that 1 in 500 students have this virus.  There is a test; it has 1% false negatives, and 1% false positives (that is, 1% of the students who do not have the virus nevertheless test positive).  You are one of the students, with normal behavior, and reckon that your odds of having the virus are accordingly 1 : 499.  You take the test and the result is positive.  What are your new odds of having the virus?

Answer.  Let’s say that all told there are 50,000 students and they are all tested.  There are 100 students who have the virus.  99 of them test positive and 1 tests negative.  There are 49,900 students who do not have the virus, and 499 of them test positive anyway.  So you are one of 99 + 499 = 598 students who test positive, and only 99 of those 598 have the virus while 499 do not.  So the odds for you to have the virus are 99 : 499.  Your odds for having the virus have been multiplied by 99.

That was the intuitive frequency argument, told in terms of odds.  But what exactly was the manipulation of odds vectors that was involved?

There are four worlds:  x1 (positive & virus), x2 (positive & no virus), x3 (negative & virus), x4 (negative & no virus).  The odds vector over these four worlds, which we can read off from the narrative, is

v = <99, 499, 1, [50,000 – 99 – 499 – 1]> = <99, 499, 1, 49,401>

But you tested positive,  so let’s conditionalize on that:

Ipositivev = <99, 499, 0, 0>

Putting it in terms of the odds language:

            The odds changed from 1: 499 to 99 : 499.
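The same manipulation as a Python sketch; the Bayes Factor of 99 shows up as the ratio of posterior odds to prior odds:

```python
# Worlds: x1 = (positive & virus), x2 = (positive & no virus),
#         x3 = (negative & virus),  x4 = (negative & no virus)
v = [99, 499, 1, 49_401]              # read off from the frequency story

positive = {0, 1}
v_post = [v[x] if x in positive else 0 for x in range(4)]   # I_positive v

virus, no_virus = {0, 2}, {1, 3}
prior     = (sum(v[x] for x in virus), sum(v[x] for x in no_virus))
posterior = (sum(v_post[x] for x in virus), sum(v_post[x] for x in no_virus))
print(prior)        # (100, 49900), i.e. 1 : 499
print(posterior)    # (99, 499): the odds have been multiplied by 99
```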

Jeffrey Conditionalization: a linear operator

Putting mixtures and conditionalization together we can define Jeffrey Conditionalization.  I call a Jeffrey shift the following operation on a probability function, designed to change the probability of a given proposition A to a specified number x:

(A → x)P = xP(. |A) + (1 – x)P( . |~A), where 0 ≤ x ≤ 1 and P(A) > 0 < P(~A)

Informally:  while A gets the new probability x, the ratios of the probabilities of subsets of A to each other remain the same as they were, and similarly for the ratios of the probabilities of subsets of ~A.

I’ll use a similar notation (A → x : y) for the corresponding operator on odds vectors, which rescales the odds of A against ~A by the factor x : y (so that if A and ~A started out with even odds, the new odds of A to ~A are x : y).

Definition.  (A → x : y)v = xIAv + yI~Av, with x, y non-negative

(If x = y = 0 this is a Jeffrey shift in odds only by courtesy notation.)

When we use the matrix representation it is clear how Jeffrey Conditionalization is a straightforward generalization of ordinary conditionalization.  

EXAMPLE 4.  Suppose K = {x1, x2, x3, x4}, that v = <2, 3, 4, 5>  and A = {x1, x3}.  The current odds of A to ~A are 6 : 8 or 3 : 4 or, to make the generalization more obvious, 1 : (4/3).  Now if you want to double the odds for ~A, instead of multiplying by 0 for the ~A worlds, multiply by 2:

            |1  0  0  0|   |2|     |2|
            |0  2  0  0|   |3|  =  |6|
            |0  0  1  0|   |4|     |4|
            |0  0  0  2|   |5|     |10|

and the new odds for A against ~ A are 6 : 16 or 3 : 8 or 1 : (8/3).

EXAMPLE 5.  We thought we had a fair die, and so adopted odds vector v = <1,1,1,1,1,1>.  Then we learned that the even outcomes A = {2, 4, 6} are twice as likely to come up as the odd numbers.  So we update to the odds vector <1, 2, 1, 2, 1, 2>.  What was that?  It was the Jeffrey shift:

            (A → 2:1)v = 1I{1, 3, 5}v + 2I{2, 4, 6}v

                                           = <1, 2, 1, 2, 1, 2>
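The Jeffrey shift on odds vectors is a one-liner.  Here is a sketch in Python reproducing Example 5 (the function name is mine):

```python
def jeffrey_shift(v, A, x, y):
    """(A -> x : y)v = x*I_A v + y*I_~A v, with x, y non-negative."""
    return [x * vi if i in A else y * vi for i, vi in enumerate(v)]

v = [1, 1, 1, 1, 1, 1]                # fair-die odds vector
A = {1, 3, 5}                         # the even faces 2, 4, 6
print(jeffrey_shift(v, A, 2, 1))      # [1, 2, 1, 2, 1, 2]
```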

Orthogonal decomposition

A partition in model M = <K, F, PP> is a set of mutually disjoint propositions which is exhaustive, that is, its union is K.  If S = {B1, … , Bm} is such a partition and P is a probability function with P(Bj) > 0 for each j, then the law of total probability says:

P =  P(B1)P(. |B1) + … + P(Bm)P(. |Bm)

The components P(. |Bj), j = 1, …, m are mutually orthogonal, by the following definition: 

Definition.  If P and P’ are probability functions defined on the same algebra F then P is orthogonal to P’ if and only if there is a proposition A in F such that P(A) = 1 and P’(A) = 0.

Notation:  P ⊥ P’.  This relation is symmetric and irreflexive.  

The corresponding definition for odds vectors is:

Definition.  If v and v’ are odds vectors defined on the same set K then v is orthogonal to v’ if and only if, for each member x of K, either v(x) = 0 or v’(x) = 0 or both.

Clearly two probability vectors are orthogonal iff the probability functions which they determine are orthogonal.

Using the same symbol for this relation, we note that in mathematics there is a standard definition of orthogonality for vectors in general:

v ⊥ v’ iff Σ {v(x)v’(x): x in K} = 0

Since the numbers in odds vectors are all non-negative, this sum equals 0 if and only if, for each x in K, at least one of v(x) and v’(x) equals zero.  So suppose that E = {x: v(x) = 0}.  Then for v, E is certainly not true, while for v’, E is certainly true (by the definition in Note 2 above).  So this corresponds exactly to the condition of orthogonality for probability functions. We can also put it a third way:

v and v’ are orthogonal exactly if there is a proposition A such that v =  IAv and v’ = I~Av’

Now we also have a neater way to give, parallel to the law of total probability, the law of total odds:

v = IBv + I~Bv

If T is a partition then v = Σ{ IBv: B a member of T}

and this is an orthogonal decomposition of odds vector v.
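The law of total odds as a Python sketch; project plays the role of IB, and the partition chosen is just an arbitrary illustration:

```python
def project(v, B):
    """I_B v: keep the components in B, set the rest to 0."""
    return [vi if i in B else 0 for i, vi in enumerate(v)]

v = [2, 2, 2, 3, 3, 3]
T = [{0, 1}, {2, 3}, {4, 5}]          # a partition of the six worlds

parts = [project(v, B) for B in T]    # mutually orthogonal components
total = [sum(column) for column in zip(*parts)]
print(parts)
print(total == v)                     # True: v is the sum of its projections
```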

Odds’ natural habitat in mathematics

The odds vectors are part of a finite-dimensional vector space.  A vector space over the real numbers is a set of items (‘vectors’) closed under addition and scalar multiplication by real numbers. When the vectors are sequences of numbers (as they are in our context) the odds vectors are singled out by having no negative number components.

The dimensions correspond to the worlds – the worlds are the dimensions, you might say.  With the worlds numbered as above, world x1 is represented by the unit vector v(x1) = <1, 0, 0, …, 0>, world x2 by v(x2) = <0, 1, 0, …, 0>, and so forth.  The unit vector that corresponds to world x is the one which ranks world x – or more precisely, the proposition {x} – as certain.  These unit vectors are mutually orthogonal and they span the space in this sense:  each vector in that space is a linear combination of these unit vectors.

Propositions correspond to subspaces.  If A = {x1, x2, x3} then A corresponds to the subspace [A] spanned by {v(x1), v(x2), v(x3)}.  Proposition A is ranked as certain by precisely those vectors which are in [A].[3]

The operator IA is a projection operator: it is the projection onto the subspace [A].  If v is any vector then IAv is the vector that is exactly like v except for having 0s for worlds not in A.

The model’s associated vector space

So let’s make it official.  The finite probability space M = <K, F, PP> has an associated vector space V(M).  Most of its description is already there in the discussion above.

The Boolean algebra of propositions F has as counterpart in V(M) a Boolean algebra of subspaces of V(M).  (Note well: that is not the algebra of all subspaces of V(M), which is not Boolean – I will illustrate below.)

Each proposition A in F is a set of worlds {xj, …, xk}.

Definition. [A] = the subspace spanned by the vectors v(y): y in A.  

Call [A] the image  of A in V(M).

Notation: if X is any set of vectors, [X] is the least subspace that contains X, and we say that X spans that subspace. In the case of a unit set {v}, I’ll abbreviate [{v}] to [v].

Define the algebra of subspaces [F] to be the set of images of members of F, with the following operations:

      meet:  [A]  ∧ [B] = [A ∩ B]

      join:    [A] ⊗ [B] = the least subspace that contains both [A] and [B]

      orthocomplement:  [A]⊥ = {v in V(M): v ⊥ v’ for all v’ in [A]}

      order:  [A] ≤ [B] iff A ⊆ B

First of all, the order is just set inclusion: [A] ≤ [B] iff [A] ⊆ [B].  Secondly, the meet is just set intersection: [A] ∧ [B] = [A] ∩ [B], for the largest subspace contained in two subspaces is their intersection.

The other two operations are less obvious.  [A] ⊗ [B] does not just contain [A] and [B] but also the linear combinations of vectors in [A] and vectors in [B].  

Clearly [A] ⊗ [A]⊥ = [K], but the vectors that belong to neither [A] nor [A]⊥ are not to be ignored.

That [F] is isomorphic to F, though the algebra of subspaces is not Boolean

The point is, first, that [F] is indeed Boolean, isomorphic to F, but second that there are subspaces that are not images of propositions, and because of these, there are violations of the Boolean law of distributivity. 

To take the second point first, let v = av(x1) + bv(x2), with both a and b positive.  Since v(x1) = <1, 0, 0, …> and v(x2) = <0, 1, 0, …> we see that v = <a, b, 0, …>.  Suppressing the extra zeroes, we can picture v as a point in the plane spanned by v(x1) and v(x2), lying strictly between the two axes.

[v] is not an image of any proposition.  The least subspace that contains v is [v] = {kv: k a real number}, the ray (one-dimensional subspace) spanned by v.  Note that v is a mixture of those two unit vectors, so [v] is part of ([{x1}] ⊗ [{x2}]).  Denoting the null vector as f, so that the null space (the subspace that contains only the null vector) is {f}:

[v]  ∧ [{x1}] = {f};           [v]  ∧ [{x2}] = {f}

[v]  ∧ ([{x1}] ⊗ [{x2}]) = [v]

The null vector has only zeroes as components, the subspace it spans contains only itself, and v is not the null vector, so [v] is not {f}.

So this is a violation of the law of distributivity, which would imply that 

([v]  ∧ [{x1}])  ⊗   ([v]  ∧ [{x2}])  =   ([v]  ∧ ([{x1}] ⊗ [{x2}]))

In other terminology:  the lattice of (all) subspaces of a vector space is a non-distributive lattice.
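For those who like to check such things numerically: the failure of distributivity can be exhibited by dimension counting, using the identity dim(U ∧ W) = dim U + dim W − dim(U ⊗ W) for subspaces.  Here is a sketch in Python with numpy, where subspaces are given by lists of spanning vectors and the function names are mine:

```python
import numpy as np

def dim(vectors):
    """Dimension of the subspace spanned by the given vectors (rows)."""
    return np.linalg.matrix_rank(np.array(vectors, dtype=float))

def dim_meet(U, W):
    """dim(U ∧ W) computed as dim U + dim W - dim(U ⊗ W);
    U and W are lists of spanning vectors, so '+' concatenates them."""
    return dim(U) + dim(W) - dim(U + W)

e1, e2 = [1, 0, 0, 0], [0, 1, 0, 0]   # unit vectors for x1 and x2
v = [2, 3, 0, 0]                      # a*v(x1) + b*v(x2) with a, b > 0

print(dim_meet([v], [e1]))            # 0: [v] ∧ [{x1}] is the null space
print(dim_meet([v], [e2]))            # 0: [v] ∧ [{x2}] is the null space
print(dim_meet([v], [e1, e2]))        # 1: [v] ∧ ([{x1}] ⊗ [{x2}]) = [v]
# Distributivity would force the last value to be 0 as well; it is not.
```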

Why is the smaller algebra of subspaces [F] nevertheless Boolean, and isomorphic to F?  The reason is that the unit vectors corresponding to worlds are all mutually orthogonal.  That makes the images of propositions mutually compatible, in the sense in which this word is used in quantum mechanics.[4]  We need only verify:

[C] ≤ [A] ⊗ [B]  iff C ⊆ A ∪ B

That is so because the right hand side is equivalent to [C] ⊆ [A ∪ B] = [A] ⊗ [B], and the order in [F] is set inclusion.

[C] ≤ [A]⊥ iff C ⊆ ~A

That is so because ~ A contains precisely those worlds x such that v(x) is orthogonal to all vectors v(y) such that y is in A.

About the Reflection Principle

The General Reflection Principle demands that your current opinion (represented by a probability or expectation function) is within the range (convex closure) of the future opinions you foresee as possible.  How does that idea look with odds?

The simplest case is obvious.  Suppose the worlds are the possible outcomes of an experiment (e.g. toss of a die) and you are sure that the outcome will be one of the first three.  Then your current opinion must assign 0 to the other dimensions, i.e. be in the subspace spanned by those first three corresponding unit vectors v(x1), v(x2), v(x3).  

EXAMPLE 6.  We are conducting an experiment with the set of possible outcomes being the partition T = {B1, … , Bm}.  Our current opinion for the outcome is vector v, so we know our possible posterior opinion will be one of the vectors in the orthogonal decomposition {IB1v, … , IBmv}.  This corresponds to conditionalization in the case of probability – that the outcome of an experiment is a projection on a subspace is called the Projection Postulate in discussions of quantum mechanics.

It is a bit more complicated when you have a more arbitrary set of foreseen possible posteriors, say a set X of odds vectors of some sort.  Then the principle should demand that your current opinion is represented by an odds vector that lies within the least subspace that contains X.  What is that?  

The answer appeals to ‘double negation’.  First take the set of all vectors that are orthogonal to all members of X, which is the orthocomplement X⊥ of X.  Those are the opinions certainly ruled out by the principle.  Then take the orthocomplement of that:  X⊥⊥.

It is a theorem that, whatever X is, X⊥⊥ is a subspace, and it is the least subspace that contains X.  

The Reflection Principle then demands that your current opinion is constrained to be an odds vector that lies in that subspace.
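Here is a sketch in Python of the double orthocomplement construction, computing orthocomplements as null spaces by means of the singular value decomposition; the set X of foreseen posteriors is an arbitrary illustration, and the function name is mine:

```python
import numpy as np

def orthocomplement(vectors, n):
    """Orthonormal rows spanning the set of n-dimensional vectors that are
    orthogonal to every vector in `vectors` (its null space, via the SVD)."""
    if len(vectors) == 0:
        return np.eye(n)              # everything is orthogonal to nothing
    _, s, Vt = np.linalg.svd(np.array(vectors, dtype=float))
    rank = int(np.sum(s > 1e-10))
    return Vt[rank:]

# Foreseen posterior odds vectors X, in a four-world model:
X = [[1, 2, 0, 0], [0, 1, 1, 0]]
X_perp = orthocomplement(X, 4)              # the opinions ruled out
X_perp_perp = orthocomplement(X_perp, 4)    # least subspace containing X
print(np.linalg.matrix_rank(np.vstack([X, X_perp_perp])))   # 2: same plane as X
```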

Expectation Value

What are called quantities elsewhere, statisticians call random variables.  A random variable on the model is any function that assigns a real number to each world in K.  For example, K might be the days of this week, and the function r assigns to each day its precipitation amount.  So a random variable r, in this case, is representable by a vector r = <r(x1), r(x2), …, r(x7)>.

Definition.  The expectation value Ep(r) of r for p is Σ{ p(xj)r(xj) : j = 1, …, 7}, provided p is a probability vector.

But that is exactly the definition of the inner product (also called the scalar product) on our vector space:

Definition.  The inner product of vectors v and v’ is the number (v, v’) = Σ{v(x)v’(x) : x in K}.

So the expectation value of quantity r for probability p is the inner product of their representing vectors.[5]  
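A sketch in Python, with made-up precipitation amounts, just to display the identity of expectation value and inner product:

```python
import numpy as np

p = np.array([1/7] * 7)                            # equal probability for each day
r = np.array([0.0, 0.2, 0.0, 1.1, 0.0, 0.3, 0.0])  # precipitation in inches (made up)

print(np.dot(p, r))                                # inner product (p, r) ≈ 0.2286
```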

Since the values of the random variable, e.g. the amounts of precipitation, are absolute values on a chosen scale (e.g. inches), the expectation value is not something comparative, and there is no advantage in adapting the concept of expectation value from probability to odds.

But when this subject was originally created in the 17th century, before the technical concepts had solidified in our culture, we can read the texts as discussing the matter in terms of odds, quite naturally.  (Translations tend to render them in terms of probabilities and expectation values, that is, in terms of the concepts we mainly employ today, but I suggest that this may be anachronistic.)

For example, here is Huygens’s Proposition III:

If I have p chances to get a, and the number of chances I have to get b is q, then (assuming always that each chance can occur equally easily):  that is worth (pa + qb)/(p + q) to me. (My translation from the Dutch.)

Here p and q can be any natural numbers, say 17 and 51.  The division by their sum points us to reading his text as, in effect, ‘If the probability to get a equals p/(p + q) …’.  I am not saying that is wrong; I agree that if values to me are described in absolute rather than comparative terms, that is natural as well.

But think of this in a larger context:

I have 17 chances to get a, 51 chances to get b, 101 chances to get c, …

You want to buy from me the chances to get a and to get b

How much do you owe me?

Three remarks: 

  • the first line is most easily read as displaying two vectors, namely an odds vector <17, 51, 101, …> and a random variable vector <a, b, c, …>;
  • to calculate the fair price, reference to all the other contracts or lottery tickets that I have can be omitted;
  • the price must be an appropriate fraction of (a + b), with the proportions of a and of b in the ratio 17 : 51, that is, 1 : 3.

So this is a way of reading the text, I think very naturally, in terms of odds thinking.  

Admittedly these three remarks do not yet, taken together, yield Huygens’ result.  The gap is filled by his symmetry argument about games in the proof of his Proposition III.  (See my post “Huygens’ probability theory: a love of symmetry” of April 2023.)

ENDNOTES


[1] The term “mixture” is common in physics, not in mathematics, but I like it because it is a term that aids visual imagery.

[2]  I’m going to fuzz the use/mention distinction a bit from here on.  As my friend Bob Meyer used to say, we are following the conventions of Principia Mathematica.

[3] Think about quantum logic.  As introduced by von Neumann: subspaces are identified as the propositions for that logic. Various intuitive motivations have been offered for this.

[4] The unit vectors that correspond to worlds, in the way indicated, and which form a basis for the space, are the eigenvectors of a single observable.  Propositions correspond to statements to the effect that the eigenvalues of this observable are within a certain range.

[5] Geometrically, the inner product measures the angle between the two vectors, and the inner product of a vector with itself measures its magnitude. Notation:

            ||v|| = square root of (v,v) 

            𝜙 is the angle v^v’ between vectors v and v’ iff the cosine of 𝜙 = (v, v’)/(||v||.||v’||).

                        Equivalently, (v,v’) = ||v||.||v’||cos(v^v’).

Note that the cosine decreases as the angle increases (from 0 to π).
