Probabilism I – Kolmogorov’s Curious Axioms

By Stephen Busic

“Do you believe or not?” This is a familiar question for a lot of us. The rate at which we ask it is probably the surest metric of the Information Age. We have been made more and more incredulous by modern life, and for obvious reasons. There is almost a Moore’s law for falsehoods. As the power of computers doubles every year and a half, so does the inaccuracy of what we read on them. At least, that’s what it can feel like. Not everything is fake news, of course. There is still trustworthy media to be found. And when reliable information contradicts what we have always believed, it can make us second-guess ourselves just as much as falsehoods can. Not all our doubt is a bad outcome of technology, after all. Sometimes, when they lead us closer to truth, doubts are the internet’s most humanitarian upshot.

We are all asking, for an increasing number of things, whether we believe. But this is not the only question we ask about the claims and stories we hear, and our attitudes toward them. We also ask: by how much do we believe? This concept – that belief can come in degrees, and not just all-or-nothing – is important. As we go on to explore it, it will lead us to another question and the focus of this article: What kind of belief-forming behavior is rational? More specifically, are there any simple rules we can lay down about rational belief-forming, and if so, would there be any reason to follow them? A philosophical view called “Probabilism” answers yes to both. And in our online age, it might just offer some ground rules for keeping us rational.

Degrees of Belief

When we believe something, we take it with a kind of attitude. That is really all a belief is: a kind of attitude we have towards something whenever we regard it as true. Attitudes of this kind are called doxastic attitudes. When attributed to someone, they describe how she takes the world to be. As pointed out, one feature of our feelings of belief and disbelief is that they are rarely absolute. Rather, they tend to be on a spectrum. They float somewhere between the extremes of complete certainty and complete distrust. We believe some things more or less strongly than others, and with some math, we can describe by how much our degree of confidence – our credence – differs between the things we believe. By a “thing we believe”, I just mean a proposition that we take to be true. And as for “propositions,” or “claims,” those are just declarative sentences capable of being either true or false. “2% of Americans are redheads” and “somebody left the oven on” are both propositions. We can believe or disbelieve them with any degree of confidence. Other sentences like “Hold the door, please!” cannot be true or false, and so are not propositions. Later on, when we begin to ask what basic rules there might be about forming rational beliefs, it will be handy to have these terms down. It will also help to start speaking with numbers. So, let’s give that a try.

How can math be used to talk about degrees of confidence? Consider the proposition “tomorrow will be sunny.” Call this proposition S, and suppose a person holds that tomorrow will be sunny (S is true) with 75% confidence. Already we are making good use of math. For many of us, percentages are an everyday way of describing someone’s level of sureness or doubt. We will go a step further than this, though. A kind of mathematical function will serve our needs best – something that can represent how likely a given claim is thought to be true. Let’s call this function cr. Then, we can express the person’s credence that tomorrow will be sunny (S is true) as follows:

cr(S) = 0.75

Hopefully, for the non-mathematically enthused, the flashbacks from algebra class are not too painful. This expression is nothing more than a quick way of saying “the proposition ‘tomorrow will be sunny’ is believed with 75% confidence.” Notice how this is not the same as saying everyone believes S is true with 75% confidence. Exactly whose credence is being described by this expression needs to be said elsewhere. Our function cr only takes a proposition as its input, and outputs a real number. This real number represents the degree of sureness that someone (in our case, a hypothetical person) has that the input proposition is true. We will call these numbers “credence values,” because they represent the credence (or, degree of belief) someone has that a given claim is true.
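For readers who like to tinker, here is a minimal sketch in Python – my own illustration, not anything from the Probabilist literature – of how a credence function like cr could be represented in code. The proposition and the 0.75 are just the sunny-day example from above:

# A toy credence function for one hypothetical agent.
# The proposition and the 0.75 are illustrative only.
credences = {"tomorrow will be sunny": 0.75}

def cr(proposition):
    # Return the agent's degree of confidence that the proposition is true.
    return credences[proposition]

print(cr("tomorrow will be sunny"))  # prints 0.75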

Realistically, the credence values that our function gives are only an approximation. Our beliefs do not come with little decimals and digits stamped onto them. Even if they did, it is unlikely we could sense the difference between a credence of, say, 0.7499999 and one of 0.75. In real life, our degrees of confidence are imprecise and fuzzy. But I argue that this fact poses little threat to the usefulness of our cr function. While credence might not be a finely measurable part of our lives, there is still some level of confidence we have in each of our beliefs. It is not silly to talk about these levels and how they might compare to each other. Numbers, when used within reason, give us a handy way of doing that.

Tenets of Probabilism

The claim I just made – that functions like cr are helpful when describing a person’s credences – happens to be one of two core tenets of a philosophical view called Probabilism. In the words of Michael Titelbaum, this first tenet of Probabilism is the principle that “Agents have doxastic attitudes that can usefully be represented by assigning real numbers to claims.”1 An “agent”, for our purposes, is just a being who has the capacity to act, including – most relevantly to our discussion – the act of forming beliefs.

The second tenet of Probabilism goes a bit further. It is a claim about what makes belief formation rational or not. It says that there are rules an agent must follow if her belief-forming behavior is to count as rational. These rules (also called rationality requirements) can be expressed mathematically, and they are applied to the real numbers that we have been using to represent degrees of belief. Titelbaum explains this second tenet as the principle that “Rational requirements on [an agent’s] doxastic attitudes can be represented by mathematical constraints on the real-number assignments closely related to the probability calculus.”1 The “real-number assignments” he’s talking about are just credence values. So, what Titelbaum is really saying with these two tenets can be summed up like this: Probabilism answers the question “What does it mean to have rational beliefs?” by offering a list of rules (in the form of mathematical constraints), and then declaring that a necessary part of being rational is having only credence values which obey those rules. It will require us to brave some notation, but let’s have a look at the most basic of these rules Probabilists put forward.

Kolmogorov’s Axioms

There are three rules most fundamental to Probabilism. By no coincidence, they each closely mirror the three axioms of a quite young branch of mathematics called Probability Theory. These axioms are known as “Kolmogorov’s Axioms” after the pioneering Soviet mathematician Andrey Kolmogorov. By swapping a few words, we can easily state Kolmogorov’s Axioms in terms of degrees of belief. Titelbaum does exactly that as follows, giving us Probabilism’s three most fundamental rules:1

Non-negativity: For any proposition P in L, cr(P) ≥ 0.

Normality: For any tautology T in L, cr(T) = 1.

Finite Additivity: For any mutually exclusive propositions P and Q in L, cr(P or Q) = cr(P) + cr(Q).

Some of this won’t make sense at first glance. So, a quick explanation is due. Maybe the first thing to explain is the letter “L” present in each axiom. This letter represents a language: the set of all propositions that can be built up from a given stock of basic propositions. The details are not important for our purposes, though. Just think of L as a long list of every declarative sentence, where P is any one of them. The function cr should be a familiar friend by now, so I will not explain that. All that is left is a quick description of each axiom. Here goes:

The axiom of Non-negativity: This rule requires that all rational agents believe things with degrees of confidence greater than or equal to zero. This should seem reasonable enough. What would a negative degree of confidence even mean, after all? It’s hard to see what sense that could make.

The axiom of Normality: Notice how this rule makes use of the letter “T.” This letter is being used here to represent a tautology. A tautology is a proposition which is true in every possible world. For example, the proposition “Either the candle is lit, or the candle is not lit” is a tautology. No matter what, it is true! There is no possible world in which statements of this sort are false. For that reason, they are often accused of being meaningless. The axiom of Normality says that rational agents must believe tautologies are true with 100% confidence. That sort of extreme certainty seems fair for things that can never be false, which is exactly what tautologies are.

The axiom of Finite Additivity: This rule is slightly more technical, but it can be made clear by an example: Say you hand me a flower. If I am 10% sure that what I'm holding is a rose, and 20% sure that it is a tulip, then I better be 30% sure that the flower is either a rose or a tulip. That is all the third axiom is saying. We can rephrase it more generally as: If two propositions P and Q cannot both be true (a flower cannot both be a rose and a tulip, for instance), and if a rational agent has a credence that P is true and another credence that Q is true, then the sum of those two credence values must equal her credence that either P or Q is true. Fair enough.
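To make this concrete, here is a small Python sketch – again my own illustration, with made-up proposition strings – that checks the flower example against all three axioms:

import math

# Credences from the flower example above. The strings stand in for
# real propositions; the numbers are the ones from the text.
credences = {
    "the flower is a rose": 0.10,
    "the flower is a tulip": 0.20,
    "the flower is a rose or a tulip": 0.30,
    "the flower is a rose or not a rose": 1.0,  # a tautology
}

# Non-negativity: no credence value may fall below zero.
assert all(value >= 0 for value in credences.values())

# Normality: the tautology must be believed with credence exactly 1.
assert credences["the flower is a rose or not a rose"] == 1.0

# Finite Additivity: "rose" and "tulip" are mutually exclusive, so the
# credence in their disjunction must equal the sum of the two credences.
# (math.isclose guards against floating-point rounding.)
assert math.isclose(
    credences["the flower is a rose or a tulip"],
    credences["the flower is a rose"] + credences["the flower is a tulip"],
)

If any of these assert lines failed, the credences would violate an axiom, and – by the Probabilist’s lights – the hypothetical agent holding them would be irrational.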

We should now have a working idea of Kolmogorov’s Axioms. We will move on, but just for kicks, one last thing worth quickly defining is a distribution. An intuitive way to imagine what a distribution means to a Probabilist is this: If an agent were to reflect on all the most basic propositions in a given language (that’s a lot!), and for each one assign a real number which best represents her credence that that proposition is true, then she would be performing a distribution of credence values. Distributions that only assign values which obey Kolmogorov’s axioms are known to mathematicians as “probability distributions.” With this term in hand, we can define Probabilism more rigorously as the philosophical view that, for an agent to be rational, her credence values must be divvied up such that they form a probability distribution.1

Why Probabilism?

Now that we know what Probabilism is, why buy it? Why think rational behavior requires us to follow these strange mathy rules when we form our beliefs? We will look at two arguments, and the first one is simply this: The rules imply things that match our intuitions. Titelbaum argues that Probabilism “is attractive in part because it has intuitively appealing consequences” (35).1 Basically, the rules lead to conclusions about rational belief which just seem right. These conclusions are also what Kolmogorov called “immediate corollaries” (6) to his axioms.2 He derived the first of these corollaries by starting with the fact that “P or ~P” is a tautology.

(By the way, P written by itself, as seen here, is just taken to mean “P is true.” Also, the squiggle means “not.” So ~P means “not P”, which is just the same as saying “P is false.” You can now see how “P or ~P” means only “P is either true or false”, which is a statement that is true no matter what – a tautology!)

Remember that the axiom of Normality says that for rational people, their confidence in tautologies should be 100%. That is, cr(T) should equal 1. So, with the tautology “P or ~P” in hand, Kolmogorov combined it with the axiom of Normality to show that cr(P or ~P) = 1. This is shorthand for the intuitive thought that for propositions like “Either the tea is hot, or the tea is not hot,” we should be completely sure they are true. And since P and ~P are mutually exclusive (they cannot both be true at the same time), the axiom of Finite Additivity lets us split them up. Our cr(P or ~P) can become cr(P) + cr(~P). These two observations together give us the conclusion that cr(P) + cr(~P) = 1, or put another way:

Negation: cr(~P) = 1 – cr(P).

Voila! A new rule – one which follows necessarily from the three already declared. But what does it mean? Titelbaum calls this the rule of Negation and describes it as the “sensible thought that if you’re highly confident that a proposition is true, you should be dubious that its negation is” (30).1 To revisit an earlier example, this means that if someone is 75% sure that tomorrow will be sunny, they ought to be 25% sure that tomorrow will not be sunny.
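For reference, the whole derivation compresses into four short lines:

1. cr(P or ~P) = 1 (Normality: “P or ~P” is a tautology)
2. cr(P or ~P) = cr(P) + cr(~P) (Finite Additivity: P and ~P are mutually exclusive)
3. cr(P) + cr(~P) = 1 (from lines 1 and 2)
4. cr(~P) = 1 – cr(P) (rearranging line 3 gives the rule of Negation)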

Another corollary, following quickly on the heels of Negation, is a rule about how high one’s degree of confidence can be. Remember that the axiom of Non-Negativity says no rational degree of belief can be less than zero. So, it follows that neither cr(P) nor cr(~P) can be less than zero. And having just shown that for any proposition P, cr(P) + cr(~P) = 1, it also follows that neither cr(P) nor cr(~P) can be greater than 1: if either of these two non-negative numbers exceeded 1, their sum would exceed 1 as well, when we know it equals exactly 1. There is another way to see this. Recall that cr(P) + cr(~P) can be rewritten as cr(P or ~P). The expression P or ~P should look familiar – it’s a tautology. So, if cr(P or ~P) were greater than 1, that would be the same as saying cr(T) > 1. But that can’t be true. It violates the axiom of Normality! Either way, we are led to the following rule:1, 2

Maximality: For any proposition P, cr(P) ≤ 1.

Or, as we would say in plain English, the most confident any rational person can be is 100%. The rule of Maximality, together with Non-Negativity, picks out the maximum and minimum values a rational person can use to represent her credences. And while these particular values of 1 and 0 are arbitrary, they do the job and track well with our talk of percentages.1 Of course, let’s not confuse the fact that any two numbers could be used to represent the boundaries with the mistaken idea that the boundaries themselves are made up. No matter what numbers are chosen to represent the limits, these two rules show that, according to Kolmogorov’s Axioms, there are limits, and for many of us, that strikes an intuitive chord.
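As before, the argument compresses neatly:

1. cr(P) + cr(~P) = 1 (the rule of Negation, rearranged)
2. cr(~P) ≥ 0 (Non-negativity)
3. cr(P) = 1 – cr(~P) ≤ 1 (from lines 1 and 2: subtracting a non-negative number from 1 can never yield more than 1)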

Conclusion

If we take Kolmogorov’s curious axioms as law for rational beliefs, as Probabilists do, then we have seen how two intuitive rules quickly spill out. The rule of Negation says that if a rational person is 90% sure that something is true, she ought to be 10% sure that it’s false. And the rule of Maximality says that the surest any rational person can be is 100%. There are other intuitive rules that follow too. But these two should give us a good enough idea of how Kolmogorov’s axioms can lead to sensible results. Maybe you found this argument for Probabilism convincing. Maybe not. Still, there are other kinds of defenses that a Probabilist will have up their sleeve. The second argument we will look at, and my personal favorite, comes in the form of a sinister gambling game.

You can find the second part of this series on the All Posts page, or right here.


1Michael G. Titelbaum, Fundamentals of Bayesian Epistemology (Oxford University Press, forthcoming).

2A. N. Kolmogorov, Foundations of the Theory of Probability (New York: Chelsea Publishing Co., 1950).

3Charles M. Grinstead and J. Laurie Snell, Introduction to Probability (American Mathematical Society, 2006), pp. 7, 17.


Jul 27, 2021
