Doing Mathematics Differently

Chaitin, Gregory

Dieu a choisi celui qui est le plus parfait, c’est à dire celui qui est en même temps le plus simple en hypotheses, et le plus riche en phenomenes.
Mais quand une regle est fort composée, ce qui lui est conforme passe pour irregulier.
— Leibniz, Discours de métaphysique, VI¹

Between the physics of the very small (particle or high energy physics) and the physics of the very large (cosmology) lie complex systems like us. In this essay, my subject is conceptual complexity, the complexity of ideas. The main application of these concepts is in meta-mathematics, where one deals with incompleteness and the limits of pure thought.

Mathematics contains a great deal of complexity. In this area, as in so many others, Gottfried Leibniz put his finger on the basic issue.

He did so before anyone else.

The Weyl Way

Hermann Weyl was both a mathematician and a mathematical physicist. Weyl wrote on mathematics, general relativity, and quantum mechanics, as well as on art and philosophy. His smaller book on philosophy is entitled The Open World. It is made up of lectures Weyl gave in 1932 at Yale University.² In the philosophy of science, according to Weyl, complexity is essential in understanding the concept of a law of nature. If laws of nature may be arbitrarily complex, he argued, the very concept of a law becomes vacuous. What difference would remain between complex phenomena and the laws meant to explain them if the laws meant to explain them were as complex as the phenomena they are meant to explain?

Laws of nature must be simple.

A good point.

But Leibniz made the same point three centuries before.

Most of Leibniz’s work was not published during his lifetime. His ideas were found long after his death in his private papers, or in letters to other intellectuals. One text, the 1686 Discours de métaphysique, was published (and named) by an editor who found it among Leibniz’s papers more than a century after Leibniz died. Paragraphs V and VI of this work discuss complexity.³ The terms Leibniz uses are simplicité and fort composé.

Leibniz imagined taking up a quill pen and splattering a piece of paper with ink spots. These spots make up a finite set of points on the plane. What does it mean to say that they obey a law? Perhaps a mathematical equation goes through the points. But for any finite set of points, Leibniz observed, you can always find a mathematical equation going through them. Lagrangian interpolation is an obvious technique. The existence of an equation going through a set of points does not distinguish random points from points that follow a law.

What does?

For Leibniz, a law must be expressed by a simple equation. An equation that is forte composée is meaningless. There is always one loitering about. This kind of theoretical criterion does not apply to individual cases. We are trying to understand what it means to say there is a law governing these points. That does not mean we can determine if there is such a law.

Well, maybe, if you limit yourself to algebraic equations…

It was in paragraph V of the Discours that Leibniz asserted that the world is comprehensible. He wrote this in 1686, a year before Isaac Newton published the Principia. Modern science or, as it was called then, mechanical philosophy, was just beginning. Medieval ideas about the world and religion were still very much alive. Newton himself was more a man of the past than of the future; he was, as John Maynard Keynes put it, the last of the Babylonian sorcerers.⁴

Just as the old school of thought was colliding with the new mechanical philosophy, Leibniz asked why the world is comprehensible. He wanted to combine theology with modern science; he was unwilling to give up either. In the Discours, he discussed the relation of science to the world, noting that the world is governed by science, and explaining why. The reason, he wrote, is God. God has created a world which is very rich and diverse in phenomena. And yet all this richness follows from a few very simple laws. According to Leibniz, this is what it means to say that the world is comprehensible. The world is beautiful because it follows simple laws that produce the richness that we see around us. Science is possible, and this fact was, for Leibniz, proof of the greatness of God; this beautiful world is made by a great artist from a small number of powerful ideas.

The Popper Way

Weyl took up this idea in The Open World and Philosophy of Mathematics and Natural Science.⁵ He pointed out a problem: how do you measure the complexity of an equation? It might be the size of the equation, but mathematical notation changes. Do you allow hyperbolic trigonometric functions? Bessel functions? Exactly what notation do you use? According to Weyl, this concept of complexity is very important and basic; it would be nice to clarify it.

Karl Popper’s The Logic of Scientific Discovery devotes a chapter to simplicity and its opposite, complexity.⁶ Popper threw up his hands. If Weyl could not figure out how to define the fundamental notion of complexity in mathematical terms, well then, this must be a hard problem.

But the notion of conceptual complexity is more than three hundred years old.

The Old Boy had made a good start.

There is a way to take these ideas and make them sharper. There is, in fact, a way to define them mathematically. Consider this done. The result is algorithmic information theory, or AIT.

At the center of AIT is the following model of how science should be done:

Input: program → COMPUTER → Output: experimental data.

The input to the computer is a program; the output, experimental data. The computer itself is a fixed universal Turing machine.

Gone are points or dots of ink on a piece of paper; and gone with those dots is anything like a theory or mathematical equation. No interpolations; no Lagrangians. In this model, both the input and the output comprise a finite sequence of 0s and 1s. The result is a soft view of the hard science. Theories and data are both finite strings of bits. All that enters the computer is a program. There is no distinction between input data and the methods or tools used to manipulate them. The data is in the program, so that the input is rather like a class in object-oriented programming languages.

It is all in there together.

Given a theory as its input, and some internal signal to get to work, the computer initiates its calculations, and, one hopes, produces an output before halting. If the output is exactly the string of bits comprising the experimental data, then the theory serves as its explanation.

No approximations! This is not a probabilistic or a statistical view of science.

Leibniz fitted this view of things like a glove. We can compare the complexity of a theory with the complexity of the data very easily because both are finite strings. Leibniz saw that a theory the same size as the data is useless in the sense that it is not possible to cover a debt of ten dollars with another debt of ten dollars. Explanation is a form of compression. If a theory is smaller than the data, then in that case, as in so many others, less is more. A successful explanation is a matter of covering a large debt with a much smaller one.

If less is more, smaller is better.

The best theory is the smallest program that generates the data.

Incompleteness

The computer in this model is a universal Turing machine and so subject to the halting problem. AIT provides an entirely new perspective on the halting problem. Given a string of bits S, is it possible to prove that S is irreducible? Better, sharper. Given S, is it possible to prove that there exists no program P smaller than S that generates S?

No such proof is possible. This is an incompleteness result, a measure of the unexpected depth of the concept of complexity, its inner richness.

A program P is elegant if no program smaller than P produces the same output as P.⁷ There may be several elegant programs that produce the same output, and so yield an exciting tie. If we have one of these programs, can we prove that it is an elegant program?

It is very, very hard.

This, too, is an incompleteness result, and its corollary is the unsolvability of Alan Turing’s halting problem. The proof is actually very simple. In some ways, it is simpler than the original proofs of Kurt Gödel’s incompleteness theorem or of the unsolvability of the halting problem, both of which involve self-reference. I will show that in general it is impossible to prove that a program is elegant.

Taken for granted:

A formal axiomatic or formal mathematical theory A, an idea due mostly to David Hilbert, but one that can be traced back to Leibniz, and even to the mystic and philosopher, Ramon Llull, in the thirteenth century. An artificial language with crystal-clear grammar, such as first-order symbolic logic. A finite set of axioms.

Given this:

Hilbert argued that there should be a mechanical procedure for checking if a proof in A is valid or not. Objective, not subjective. Black or white. Nothing grey. In going through the tree of all possible deductions from the axioms, the mechanical procedure discards the invalid proofs: Failure of substitution—out; modus ponens backward—out; dangling parenthesis—out. This process generates all the theorems of A.

Get busy:

Start applying the rules of logic in every possible way to the axioms. Look first at proofs one statement long. Look next at proofs two steps from the axioms, three steps from the axioms, four steps from the axioms, and so on. Search through the tree of all possible proofs and all possible derivations.

In the end, there they are: all the theorems in A.

For a strong enough A, this would generate all the theorems of mathematics.

Consider a program P such that:

P’s output is the output of the first provably elegant program Q larger than P. If P needs to know anything at all, all that P needs to know is its own size in bits. This is much simpler than Gödel’s mind-numbing form of self-reference. P consists of a small main program with a large subroutine that knows the formal axiomatic theory A and sets it working. P only cares about theorems that prove that individual programs Q are elegant. It is single-minded. If it finds a proof of the Riemann hypothesis, it will not cherish it.

P keeps on until it finds the first proof in A that an elegant program Q is larger than P itself. Having found Q, P then runs Q, remembers its output, and then generates the same output as Q.

But this is impossible, by the definition of elegance! P is smaller than Q, and Q is an elegant program. Because Q is elegant, it must be the smallest program generating its output. Either P is not smaller than Q, or it cannot generate the same output as Q.

The only way to avoid this contradiction is for P never to find Q. The embrace would be logically fatal.

We have just seen a very simple proof of incompleteness. Incidentally, a corollary is that there cannot be an algorithm to determine whether a program halts or not, because if you had a general method to decide whether a program halts, then you could use it to decide if an arbitrary program is elegant. All you would have to do is look at all programs Q’s size or less, decide which ones halt, and ignore the ones that do not halt. You run the ones that halt and see what they produce. If no program smaller than Q produces the same output that Q does, Q is elegant.

And yet I have just shown that you cannot do that.

This gives us a way to measure the conceptual complexity of a formal mathematical theory A. I view a mathematical theory as software—that is, as a program that runs through the tree of all possible derivations from the axioms and mechanically produces all the theorems in A, which is an endless computation. The number of bits in the program that does this is the complexity of the formal axiomatic theory A. Roughly speaking, it is the number of bits of axioms in A, and it is also just about the size in bits of the proof-checking algorithm for A—sort of a natural measure.

So we have a measure of the complexity of a formal mathematical theory A, and in theory A you cannot prove that any program is elegant that is larger in size than A’s complexity. That is what the paradoxical program P proves.

What Lurks in the Hearts of Mathematicians

Gödel would probably have been surprised by this formulation. It is an incompleteness theorem, but the source of incompleteness is rather different from his. Gödel does not talk about complexity. His unprovable assertion is, “I am unprovable,” which is true if and only if unprovable. We have discovered a different flavor of incompleteness.

Why does this matter? Why do we need a different proof of incompleteness? Because it is easy for mathematicians to ignore Gödel’s proof. What lurks in their heart of hearts is a commitment to absolute truth, and a universal formal axiomatic theory for all of mathematics.

In 1931, Gödel exploded this belief. The assertion, “I am unprovable,” may be true and unprovable, but it is also bizarre. This does not look at all like ordinary mathematics. I do not know whether proving that programs are elegant looks like ordinary mathematics. Mathematicians can ignore the proof that I just presented as well.

What AIT does, however, is to introduce a notion of complexity into the discussion of incompleteness. With Gödel’s original approach you could not tell whether incompleteness was ubiquitous, commonplace, or restricted only to strange, degenerate cases.

AIT suggests that incompleteness is, in fact, pervasive; it tells us that every mathematical theory has finite complexity, but that the world of pure mathematics has infinite complexity. Just proving that programs are elegant for all possible programs would require infinite complexity.

AIT makes incompleteness look natural.⁸

This new perspective is, in my opinion, the most important application of AIT.

AIT gives you a different kind of proof of the unsolvability of the halting problem. It offers a conceptual complexity or complexity of ideas perspective on both Gödel’s incompleteness theorem and Turing’s proof of the unsolvability of the halting problem. Things look different if you use the concept of complexity.

An even more dramatic application of AIT is the halting probability. To illustrate, let us now reexamine Turing’s halting problem using a statistical mechanics or ensemble approach.

Turing showed in 1936 that there cannot be an algorithm to decide if an individual, self-contained program will ever stop. You can run the program, and if it stops, you know. The problem is proving that a program that does not stop, never stops.

Turing deduced incompleteness as a corollary in his original 1936 paper. If there cannot be a mechanical procedure to decide in advance if a program will halt or not, you cannot have a formal mathematical theory that will always enable you to decide for individual programs whether they halt or not. Running through the tree of all possible proofs, slow as this would be, is itself a mechanical procedure that decides if a program halts or not. Either you find a proof that the program in question halts, or you find a proof that it never does.

With the halting probability Ω, instead of asking whether an individual program halts or not, you look at the ensemble of all possible computer programs and ask what the probability is that a computer program chosen at random will halt.

If no program halts, the probability is zero. If every program halts, it is one. Using a real computer or universal Turing machine, some programs will and some will not; the halting probability will be strictly between zero and one.

Formally, programs are bit strings, where each bit in the program is generated by an independent toss of a fair coin. If a K-bit program halts, it contributes 1/2^K to the halting probability. The halting probability Ω is as follows:

$Ω = \sum_{p halts} 2^{- | p |}$

This is a real number between zero and one:

$0 < Ω < 1$

The p in the formula for Ω is an arbitrary computer program, and you are summing over all programs that halt. The absolute value |p| of the bit string p stands for p’s size in bits.

Of course, to ask what the probability is that a program will halt, you have to put a probability measure on the space of all possible programs. The probability of p is 2^–|p|, but there is a problem here. Summing this normally gives you an Ω that diverges to infinity. It is an un-normed probability measure.

It should not be greater than one.

The solution is to require programs to be self-delimiting, in the sense that no extension of a valid program is a valid program.⁹

This makes the infinite sum

$Ω = \sum_{p halts} 2^{- | p |}$

converge to less than one. The measure is bounded and the halting probability Ω, well defined. Its precise value depends on the universal Turing machine or programming language.

In a weak sense, you can calculate Ω in the limit from below. As you look at larger and larger programs, and you run them for more and more time, you get better and better lower bounds on Ω. The upper bound is one. Still, while Ω is well defined mathematically, it is also violently uncomputable. Convergence is monstrously slow. It is so slow that you never know how close you are, or how many bits are correct in the base-two notation for the sum. There is no computable regulator of convergence.¹⁰

The halting probability Ω shows that there is an infinite amount of irreducible complexity in pure mathematics. How does this work? First, imagine taking Ω and writing it out in binary.

$Ω = .01101 \dots$

To get a precise value, you have to choose a universal Turing machine or programming language. Depending on your choice, the numerical value of Ω may be different, but its paradoxical properties will be the same.

The numerical value of this halting probability will be strictly between zero and one; it will be an infinite number of bits, after the binary point, with no integer part.

It turns out that each of these bits—as far as we can tell—looks like an independent toss of a fair coin. They are algorithmically random. You can prove that Ω is a transcendental real number; if it were an algebraic number it would be highly compressible and not at all irreducible. The number Ω has infinite complexity. Each bit of Ω is one bit of complexity. There is no redundancy.

It is irreducible.

To repeat, even though Ω is well-defined mathematically, its numerical value is elusive. Given human limitations, these bits are a perfect simulation of independent tosses of a fair coin in pure mathematics.

Not only is Ω uncomputable—Turing in 1936 already had an uncomputable real number—it is uncomputable in the worst possible way. The smallest program that calculates the first N of bits of the base-two expansion of Ω will also have N bits. That is precisely what it means to say Ω is irreducible. And the smallest axioms from which you can deduce what those bits are instead of computing them will also have to have N bits.

Both computationally and logically, Ω is irreducible.

As long as the initial bits are all 1s, you can prove that they are 1s by getting better and better lower bounds on Ω. So the initial string of 1s at the beginning of Ω cannot be too long. These are fragile facts that depend on your choice of programming language.

Leibniz had a principle called the principle of sufficient reason, which says that if something is true, it is true for a reason. If something is true in pure mathematics, the reason is called a proof. Mathematicians find proofs; their job is to find the reason something is true.

However, Ω is a specific real number, and each of the bits that make it up has to be a mathematically determined 0 or a 1. There is no freedom. But each bit is so delicately balanced between 0 and 1 that we will never know which it is. In God’s mind, or in the Platonic world of ideas, each bit is determined. But for us, there is no reason why each bit is what it is.

The simplest reason that these bits are what they are is the bits themselves. There is no more concise axiom from which you can deduce this. It is a perfect simulation, in pure mathematics, of contingency.

Leibniz distinguished between necessary truths, which are true for a reason—i.e, their opposite is a contradiction—and contingent truths, such as the fact that the president of France is François Hollande. A contingent truth cannot be proved logically or mathematically; it is accidental, or historical. (Actually, according to Leibniz, the chain of reasons for a contingent truth is infinite and can only be perceived by God.)

Even though the bits of Ω exist in the world of pure thought, they are a perfect simulation of complete randomness. The bits of Ω look like contingent truths. They look like a typical outcome of independent tosses of a fair coin.¹¹

Hilbert thought pure mathematics came from a small number of ideas that we could all agree on; that is why pure mathematicians think the subject is so beautiful. But Ω shows that pure mathematics is not simple; it is infinitely complex.

It makes mathematics look a lot like biology.

Empirical Mathematics

How has the mathematics community reacted? With Gödel, at first there was a lot of shock. As a student in the late 1950s and early 1960s, I would read essays by Weyl, by John von Neumann, and by other mathematicians. Gödel really disturbed them. He took away their belief in the Platonic world of ideas, the principle that mathematical truth is black or white and provides absolute certainty.

But now, strangely enough, the mathematics community ignores Gödel incompleteness and goes on exactly as before, in what I would call a Hilbertian spirit, or following the Bourbaki tradition, the French school inspired by Hilbert. Formal axiomatic theories are still the official religion in the mathematics community.

As for the work on Ω, the logic community thinks it is crazy. They do not understand conceptual complexity, which is akin to entropy, and they have no feeling at all for randomness, which has a long history in physics, but not in logic. To them, these are very strange ideas. Logicians hate randomness.

That is precisely why they became logicians.

Gödel incompleteness is even unpopular among logicians. They are ambivalent. On the one hand, Gödel is the most famous logician ever. But, on the other hand, the incompleteness theorem says that logic is a failure. No logician would like the message from a course on logic to be that they should fire all the logic professors! Books on logic may talk about incompleteness, but they do so in the last chapter, and you never get to the last chapter when you give a course!

The ideas discussed in this essay—randomness and lack of structure—are very close to ideas from physics, such as entropy and disorder in statistical mechanics. To physicists, these ideas seem familiar and comfortable. What the work discussed here really shows is that pure mathematics may not be identical to theoretical physics, but it is not that different. Pure mathematicians like to think that they have absolute truth, and that we physicists—I consider myself an honorary physicist—use proofs that are heuristic and non-rigorous. But even in pure mathematics there are wonderful non-rigorous heuristic proofs of the kind that mathematical physicists feel comfortable with. Leonhard Euler used them all the time!¹²

My feeling about all of this is that it is not a sharp divide; it is a continuum. Absolute truth you can only approach asymptotically in the limit from below.

What if we take Gödel incompleteness very seriously and throw away rigor? Suppose you have a property of the prime numbers which has been checked on the computer. You graph it and there is a beautiful curve, and it is fit beautifully by a very simple equation. What if you cannot prove it? A physicist would publish anyway. But a pure mathematician does not care how much empirical evidence there is, or how accurately this simple formula fits the curve. You need a proof!

Surprisingly, a few mathematicians have been promoting what they call experimental mathematics.¹³ AIT provides some theoretical grounds for thinking that maybe pure mathematics is what I call, following Imre Lakatos, quasi-empirical.¹⁴

In theoretical computer science, there are cases where people behave like physicists; they use unproved hypotheses. P ≠ NP is one example; it is unproved but widely believed by people who study time complexity. Another example: in axiomatic set theory, the axiom of projective determinacy is now being added to the usual axioms.¹⁵ And in theoretical mathematical cryptography, the use of unproved hypotheses is rife. Cryptosystems are of immense practical importance, but as far as I know it has never been possible to prove that a system is secure without employing unproved hypotheses. Proofs are based on unproved hypotheses that the community currently agrees on, but which could, theoretically, be refuted at any moment. These vary as a function of time, just as in physics.

P ≠ NP, projective determinacy, cryptography, these are all examples of mathematics being done in a more empirical spirit, as if it were physics. Researchers add new axioms which are pragmatically justified, because they help to organize the field, they help us to compress our mathematical experience, but they are not the simple, self-evident axioms that mathematicians normally like.

So physics and mathematics are not that different. Starting with Leibniz’s Discours we are led inevitably to this conclusion. Three full centuries after his death, Leibniz is more relevant than ever.¹⁶

In honor of Leibniz on the tercentenary of his death (1716–2016)

Doing Mathematics Differently

Letters to the Editors

The Weyl Way

The Popper Way

Incompleteness

What Lurks in the Hearts of Mathematicians

Empirical Mathematics

More from this Contributor

More on Mathematics