Cantor’s Diagonalization Method

Kharazishvili, Alexander

Among various philosophical categories, the finite and the infinite are of pivotal importance. The relationship between them is so close, sophisticated, and profound that neither can be perceived and studied without appealing to the other. Both notions are the subject of extensive mathematical research. Modern set theory is entirely devoted to the study of the most general properties of finite and infinite families of objects.

A set of objects is finite if there exists a one-to-one correspondence, or bijection, between the set and some natural number n. It is assumed that n is identified with the set of all strictly smaller natural numbers. It follows that 0 is identical to the empty set ∅, and 1 coincides with the one-element set {∅}, while 2 is equal to {∅,{∅}}, and so on. If a set of objects is not finite, it is infinite. For example, it is a straightforward to demonstrate that the set N of all natural numbers is infinite.

Galileo offered some useful remarks about the paradoxical nature of infinite sets in the seventeenth century.¹ There is a one-to-one correspondence between the natural numbers and their squares. N is equinumerous with a proper part of itself. This is impossible for a finite set of objects.

Bernard Bolzano suggested defining infinite sets as those sets that are equinumerous with their proper subsets.² This is the definition adopted by Richard Dedekind.³ All sets satisfying this definition are now known as infinite in the Dedekind sense: D-infinite sets.⁴ All sets which are not D-infinite are now known as finite in the Dedekind sense: D-finite sets. D-finite sets are not quite identical to finite sets. Every finite set is D-finite, but the axiom of choice is needed to establish the converse.⁵ Even weak versions of the axiom of choice, it should be noted, are extremely non-constructive.⁶

The introduction of more or less rigorous concepts of infinity was deeply connected with the development of real mathematical analysis in the second part of the nineteenth century by Georg Cantor, Dedekind, Charles Méray, and Karl Weierstrass. Their definitions of the real numbers turned out to be equivalent. The classical theory of Dedekind cuts is now embedded in the theory of Galois connections.⁷ Cantor’s construction of the real numbers is now recognized as a prototype of completion procedures in arbitrary metric spaces.⁸ Naïve set theory emerged in the second part of the nineteenth century almost entirely as the result of work by Cantor.

No one, David Hilbert affirmed, can expel us from the paradise that Cantor created.⁹

Hilbert’s assertion was optimistic. Delicate and difficult moments followed the axiomatization of naïve set theory by Ernst Zermelo and Abraham Fraenkel.¹⁰ The role of axiomatic Zermelo–Fraenkel (ZF) set theory as a basis for all of mathematics was itself subject to debate. Today, category theorists argue for the replacement of set theory by category theory.¹¹

The Diagonal Argument

Cantor’s great achievement was his ingenious classification of infinite sets by means of their cardinalities. He defined ordinal numbers as order types of well-ordered sets, generalized the principle of mathematical induction, and extended it to the principle of transfinite induction.

Cantor also created the diagonal argument, which he applied with extraordinary success. Consider any two families of sets {X_i : i ∈ I} and {Y_i : i ∈ I}, both indexed by some set of indices, and suppose that X_i ≠ X_j whenever i ≠ j. In the family of all pairs of the form (X_i,Y_j), consider the diagonal consisting of all pairs (X_i,Y_i), where i ranges over I, and define a subset Z of {X_i : i ∈ I} by stipulating:

X_i ∈ Z if and only if X_i ∉ Y_i.

It immediately follows that Z cannot be identical with any Y_i, since the relations X_i ∈ Y_i and X_i ∉ Y_i are in conflict.

Some straightforward consequences follow.

First, Cantor’s celebrated theorem (1891) demonstrates that there is no surjection from any set X onto the family of its subsets, the power set P(X). The proof is straight forward. Take I = X, and consider the two families {x_x : x ∈ X} and {Y_x : x ∈ X}, where each Y_x is a subset of X. The subset Z of X produced by diagonalization for these two families differs from all sets Y_x(x ∈ X), so the equality {Y_x : x ∈ X} = P(X) is impossible.

If there is no surjection from a set A onto a nonempty set B, then there is no injection from B into A. It follows that there is no injection from P(X) into X. Since there is an embedding f of X into P(X), given by the formula f(x) = {x} for each x ∈ X, X is equinumerous with a certain part of P(X). The reverse is not true. Whence Cantor’s remarkable inequality card(X) < card(P(X)), where card(X) is the cardinality of X, and card(P(X)) is the cardinality of P(X). This inequality implies an infinite series of inequalities:

card(X) < card(P(X)) < card(P(P(X))) < card(P(P(P(X)))) < …

The hierarchy of infinite sets can be continued. Starting with the set N of all natural numbers, one gets

X₀ = N, X₁ = P(N), X₂ = P(P(N)), X₃ = P(P(P(N))), … .

Taking the union of the family {X_n : n ∈ N} yields the set X = ∪{X_n : n ∈ N}, whose cardinality is strictly greater than the cardinality of any set X_n(n ∈ N). Starting with the set X instead of N, and successively iterating the operations one then arrives at sets with larger and larger cardinalities. Why not a set whose cardinality exceeds the cardinality of all sets obtained in this way? The question leads unavoidably to the concept of a strongly inaccessible cardinal whose existence is not provable within the framework of formal set theory. Why not, for that matter, infinities of an even higher order? Gefragt, getan. The result is the theory of large cardinals.¹²

Second, there is no universal set U satisfying the relation X ∈ U for every set X. This is known as Cantor’s paradox. This follows from a direct diagonalization argument. Suppose that such a U does exist. Take

I = U, {X_i : i ∈ I} = {x_x : x ∈ U},{Y_i : i ∈ I} = {y_y : y ∈ U},

and apply the diagonalization method to {X_i : i ∈ I} and {Y_i : i ∈ I}. The set Z that is produced contradicts the universality of U.

U comprises a very big class of sets, so big, in fact, that it is a proper class of sets. The notion of a class is substantially more general than the notion of a set. Classes are more or less irrelevant for traditional fields of mathematics, which are restricted to local sets. Assuming the existence of a strongly inaccessible cardinal number, one may get an analogue of a universal set. This is sometimes useful for certain types of arguments, especially in abstract set theory, category theory, universal algebra, and general and algebraic topology.¹³

Third, there is no set R whose elements are precisely those sets X which satisfy X ∉ X. Suppose to the contrary that such a set R does exist and take

I = R, {X_i : i ∈ I} = {x_x : x ∈ R}, {Y_i : i ∈ I} = {y_y : y ∈ R}.

Applying the diagonalization method to {X_i : i ∈ I} and {Y_i : i ∈ I} yields a subset Z of {X_i : i ∈ I} differing from all sets Y_i(i ∈ I). On the other hand, the relation Z ∈ Z implies that Z = X_i for some index i ∈ I. Consequently, Z ∉ Z, is a contradiction.

This is a paradox. It was first noticed by Bertrand Russell in 1901. It shows that if a general property S(X) of various sets X is given, one cannot always assert that there exists a set {X : S(X)} consisting of exactly those sets X which satisfy S(X). Every unary relation S(X) of a variable set X, it is worth noting, determines a unique class C_S = {X : S(X)}, whose elements are precisely all those sets X for which S(X) holds true. Russell’s collection R is a proper class of sets.

An appreciation of these philosophically interesting results requires virtually no mathematics. Nikolai Luzin characterizes the diagonalization method as an extremely powerful tool for obtaining substantially new mathematical objects.¹⁴

Transcendental Numbers

Consider the unit segment [0,1] on the real line R. The points of [0,1] can be identified with the corresponding characteristic functions of subsets of N. The special case of Cantor’s inequality

card(N) < card(P(N))

leads directly to the uncountability of [0,1] and, therefore, to the uncountability of R. Another way to obtain the same result is to construct an infinite sequence of nested line segments so that the n-th segment does not contain the n-th member of a given infinite sequence of points of R. This is an argument first used by Cantor in 1873. The completeness of R yields a point distinct from all members of the sequence.

A remarkable corollary is the existence of many transcendental real numbers. A real number is algebraic if it is a root of some non-zero polynomial with rational coefficients; it is transcendental if it is not a root of any nonzero rational polynomial. The family of all such polynomials is countable, and the total number of distinct roots of each such polynomial does not exceed the degree of the polynomial. The family of all algebraic numbers is thus equinumerous with N, while the family of all transcendental numbers is equinumerous with R.

There are many more transcendental than algebraic numbers.

Concrete examples of transcendental numbers had been found prior to the work of Cantor. In 1844, Joseph Liouville established that $\sum$ {(1/10)^n! : n ∈ N} is a transcendental number. A weakness of Cantor’s method is that it will not determine an individual representative of the transcendental numbers. But, slightly revised and modified, it does allow for an explicit example. For this purpose, let us constructively enumerate all nonzero polynomials with integer coefficients, and denote the resulting sequence as {p_n : n ∈ N}. Let k(n) stand for the degree of the polynomial p_n. Start with p₀ and divide [0,1] into 2k(0) + 1 pairwise congruent sub-segments. At least one of those smaller sub-segments will not contain any root of p₀.

How then to find the first of these sub-segments? The Sturm algorithm solves the problem of finding the total number of real roots of p₀which lie on any segment of R with rational end-points.¹⁵ Denote the required sub-segment of [0,1] by $∆$ ₀, then divide $∆$ ₀ into 2k(1) + 1 pairwise congruent sub-segments and apply again the Sturm algorithm for finding a sub-segment $∆$ ₁ of $∆$ ₀ which is free from all roots of p₁, and so on. Continuing this process leads to the infinite sequence $∆$ ₀, $∆$ ₁, … , $∆$ _n, … of nested line segments. It becomes clear that the unique point belonging to all of these segments presents a concrete transcendental number.

A generalized version of Sturm’s algorithm leads to a remarkable theorem formulated by Alfred Tarski. Elementary algebra is decidable. There is an algorithm that determines whether any given statement of the theory is a theorem.¹⁶

Cantor’s diagonalization argument establishes that there exists a definable mapping H from the set R^N into R, such that, for any real sequence {t_n : n ∈ N}, the value H({t_n : n ∈ N}) differs from all t_n(n ∈ N). But, the existence of an analogous definable mapping G from the family of all countable subsets of R into R, cannot be established within Zermelo-Fraenkel theory. The existence of G implies the existence of a subset of R having cardinality ω₁, where ω₁ denotes the least uncountable ordinal number. But in ZF theory it is impossible to establish the relation ω₁ ≤ card(R).

The existence of a subset of R of cardinality ω₁ implies the existence of a Lebesgue nonmeasurable subset of R. This is a result established by Saharon Shelah and Jean Raisonnier.¹⁷ Cantor’s famous continuum hypothesis asserts that card(R) = ω₁.¹⁸ The continuum hypothesis cannot be proved or disproved within the formal ZFC theory. This we know as a result of work by Kurt Gödel and Paul Cohen.

The Growth of Functions

If f is a function acting from N into N, it is not difficult to describe another function g from N into N whose values are greater than the corresponding values of f. We can easily choose g so that the inequality f(n) < g(n) would be satisfied for every natural number n. In that case, g strictly majorizes f. Ditto for any finite family {f₀, f₁, … , f_k} of functions acting from N into N. We always can find a function g which strictly majorizes all f₀, f₁, … , f_k. There is no guarantee that there exists a single function g strictly majorizing all f_k(k ∈ N).

If one follows Paul du Bois-Reymond and compares the functions according to their growth, then an appropriate version of the diagonalization method leads to a far-reaching theory.¹⁹

For any two functions f and g, each of which acts from N into N, let us say that g is of higher growth than f if lim(g(n)/f(n)) = +∞ as n tends to infinity. The growth of g strictly majorizes the growth of f.

Suppose that we are given an arbitrary countable family {f₀, f₁, … , f_k, …} of functions acting from N into N subject to the constraint that there is a function g : N → N that tends to +∞ much faster than any function f_k(k ∈ N). It suffices that

g(n) = (n + 1)max{f_k(i) + 1 : k < n + 1, i <n + 1} (n ∈ N).

A straightforward verification shows that the growth of such a g strongly majorizes the growth of each given function f_k(k ∈ N).

These considerations can be extended to an arbitrary sequence of functions f₀, f₁, … , f_k, … acting from the positive half-line [0,+∞] into itself. Then

g(x) = (n + 1)max{f_k(x) + 1 : k < n + 1},

whenever a real x satisfies n ≤ x < n + 1. From the definition of g, one readily gets, for any k ∈ N, that lim(g(x)/f_k(x)) = +∞ as a variable x ∈ [0,+∞] tends to +∞.

The growth of g is higher than the growth of any function f_k(k ∈ N).

In some sense, du Bois-Reymond may be regarded as one of the creators of the diagonalization method.

Together, of course, with Cantor.

Descriptive Set Theory

In contrast to general set theory, descriptive set theory is concerned with those sets of real numbers, or those families of real-valued functions, that are definable in ZF by concrete properties, or are obtained by using constructive processes.²⁰ Whatever exists in descriptive set theory must be exhibited.

René-Louis Baire took the first steps in this direction.²¹ Starting with the family B₀ of all continuous functions on the unit segment [0,1] with values in the same segment [0,1], and using some transfinite process up to ω₁, Baire defined a class of functions as follows. Considering a non-zero ordinal α < ω₁ and assuming that the classes B_β of functions are determined for all ordinals β < α, he took B_α as the class of all pointwise limits of sequences of functions belonging to ∪{B_β : β < α}. He then put B = ∪{B_α : α < ω₁}. The functions belonging to B are, in fact, Borel functions, named after Émile Borel, that act from [0,1] into [0,1]. If α < ω₁, then any function from the difference B_α \ (∪{B_β : β < α}) is a function of order α.

Are there functions of order α for any α < ω₁? By using a clever diagonalization argument, Henri Lebesgue was able to give a positive answer.²² Lebesgue also enriched the diagonalization method by introducing the new and fruitful idea of a universal function for a given class of functions. He first proved that, for every ordinal α < ω₁, there exists a mapping F_α : [0,1]² → [0,1] satisfying:

F_α is a Borel function of two variables.
For each function f ∈ B_α, there is a point t from [0,1] such that f = F_α(·, t).

To obtain this preliminary result, Lebesgue utilizes the method of transfinite induction and defines F_α by a certain transfinite process. For each α < ω₁, Lebesgue presents a parameterization of all functions from the family B_α. Supposing that there are no functions of order B_α, Lebesgue introduces the Borel mapping G_α : [0,1] × [0,1] → [0,1] by the formula:

G_α(x, t) = $\lim_{n \to \infty}$ (nF_α(x, t))/(1 + nF_α(x, t)) (x ∈ [0,1], t ∈ [0,1]).

Verification that:

ran(G_α) = {0,1},
for any Borel function g : [0,1] → {0,1}, there exists a point t ∈ [0,1] such that g = G_α(·, t),

is easy enough.

Now, apply a diagonalization argument and put h(x) = 1 - G_α(x, x) for each x ∈ [0,1]. The function h is Borel and ran(h) = {0,1}. Keeping 2. in mind, pick a point t from [0,1] for which h = G_α(·, t). Then the equalities

h(t) = 1 - G_α(t, t) = G_α(t, t), G_α(t, t) = 1/2

follow, contradicting 1. Voilà, the existence of Baire functions of any order α < ω₁ and, consequently, the existence of Borel sets in [0,1] of the same order α.

A further extension of the Borel subsets of R are the so-called analytic sets in R which are also definable within ZF theory. A subset of R is analytic, Mikhail Suslin argued, if it is either empty or is a continuous image of the set I of all irrational numbers. Developing Lebesgue’s idea, Luzin then proved that there exists an analytic set A in the plane R², such that the vertical sections of A constitute the family of all analytic subsets of R. A may be regarded as a universal analytic set in the plane. A simple diagonal argument shows that A itself is a non-Borel subset of the plane, and that there is also a non-Borel analytic set in R.²³

Lebesgue was the first to discover a non-Borel analytic subset of R,²⁴ but he remained unaware of its significance. Some years later, it was shown that a certain component of Lebesgue’s partition of R is a proper analytic set.²⁵ There are many interesting concrete sets of functions that are analytic but not Borel. The set of all continuous real-valued functions, defined on [0,1], and differentiable at some point of [0,1], is a non-Borel analytic subset of the Banach space of all continuous real-valued functions on [0,1].²⁶ Many other such sets can be constructed in the theory of trigonometric series.²⁷

The diagonalization method is also effective when dealing with the projective subsets of R. Their structure is substantially more complicated than the structure of analytic sets.²⁸ An obvious diagonal argument leads to the conclusion that there is no projective subset of the plane that is universal for the family of all projective subsets of R. Although the projective sets are definable without using the axiom of choice, their Lebesgue measurability cannot be demonstrated within ZFC.²⁹

Totally Discontinuous Functions

Every Lebesgue measurable function possesses the property that its restriction to a certain subset of R of the cardinality of the continuum is necessarily continuous. By well-ordering of the continuum, Wacław Sierpinski and Antoni Zygmund were able to use a diagonal argument to the existence of an utterly discontinuous function. There is a function f : R→ R such that for each subset X of R having the power of the continuum, the function f|X, the restriction of f to X, is not continuous on X.³⁰

The family of Sierpinski–Zygmund functions is large within the family of all functions from R into R. Any function h : R → R admits a representation in the form h = f₁ + f₂, where both f₁ and f₂ are injective Sierpinski–Zygmund functions.³¹ Consequently, the family of all injective Sierpinski–Zygmund functions is equinumerous with the family of all functions acting from R into itself.³²

A similar argument shows that there is a function g : R → R satisfying a much stronger condition: for each subset X of R having the cardinality of the continuum, g|X is not Borel on X. For every partial Borel function, φ : R → R defined on an uncountable Borel subset of R, the set {x ∈ R : φ(x) = g(x)} has cardinality strictly less than the cardinality of the continuum.³³

Does there exist a function f : R → R such that, for any uncountable subset Y of R, f|Y is not continuous? Good question. Given the continuum hypothesis, any Sierpinski–Zygmund function yields a positive answer. Nevertheless, the question cannot be resolved within ZFC. Mike Shinoda demonstrated that Martin’s axiom, together with the negation of the continuum hypothesis, implies that if h : R → R, then, for each uncountable set Y in R, there exists an uncountable set Z ⊂ Y such that h|Z is continuous.³⁴

Enumerable and Recursive Sets

Say that f is a function acting from a subset of N into N. f is partial recursive, or computable, if there is an algorithm determining f(x) for any element x ∈ dom(f). If, dom(f) = N, f is a total recursive function. Partial recursive functions from N^k → N^r can be defined in the same way. A set X ⊂ N^k is recursively enumerable if X coincides with the range of some partial recursive function.

Every recursively enumerable set can be described by a formula of formal arithmetic. If X ⊂ N is a recursively enumerable set, then there always exists an arithmetical formula S(x) of one free variable x, such that ∀x(x ∈ X ⇔ S(x)). Yuri Matiyasevich demonstrated that recursively enumerable sets are identical with the diophantine sets, whose nature is purely number-theoretical.³⁵

This is a remarkable result.

A set Y ⊂ N is recursive, or recursively decidable, if there is an algorithm that for every y ∈ N determines whether y belongs to Y. By the same token, Y ⊂ N is recursive if, and only if, its characteristic function f_Y is a total recursive function. It is not difficult to verify that:³⁶

The composition of any two partial recursive functions is again a partial recursive function.
If f : N → N and g : N → N are any two partial recursive functions, then their product function

(n, m) → (f(n), g(m))(n ∈ dom(f), m ∈ dom(g)),

is also a partial recursive function.

Under a partial recursive function, the image of a recursively enumerable set is again a recursively enumerable set.
A partial function is a partial recursive function if, and only if, its graph is a recursively enumerable set.
Every recursive set is recursively enumerable.
A recursively enumerable set X ⊂ N is recursive if, and only if, its complement N\X is recursively enumerable.

That there exists a universal partial recursive function, h : N × N → N, for the class of all partial recursive functions is a very deep result. Applying the diagonal argument to h, i.e., defining g by

g(n) = h(n, n) + 1 ((n, n) ∈ dom(h)),

one easily infers that the obtained partial recursive function g does not admit an extension to a total recursive function. In addition, the set dom(g) turns out to be recursively enumerable, but not recursive. This is closely connected with Gödel’s first incompleteness theorem. By means of an astonishingly ingenious diagonal argument, Gödel established that in any recursive formal system rich enough to generate arithmetic, there must exist a sentence such that neither the sentence, nor its negation, can be derived from the axioms. The set of arithmetic truths is neither recursive, nor recursively enumerable.

There is a vivid analogy between classical descriptive set theory and recursion theory. Borel sets are similar to recursive sets, and Suslin sets, to recursively enumerable sets. Consider these striking analogies:

(a) The family of all Borel sets in R forms a σ-algebra of subsets of R.

(a’) The family of all recursive sets in N forms an algebra of subsets of N.

(b) Every uncountable analytic set contains an uncountable Borel set and, in fact, contains a set homeomorphic to Cantor’s discontinuum.

(b’) Every infinite recursively enumerable set contains an infinite recursive set.

(c) There is an analytic set in R whose complement is not analytic; or, equivalently, there is a non-Borel analytic set in R.

(c’) There is a recursively enumerable set in N whose complement is not recursively enumerable, or, equivalently, a non-recursive enumerable set in N.

(d) A set X ⊂ R is analytic if, and only if, X is the projection of some Borel subset of R × R.

(d’) A set Y ⊂ N is recursively enumerable if, and only if, Y is the projection of some recursive subset of N × N.

Tarski and Gödel

Consider again formal arithmetic. Let k be a natural number and Z a subset of N^k. Z is arithmetically definable if there is a formula S(x₁, x₂, …, x_k), with free variables x₁, x₂, …, x_k, such that

(n₁, n₂, …, n_k) ∈ Z ⇔ S(n₁, n₂, …, n_k)

for all natural numbers n₁, n₂, …, n_k. If k and r are natural numbers and f is a partial function acting from N^k into N^r, then f is a partial arithmetic function if the graph of f is an arithmetically definable subset of N^k+r. The family of all arithmetically definable sets satisfies a number of obvious relations.³⁷ Arithmetically definable sets are crucial for Tarski’s undefinability theorem.³⁸

A mathematical theory T has both a syntactic structure and a semantic interpretation. Syntax describes the provable sentences of T; semantics describes its possible models. The standard model of formal arithmetic is the set N of all natural numbers with its canonical structure. There are many nonstandard models of the same formal system.³⁹

The alphabet of formal arithmetic is countable; there is an algorithm that decides, for any finite sequence of formulas of formal arithmetic, whether the sequence is a proof in formal arithmetic. In a triumph of ingenuity, Gödel showed how the elements and sequences of elements in any countable formal system could be canonically mapped to the natural numbers, a scheme that is now called Gödel’s numbering. All formulas and proofs of formal arithmetic admit appropriate enumerations.⁴⁰

Let us denote by Tr the set of all natural numbers that correspond to those sentences true in the standard model of formal arithmetic. Tarski’s celebrated theorem states that Tr is not an arithmetical set. It suffices to check that the set F of all natural numbers corresponding to sentences that are false in formal arithmetic is not arithmetical. The proof proceeds by contradiction. Suppose that F is, in fact, arithmetical. Now consider a recursive enumeration A_n(x)(n ∈ N) of all formulas with exactly one free variable x. Consider next the set {(n, m) : A_n(m) is false}. By the properties of arithmetical sets it follows that this set is arithmetical. But the set {n : A_n(n) is false} is obviously arithmetical as well. Consequently, there is a natural number k such that

A_k(n) ⇔ A_n(n) is false

is true for any natural number n. Finally, taking n = k,

A_k(k) ⇔ A_k(k) is false,

which is a contradiction and yields Tarski’s undefinability theorem. The last formula expresses the liar’s paradox, because the sentence A_k(k) asserts that A_k(k) is false.

Tarski’s theorem leads to Gödel’s first incompleteness theorem. Let Pr denote the set of natural numbers corresponding to the provable sentences in formal arithmetic. Pr is recursively enumerable. Since any recursively enumerable set of integers is arithmetical, Pr cannot coincide with Tr. It can only be a proper subset of Tr. Truth and provability are not the same. That this discovery was revolutionary hardly does justice to its significance. Thus, there is a sentence S that is true, but not provable, in the standard model of formal arithmetic. Neither S, nor its negation, ¬S, is provable and formal arithmetic is incomplete. A similar argument is applicable to any recursive mathematical theory that is stronger than formal arithmetic (e.g., ZFC set theory).

In some sense, it is mathematics itself that is incomplete.⁴¹