No finite set is bigger than everything, and so every finite set is smaller than something. If one object is added to a small finite set, it gets bigger, but stays small.
Wait a minute. Could that be right?
Yes, absolutely. A world in which every finite set is small is perfectly consistent. This paradox is not so trivial as it might seem, if only because it is no paradox at all. In a proper paradox, two propositions must be in conflict. The set of all sets that are not members of themselves both is, and is not, a member of itself. This is Russell’s paradox. The conflict is plain, so such a set cannot exist.
Small sets are an irritant to intuition. In general set theory and model theory, ideals and filters are used to distinguish the small and large subsets of a ground set. The elements of an ideal are small. With filters, it is the other way around. An ultrafilter is a filter made maximal by inclusion. It includes everything that is large; all other subsets are small. The axiom of choice establishes the existence of nontrivial ultrafilters in every infinite ground set. These distinctions collapse in the case of finite sets, because any ultrafilter in a finite set is determined by a singleton.
In 1923, John von Neumann proposed to define the natural numbers by beginning with nothing. He began with the empty set Ø, and then argued that since nothing is nothing, 0 = Ø. His inductive definition of the natural numbers followed: n + 1 = n ∪ {n}. This implies that any natural number n coincides with the family of strictly smaller natural numbers. 1 is the oneelement set {Ø}, 2 is the twoelement set {Ø, {Ø}}, and so on up.
Finite sets may now be defined by means of the natural numbers. A set E is finite if there exists a onetoone correspondence between E and some natural number n. Finiteness is absolute and remains invariant in all reasonable mathematical models. If the natural numbers may be used to define the concept of a finite set, the argument goes in reverse. It was Wacław Sierpiński who showed that it was possible to define the finite sets without appealing to natural numbers. It is as simple as 1, 2, 3. A class K of sets is admissible if
 the empty set Ø belongs to K,
 every oneelement set belongs to K; and
 if X ∈ K and Y ∈ K, then X ∪ Y ∈ K.
A set Z is called finite if Z belongs to all admissible classes of sets. The natural numbers express the cardinalities of Sierpiński’s finite sets. This definition relies on the concept of a proper class of sets. This is not a concept defined in Zermelo–Fraenkel set theory. Come to think of it, how can one element sets be a part of a definition abjuring the natural numbers?
Kazimierz Kuratowski provided another definition of the finite sets. Consider a set X. Consider its power set P(X). Consider them together. The result is an upper semilattice (P(X), ∪). Now consider the upper subsemilattice (F(X), ∪) generated by Ø and all the singletons in X. According to Kuratowski, X is finite if and only if X ∈ F(X). The cardinalities of the finite sets are again just the natural numbers.
This theme admits still another variation. A set X is finite, Alfred Tarski argued, if every nonempty family of subsets of X has a maximal element with respect to inclusion. This definition emphasizes the fact that inclusion is a partial ordering. The cardinalities of these finite sets are, by now, fond familiars, and they can also be regarded as natural numbers.
The various definitions of the finite sets are equivalent. They lead to nothing new. A subset of a finite set is finite, the power set of a finite set is finite, the union of a finite family of finite sets is finite, the Cartesian product of a finite family of finite sets is finite.
Adieu the finite sets. What about those that are infinite? A set E is infinite if, for every natural number n, the cardinality of E differs from n. It differs, obviously, in that card(E) ≥ n, and not the other way around. This is the mathematician’s way of observing that an infinite set is bigger than any finite set. A set E is infinite, one might also say, if for an arbitrary mapping f : E → E, there exists a nonempty proper subset X of E such that f(X) ⊂ X. This is an abstract definition of an infinite set in terms of its selfmappings and invariant subsets, but, au fond, it is still another way of observing that Big is Bigger.
The Natural Numbers
Beyond being infinite, just what are the natural numbers, N? The set, N_{1}, of all natural numbers, von Neumann argued—what choice did he have?—is just
N_{1 }= {Ø, {Ø}, {Ø, {Ø}}, …, n, …}.
The concrete algebraic structure N_{1} = < N_{1}, Ø, s > encompasses N_{1}, the distinguished element Ø, and the algebraic operation s(n) = n + 1, for all n in N_{1}.
Algebra is one thing; a set of axioms, another. At the end of the nineteenth century, both Giuseppe Peano and Richard Dedekind provided an axiomatic account of the natural numbers. It is possible to streamline Peano just a little by beginning with an algebraic structure N_{2} = < N_{2}, 0, g>, where N_{2 }is the set of natural numbers, 0 is some distinguished element of N_{2}, and g : N_{2} → N_{2}. This is a little more lifelike than von Neumann’s construction inasmuch as the number 0 is there in plain sight. Peano’s original five axioms now collapse into three:
 g(x) differs from 0 for each x ∈ N_{2},
 g(x) = g(y) ⇒ x = y for any elements x and y in N_{2}; and
 if X is an arbitrary subset of N_{2} such that 0 ∈ X and x ∈ X implies g(x) ∈ X for each x ∈ N_{2}, then X = N_{2}.
The third item on this list is the axiom of induction, and it is worth noting that it cannot be expressed in strictly firstorder terms. A retreat to secondorder logic or set theory is mandatory, if there is to be any hope of defining the operations of addition and multiplication. It is reassuring that any structure N_{2} satisfying Peano’s axioms is isomorphic to N_{1}. It follows that these are structures in which addition and multiplication may both be defined, but the definitions require the recursion theorem, a settheoretical indemnification that is alien to arithmetic. Neither addition nor multiplication is purely arithmetic in its nature.
Hardly anything is.
Category theory offers the mathematician a farreaching but abstract approach to the natural numbers, the third of three. Samuel Eilenberg and Saunders Mac Lane introduced category theory to the mathematical world in the late 1940s, but it was William Lawvere who, in 1963, evoked its arithmetical aspect. Despite the importunities of his colleagues, Eilenberg indignantly rejected Lawvere’s ideas. Set theory and category theory are, in some sense, orthogonal. A set is comprised of its elements; a category, of its objects. These may be sets, groups, rings, monoids, vector spaces, or topological spaces. They may be almost anything at all. Category theory is an abstract account of objects and their morphisms: objects in general, morphisms subject to the law of association.
Existence before essence, as JeanPaul Sartre very properly observed.
Consider the category C of all objects (X, e_{X}, g_{X}), where X is a set, e_{X} is a distinguished element of X, and g_{X} is a mapping of X into itself. For any two objects (X, e_{X}, g_{X}) and (Y, e_{Y}, g_{Y}), a morphism from the first to the second is a mapping f : X → Y such that f(e_{X}) = e_{Y} and f (g_{X}(x)) = g_{Y }(f(x)), for each x in X. It is trivial to demonstrate that C satisfies the usual axioms of category theory. The scheme itself is sufficiently abstract that the identity of objects within C becomes relatively immaterial. It is rather like a theory of boxing in which the fighters are displaced almost entirely in favor of a record of their punches.
Categories may contain both initial and terminal objects. An initial object in a given category allows for precisely one morphism to every object in the category. The concept of a terminal object is dual. It goes the other way around. The empty set is the unique initial object in the category of sets. Is there an initial object in C? There is. It is N_{3} = <N_{3}, 0, g>, where N_{3} and N_{2} are twins, joined at the hip. For every object (X, e_{X}, g_{X}) in C there is a unique morphism f : (N_{3}, 0, g) → (X, e_{X}, g_{X}). From this it follows that N_{3} = (N_{3}, 0, g) is unique up to isomorphism.
What is N_{3} if not a model of the natural numbers, one emerging from the smooth but hyperborean expanse of category theory itself?
The foundations of arithmetic are a study in contrasts. Peano is stolidly foundational; von Neumann, palpably concrete; Lawvere, massively abstract.
Whether stolid, palpable, or massive, the structures that they describe are all the same.
Something from Nothing
In a lonely gesture of genius and defiance, Georg Cantor created set theory at the end of the nineteenth century. Naïve set theory makes use of the unconstrained concept of a set. There are sets without limit and sets without end. It quickly became obvious that naïve set theory was inconsistent. Russell’s paradox is the obvious example. This was not, and never has been, an impediment to its widespread adoption as a convenient tool; but in appealing to naïve set theory, mathematicians are today, at least, aware that they are doing a bad thing. Ernst Zermelo proposed an axiomatic version of set theory in 1908, but one that was obscure at a sensitive spot—the definition of a definite settheoretical property. By 1922, both Abraham Fraenkel and Thoralf Skolem had seen correctly how to make definite properties definite. The result is now known as Zermelo–Fraenkel set theory, ZF, or ZFC, when the axiom of choice is added to the axioms. There are eight axioms in all and one axiom schema.
The existence of the empty set follows from the axioms of ZFC, and so, too, the corollary that there is only one empty set. No one could endure two of them. A set X is called transitive if any element of X is a subset of X. Ø is a trivial example of a transitive set. An ordinal, α, is a transitive set whose elements are transitive, too. Ø is an ordinal for the same trivial reason that it is transitive. For any two ordinals α and β, the relation β ≤ α means that either β ∈ α or β = α. The restriction of ∈ to α × α is a (strict) wellordering denoted by <. It follows from the definition of the ordinals that every ordinal α can be written as α = {β : β < α}. If α is an ordinal, then α ∪ {α} is an ordinal and α < α ∪ {α}. The ordinal α ∪ {α} is the successor of α, and α is obviously the predecessor of α ∪ {α}. The successor of α is usually denoted by α + 1. An ordinal without a predecessor is called a limit ordinal.
We are among friends.
The BuraliForti paradox establishes that no matter its setlike aspect, the collection ON of all ordinals is no set at all. Suppose that it were. A contradiction follows in three steps and two unavoidable ministeps. Step one: every wellordered set X has a unique ordinal number isomorphic to X. Ministep: the set of ordinals ON is wellordered. Step two: the ordinal number of any segment S in ON is greater than any ordinal in S. Step three: ON has a unique ordinal number β. Ministep: β is in ON. It is an ordinal number, after all.
Der Untergang: since β is in ON, it follows that β < β, which is a contradiction.
If ON is not a set, then what? It is a proper class of sets, where the proper classes are setlike up to the very point that they are threatened by contradiction. Even though it is a proper class of sets, ON is remarkably useful in the work of generating all the sets of mathematics. All sets, von Neumann argued, are elements of the universe V. It is a universe generated by an ingenious transfinite process. The universal seed is just V_{0} = Ø. If V_{α} has already been defined at step α, where a is an ordinal, then V_{a}_{ + 1} is the family of all subsets of V_{a}. If α is a limit ordinal, then V_{α}= ∪ {V_{β} : β < α}. Finally, V = ∪ {V_{α} : α an ordinal}, and V is von Neumann’s universe of sets.
This construction encompasses, one might think, all of the sets that can be encountered in mathematical reasoning. Some reservations are necessary. Who knows what sets mathematicians are apt to encounter in the future? Whatever they may encounter, V seems a standing refutation of the maxim that nothing arises from nothing. Von Neumann’s world seed is the empty set, and what could be emptier than that?
Yet from Ø, there is V.
False Philosophy
An unbelievable amount of false philosophy, Bertrand Russell once affirmed, “has arisen through not realizing what ‘existence’ means.” In the case of V, it is hard to credit Russell. V means what it seems to mean. Every mathematical object that could exist does exist within V. Very few mathematicians have failed to recognize what this means. They have wondered whether it is true. Yet if considerations about V do Russell no favors, his strictures about existence are relevant elsewhere. Suppose that we are given a nonempty finite set Y and asked to find, at least, one element of Y. That there exists one element is trivial. Y ≠ Ø ⇔ (∃y)(y ∈ Y) is a matter of definition, and Y ≠ Ø by assumption.
Things are not so simple if what is wanted is a specific element y ∈ Y. From the fact that Y is nonempty, it does not follow that y must be forthcoming. This is a conclusion that follows from Gödel’s first incompleteness theorem. Any extension T of ZF must be incomplete if the set of its proofs in T is recursively decidable. There is always a statement S of T such that neither S nor its negation can be deduced within T.
Well, yes. Now, denote by R(y) the relation
((S ⇒ y = 0) & ((¬S) ⇒ y = 1)),
where y is free. From the definition of R(y), it follows that R(y) ⇒ y ∈ {0, 1}. By ZF, there is a concrete set E = {y : R(y)}. It is certainly true that E has a unique element e, but it is impossible to prove within T that e = 0 or that e = 1.
Specifying a concrete element in E is harder than it seems.
This example shows that in some cases strong logical assumptions are needed in order to prove weak properties of very concrete mathematical objects. In adopting such assumptions, we encounter objects with unusual, or extraordinary, or pathological properties. It often turns out that such pathologies dominate ordinary objects.
Cantor’s diagonalization argument provides an obvious example.^{1} A variant of the argument can be formulated in purely logical terms. Let S(x, y) be a binary relation between elements x and y of a fixed nonempty set X. S(x, y) generates the family {P_{y}(x) : y ∈ X} of one place relations on X, where each P_{y}(x) is defined by the equivalence P_{y}(x) ⇔ S(x, y). Now, introduce the one place relation Q(x) on X by stipulating that Q(x) ⇔ (¬S(x, x)). Then Q(x) differs from P_{y}(x), where y ranges over X. It follows that there exists no binary relation on X universal for all of its one place relations. This yields Cantor’s theorem: the cardinality of any set A is strictly less than the cardinality of the family of all subsets of A.
Zermelo suggested a radically different approach to the proof of the proposition that card(X) < card(P(X)). Exploiting transfinite induction without appealing to the axiom of choice, he demonstrated that no mapping G from P(X) into X can be injective. For every ordinal α, Zermelo defined the element x_{α} of X by letting x_{α} = G({x_{γ} : γ < α}). He then observed that in order to skirt the BuraliForti paradox, there must exist an element of X corresponding to two distinct ordinals. Now, if α denotes the least ordinal such that x_{α} = x_{β} for some ordinal β < α, then the sets A = {x_{γ} : γ < α} and B = {x_{γ} : γ < β} differ from one another, but G(A) = x_{α} = x_{β} = G(B), which shows that G is not an injection. The same argument shows that there is no injection from the family of all wellorderable subsets of X into X.
But there is more. König’s inequality affirms that
∑{a_{i} : i ∈ I} < ∏{b_{i} : i ∈ I}
for any two families {a_{i} : i ∈ I} and {b_{i} : i ∈ I} of cardinal numbers satisfying a_{i} < b_{i} (i ∈ I). Cantor’s theorem corresponds to the very special case when a_{i} = 1 and b_{i} = 2 for all i ∈ I. The proof of König’s inequality is similar to Cantor’s original proof, but it requires the axiom of choice.
Lorenz Halbeisen was able to strengthen Cantor’s theorem without the use of the axiom of choice, demonstrating that within ZF if X is an arbitrary infinite set, then card(Fin(X)) < card(P(X)), where Fin(X) denotes the family of all finite subsets of X.
Where there is a will, there is sometimes a way.
Local versus Global
For any nonempty set X, the collection of all bijective images of X forms a proper class. By defining this class as card(X), we leave the framework of ZFC. The concept of a universal set is very convenient in this respect.
A set U is universal if
 Ø ∈ U,
 if X ∈ U, then X ⊂ U;
 if X ∈ U, then P(X) ∈ U; and
 if R(x, y) is a binary relation functional with respect to y such that the domain of R belongs to U, and each element in its range to U, then the union of its range also belongs to U.
A universal set U may well serve as a model of von Neumann’s V. Every universal set is locally the entire universe of sets. The Tarski–Grothendieck axiom—Alfred Tarski, Alexander Grothendieck—now affirms that for every set Y, there exists a universal set U such that Y ∈ U. By virtue of this axiom, card(Y) can be defined as the set of all those bijective images of Y which belong to U.
The Tarski–Grothendieck axiom has a secondary identity. It is equivalent to the statement that the class of all strongly inaccessible ordinal numbers is not a set. This axiom also puts any set Y in some local universe U of sets, but leaves unanswered the question of just what happens outside of U? If the Tarski–Grothendieck axiom fails at this point, it succeeds at another. It makes possible a proof of the axiom of choice.
Early on, von Neumann addressed the problem of characterizing the sets among all classes. A class X is a set, he argued, if and only if there exists no termfunction which surjectively maps X onto V. If taken as an axiom, this statement restricts the size of all sets by comparing them to the unbelievably giant size of the proper classes. The existence of a global choice function now follows: there exists a termfunction F(X) such that, for any nonempty set X, the relation F(X) ∈ X holds true. The BuraliForti paradox suffices, since the class ON of all ordinals is not a set. Von Neumann’s axiom guarantees that there exists a surjective termfunction H whose domain is ON and whose range coincides with V. If a set X is nonempty, then the class H^{–1}(X) of ordinals is nonempty, too, and it, therefore, has a least ordinal α = α(X). So, F(X) = H(α(X)) is a global choice function on V \ {Ø}.
The reflection principle, due to Azriel Levy, expresses an important connection between V and its initial fragments. For any formula S(x) of ZF, there exists an ordinal β such that (∀x ∈ V_{β})[S(x) ⇔ S_{β}(x)], where S_{β}(x) denotes the relativization of S(x) to V_{β}. The reflection principle shows, in particular, that no property of a variable x can distinguish the universe V from all V_{α}, where α ranges over the ordinals. No formula of firstorder logic distinguishes the ancient and intuitive ideas of what is global and what is local. It also follows from the reflection principle that no consistent extension of ZF theory can be axiomatized with a finite list of axioms.
 Alexander Kharazishvili, “Cantor’s Diagonalization Method,” Inference: International Review of Science 2, no. 3 (2016). ↩
More From This Author

Cantor’s Diagonalization Method
On the significance and implications of Cantor’s discovery.
( Mathematics / Review Essay / Vol. 2, No. 3 )

Strangely Convergent Sequences
On the paradoxical math at the border of finite and infinite.
( Mathematics / Review Essay / Vol. 1, No. 4 )