In the spring of 1960, Eugene Wigner delivered a lecture at New York University. Published as an essay the following year under the title “The Unreasonable Effectiveness of Mathematics in the Natural Sciences,”1 Wigner’s remarks sparked a debate that continues to the present day. Indeed, the significance and implications of the essay have been discussed far beyond the realms of mathematics and physics.
Wigner’s essay has long been a source of fascination for me. I was a graduate student when I first read the essay and I have returned to it many times over the intervening years. Although I have often found myself admiring the clarity and articulation of Wigner’s observations, it is the mystery he pointed to that first caught my imagination and is at the heart of its enduring appeal.
The mystery Wigner described can be stated as follows: mathematical concepts introduced for solving specific problems turn out to have unexpected and mysterious consequences in seemingly unrelated areas. This is a form of mathematical entanglement that both mathematicians and theoretical physicists are familiar with.
Wigner gives as an example the appearance of the number $\pi$, the ratio of the circumference of a circle to its diameter, in the Gaussian distribution formula,
\[f(x)=\frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}},\]
where $f(x)$ denotes the population density function.2 “Surely the population,” he goes on to add in the voice of a simple-minded character, “has nothing to do with the circumference of [a] circle.”3
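The role of $\pi$ here can be checked directly. The following sketch (in Python; the grid, interval, and the parameters $\mu=0$, $\sigma=1$ are arbitrary choices) confirms that the density integrates to 1 precisely because of the $\sqrt{2\pi}$ factor:

```python
import numpy as np

# A minimal numerical check of where pi enters: the factor
# 1/(sigma*sqrt(2*pi)) is exactly what makes the Gaussian density
# integrate to 1. Grid and parameters are arbitrary choices.
mu, sigma = 0.0, 1.0
x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]
f = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
print(np.sum(f) * dx)                        # ~ 1.0
# Without the sqrt(2*pi) factor the total mass would be ~ 2.5066:
print(np.sum(np.exp(-(x - mu) ** 2 / 2)) * dx)
```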
Later in the lecture, Wigner marvels at the fundamental importance of complex numbers in quantum mechanics,4 when, at its origin, the number $i=\sqrt{-1}$ was nothing more than a fictitious quantity introduced by some clever sixteenth-century Italian mathematicians to solve cubic algebraic equations.
Wigner also offers a surprising characterization of mathematics as “the science of skillful operations with concepts and rules invented just for this purpose.”5
The mystery Wigner points out arises in part from the perennial question of whether mathematics is a science, advanced by exploration and discovery like the main physical theories, or whether it is an invention—a creation of the human mind. Wigner, like many other natural scientists,6 seems to have adopted the latter point of view: underlying concepts of algebra and analysis were inventions made by great mathematicians.7 I will argue here that, on the contrary, although human genius was at play, the “science of skillful operations” developed naturally through exploration and discovery. In the opening section of this essay I will attempt to demonstrate how the fundamental concepts of algebra and calculus have indeed developed organically, starting with the operational rules of numbers and leading all the way to differential equations—the most fundamental mathematical concept relevant to all fields of physics. To do justice to this enormous task would require a whole book; this section may be viewed as a first sketch of one.
At the center of Wigner’s mystery is the question of why so many mathematical concepts turn out to play such a fundamental role in physics. To understand this further, one has to probe the nature of the relationship between mathematics and natural sciences, a task that I will undertake in the latter part of this essay. Mathematics starts with the indispensable concepts of numbers and shapes. Physics adds to these primordial building blocks the elusive concept of time, as well as various observables—such as weight, density, velocity, acceleration and so on—associated with natural processes. These observables can be measured, compared, and related to each other by performing careful experiments. Beginning with numbers and the need to operate using them, mathematics and physics thus share a common origin; a starting point from which they both advanced, often in pursuit of different goals, but still sharing an absolute need for consistency.
It so happens that consistency, in the case of all basic physical theories, is ensured by a rigorous mathematical framework. Physics is indeed, as Wigner intimates, inconceivable without mathematics.
But why does physics rely on the kind of mathematics that it does, developed by mathematicians in pursuit of problems that may have had little, if anything, to do with the issues faced by physicists at any given time? Could they not have developed their own mathematics, as it was needed?
I do not have a fully satisfactory solution to propose, but insofar as mathematics developed from numbers and shapes, it may not be surprising that the mathematical theories most relevant to physics are to be found among those that have developed precisely in that fashion.8
Mathematics, however, was not only useful to physics by that mysterious process of entanglement. Indeed, it has often enriched itself by being devoted to problems motivated by physics. Nobody has described this better than Henri Poincaré:
The combinations that can be formed with numbers and symbols are an infinite multitude. In this thicket how shall we choose those that are worthy of our attention? Shall we be guided only by whimsy? … [This] would undoubtedly carry us far from each other, and we would rapidly cease to understand each other. But that is only the minor side of the problem. Not only will physics perhaps prevent us from getting lost, but it will also protect us from a more fearsome danger of turning around forever in circles. History [shows that] physics has not only forced us to choose [from the multitude of problems which arise], but it has also imposed on us directions that would never have been dreamed of otherwise … What could be more useful!9
Thus, while advancing according to different goals and methodologies, mathematics and physics have often crossed paths and deeply influenced one another. Moreover, as I will argue in this essay, at a certain stage in their development, the natural sciences become themselves part of mathematics. As a result, they have been developed by processes typical of mathematics, such as the quest for generalization, completeness, and rigor, the freedom to ask seemingly unrelated questions and make connections to other parts of mathematics, and the search for formal beauty.
All this does not crack Wigner’s mystery, but it places it, I think, in a different perspective.
Science of Skillful Operations
Mathematics is the science of skillful operations with concepts and rules invented just for this purpose. The principal emphasis is on the invention of concepts.
—Eugene Wigner10
After briefly reviewing some of the main stages in the development of algebra and analysis,11 I will argue that what Wigner refers to as the “invention of concepts and rules” is rather the invention of good mathematical notation followed by the discovery of concepts that often lie behind them and the extension of preexisting rules. Definitions, reflecting human choices made by great mathematicians, also play an important role in pinning down a new concept, as revealed in the process of understanding a difficult mathematical problem.12 But most importantly, the domain of mathematics followed a natural process of exploration, extension, and completion, starting with its basic operations: the addition and multiplication of natural numbers.
It is not my intention here to offer an exhaustive historical account describing how these developments occurred,13 but rather to demonstrate how they were driven by simple mathematical necessity. I will limit myself to the description of the main developmental steps that led to the concepts of equations, both algebraic and differential—crucial concepts in the formulation of all known physical theories. In doing so, I will necessarily have to neglect other crucial aspects concerning the relevance of mathematics to the physical world, such as probability theory, statistics, or the development of efficient methods of calculation.14
In the beginning, there were only the natural numbers and the human need to manipulate them to solve real world problems. The positive integers 1, 2, 3 … are the simplest and most intuitive mathematical objects. Among the elementary operations, addition and multiplication are also very intuitive, subtraction and division less so. The elementary operations are both commutative and associative, and, moreover, multiplication is distributive with respect to addition. These are the basic laws of arithmetic,15 akin, one might say, to the basic laws of a physical theory. These simple properties made it possible for the first mathematicians to devise ingenious algorithms for adding and multiplying large numbers.16 Subtraction and division, on the other hand, cannot always be performed within the framework of positive integers. To solve elementary word problems,17 the most obvious practical mathematical task of the time, early mathematicians had to perform these more difficult operations. This work led, in turn, to the introduction of the number zero,18 negative numbers, and fractions—that is, rational numbers.
The discovery of rational numbers, the first major accomplishment of mathematics, was achieved through the process of a simple algebraic extension. Namely, a process by which the original concept of numbers and the elementary operations have been extended to the simplest framework in which, with the exception of division by zero, all four basic operations can be performed. Moreover, and this is essential, the extended operations of addition and multiplication still verify the basic laws mentioned above. Thus, once the new numbers were introduced, mathematicians could operate with them by following the same familiar rules. In modern language, the set of rational numbers, $\mathbb{Q}$,19 together with the operations of addition and multiplication forms a commutative division ring, or field.20 The important thing here is that elementary word problems, formulated with numbers in $\mathbb{Q}$, are solvable in $\mathbb{Q}$.
The heart of the matter is the basic arithmetic rules: associativity, commutativity and distributivity (ACD). The universality of these rules made algebra possible, since the most natural way to express them is to invoke adequate symbolic notations, i.e., for arbitrary numbers $a$, $b$, $c$,
$a+(b+c) = (a+b)+c,$ $a+b=b+a$
$a\times (b\times c) = (a\times b)\times c,$ $a\times b=b\times a$
$a\times (b+c) = a\times b +a\times c.$
Note that denoting numbers by $a, b, c$ is not that different from denoting addition by $+$ and multiplication by $\times$, or $\cdot$, the unit for addition by $0$, $a\times a$ by $a^2$, or from adopting the conventions concerning brackets. That is, they are all useful notational conventions.
Yet, and this is of fundamental importance, good notations often reveal important concepts behind them.21 In this case they happen to reveal the algebra of polynomials. Indeed, we can formally repeat the same operations with the same letters, thus deriving a multitude of formal identities, such as the magical binomial formula
$(a+b)^2= a^2 +2ab +b^2$
or
$(a+b)\times (a-b)= a^2-b^2.$22
Monomials are simple expressions involving only products such as
$a^2bc=a \times a \times b \times c$
and polynomials are formal expressions involving sums of monomials, such as
$a^2+ a +1= 1\times a\times a + 1\times a +1.$
Two such formal expressions, polynomials, can also be summed or multiplied by making use of the ACD rules and one can easily check that these new operations with polynomials verify precisely the same ACD laws.23 As a result, we are able to manipulate abstract symbols as if they were numbers.
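This is easy to see in practice. The sketch below (in Python, using the sympy symbolic library; the particular polynomials are arbitrary choices) manipulates letters exactly as if they were numbers, recovering the identities above and checking that polynomials themselves obey the ACD laws:

```python
import sympy as sp

# The ACD rules let us manipulate symbols as if they were numbers.
a, b = sp.symbols('a b')

# The "magical" binomial formula and the difference of squares,
# obtained by purely formal application of the ACD rules:
print(sp.expand((a + b)**2))         # a**2 + 2*a*b + b**2
print(sp.expand((a + b)*(a - b)))    # a**2 - b**2

# Polynomials themselves verify the same ACD laws:
p, q, r = a**2 + a + 1, 2*a + 3, a - 1
assert sp.expand(p*(q + r)) == sp.expand(p*q + p*r)   # distributivity
assert sp.expand((p*q)*r) == sp.expand(p*(q*r))       # associativity
```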
One is naturally led—although this was, once again, a lengthy historical process—to the class of all polynomials in $n$-variables, meaning all formal expressions obtained by first taking arbitrary products of the variables $(x_1, x_2,\ldots, x_n)$ denoted by $x_1^{\alpha_1} \ldots x_n^{\alpha_n}$, known as monomials, and then arbitrary combinations of these monomials
\[ P(x_1,\ldots x_n)=\sum A_{\alpha_1\ldots \alpha_n} x_1^{\alpha_1} \ldots x_n^{\alpha_n}, \tag{1}\]
where $\alpha_1, \alpha_2,\ldots, \alpha_n$ are arbitrary nonnegative integers and $A_{\alpha_1\ldots \alpha_n}$ are also arbitrary numbers.24 The polynomial is said to be of degree $k$ if the sum in (1) extends over all indices $\alpha_1, \alpha_2,\ldots, \alpha_n$ with $|\alpha|=\alpha_1+\alpha_2+\cdots+\alpha_n\le k$. Note that at this level of abstraction there is no way to distinguish the variables $( x_1, x_2, \ldots, x_n)$ from the coefficients $A_{\alpha_1\ldots \alpha_n}$ in (1). The distinction becomes manifest only when we interpret $P$ as a function in the variables $(x_1, x_2, \ldots, x_n)$.
A Mighty Triad: Polynomials, Functions, and Equations
Formal manipulations with abstract symbols, such as polynomials, are an essential part of algebra, but by no means the main thing. The more important breakthrough occurred when mathematicians recognized that any elementary word problem can be formulated as an equation, or as systems of equations, using variables $(x_1, x_2, \ldots, x_n)$ with numerical coefficients, and solved by cleverly manipulating the ACD rules. To associate equations to formal symbols requires, implicitly,25 the even more abstract concept of function.
Thus, for example, the formal polynomial $2 x +3$ becomes the function that sends any number $x$ to the number $2\times x +3$, to which we can associate the equation $2x+3=0$. More generally, to any polynomial $P(x_1,\ldots x_n)$, as in (1), we associate the function that takes any $n$-tuple of numbers $(x_1, x_2,\ldots, x_n)$ to the number obtained by replacing the formal variables of $P$ with the numbers of the $n$-tuple. Polynomials, as formal expressions, and their associated polynomial functions are, rightly, identified with each other but, as a consequence, we tend to forget what an enormous conceptual advance this was. Once the identification is made we can talk about the roots of the polynomial equation $P(x_1,\ldots, x_n)=0$ with no hesitation.26 To hope for a unique, or at least a finite, set of solutions,27 we need $n$ such polynomials, $P_1, P_2, \ldots P_n$, and the associated system of equations, with $m=n$,28
\[P_1(x_1,\ldots, x_n)= P_2(x_1,\ldots, x_n)=\cdots =P_m(x_1,\ldots, x_n)=0. \tag{2}\]
A solution of the system is any $n$-tuple of numbers that simultaneously verifies all $m$ equations. Generally speaking, such equations are extremely difficult to solve, not just from a technical point of view, but also conceptually. In pursuit of this goal, mathematicians had to first revolutionize our understanding of numbers.29
There is, however, one very important case in which we can write down explicit solutions, even when we restrict the coefficients of our equations to rational numbers. This is the case when our polynomials are linear, i.e., the case when the corresponding monomials are just the formal variables $x_1,\ldots, x_n$,
$P_i(x_1,\ldots, x_n)=\sum_{j=1}^n a_{ij} x_j - c_i,$
where $a_{ij}$ and $c_i$ are given, fixed numbers—say, in $\mathbb{Q}$. Linear algebra is nothing other than a systematic theory of how we can solve such equations,30 and as such it has an enormous range of applications. At this point, it is very tempting to make a detour and analyze the conceptual framework of this extraordinarily beautiful and important theory, which provides many more examples of how artful notation has led the way to the discovery of new and powerful abstract concepts such as matrices, determinants, matrix algebra, vector spaces, linear operators, eigenvalues, and so on. Such a detour would make this essay unreasonably lengthy so I will resist the temptation to delve any deeper here.
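As a small taste of what such a systematic theory delivers, the following sketch (in Python with sympy; the system itself is a hypothetical example) solves a linear system exactly within $\mathbb{Q}$ and exhibits the determinant as the test for a unique solution:

```python
import sympy as sp

# A linear system with rational coefficients, solvable within Q itself;
# the particular equations are chosen arbitrarily for illustration.
x, y = sp.symbols('x y')
print(sp.solve([sp.Eq(2*x + 3*y, 7), sp.Eq(x - 4*y, -2)], [x, y]))
# {x: 2, y: 1}

# The same system in matrix form; a nonzero determinant guarantees
# a unique solution, again with entries in Q.
A = sp.Matrix([[2, 3], [1, -4]])
c = sp.Matrix([7, -2])
print(A.det())       # -11, nonzero
print(A.solve(c))    # Matrix([[2], [1]]) -- exact, rational
```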
To summarize: The introduction of abstract symbols and the formal expressions that can be made with them, i.e., polynomials, have led, via their associated functions, to the fundamental concept of algebraic equations and systems. In turn, the study of linear algebraic systems has led, through the introduction of a remarkable number of skillful notations, to the beautiful, exotic, world of determinants, eigenvalues, matrix algebra, vector spaces and linear operators. All these provide brilliant examples of how notations, introduced first for formal convenience, turn out to reveal a higher form of mathematical reality.31 The powerful trio—formal expressions, functions and equations—represents an extraordinary conceptual breakthrough, possibly the most important one in the history of science,32 opening the way for an avalanche of other formal inventions, in particular and most importantly those that lie at the foundations of calculus.33
Beyond Rational Numbers
Rational numbers suffice to solve linear equations with rational coefficients, corresponding to the simplest possible word problems, i.e., linear ones. This is no longer the case with even the simplest nonlinear equations, such as $x^2- 2=0$, which has no solutions in the class of rational numbers. As is well known, this remarkable discovery, attributed to Pythagoras long before algebra was “invented,” shook the Greek mathematical community to its core.34 The solution of the problem, as we understand it today, requires the introduction of another clever process of algebraic extension, similar to the passage from positive integers to rationals. Introduce $\sqrt{2}$, first as nothing more than an abstract symbol, verifying the convention $\sqrt{2}\times \sqrt{2}=2$. Next, consider all symbolic numbers of the form $a+\sqrt{2} b$, the set of which is denoted by $\mathbb{Q}[\sqrt{2}]$, and extend formally the operations of addition and multiplication by following exactly the same ACD rules, keeping track of the additional convention that $\sqrt{2}\times \sqrt{2}= 2$. More precisely,
$(a+\sqrt{2} b)+( c+\sqrt{2} d) = a+c+ \sqrt{2}(b+d),$
$(a+\sqrt{2} b)\times( c+\sqrt{2} d) = ac+ 2 bd+\sqrt{2} (ad +bc).$
One can easily check that the extended operations verify the same ACD laws as those for $\mathbb{Q}$. Thus, at the formal level, calculations in $\mathbb{Q}[\sqrt{2}]$ are done precisely in the same manner as in $\mathbb{Q}$. Since no contradiction can arise in any calculations involving $\sqrt{2}$, we, modern mathematicians, have no problem considering it “real.”35 In fact, further simple manipulations allow us to place it on the real line somewhere between the rationals 1.41 and 1.42, or, even more precisely, between 1.4142 and 1.4143.
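The whole construction can be carried out mechanically, which is perhaps the clearest evidence of its purely formal character. A minimal sketch (in Python; the class name QSqrt2 and the sample numbers are my own illustrative choices) implements $\mathbb{Q}[\sqrt{2}]$ with exact rational coefficients and reproduces the bracketing of $\sqrt{2}$ just mentioned:

```python
from fractions import Fraction

class QSqrt2:
    """Numbers a + b*sqrt(2) with a, b rational: a sketch of the
    algebraic extension Q[sqrt(2)] described in the text."""
    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)
    def __add__(self, o):
        return QSqrt2(self.a + o.a, self.b + o.b)
    def __mul__(self, o):
        # (a + sqrt(2) b)(c + sqrt(2) d) = ac + 2bd + sqrt(2)(ad + bc)
        return QSqrt2(self.a * o.a + 2 * self.b * o.b,
                      self.a * o.b + self.b * o.a)
    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt(2)"

r = QSqrt2(0, 1)              # the abstract symbol sqrt(2)
print(r * r)                  # 2 + 0*sqrt(2): the defining convention
x, y = QSqrt2(1, 2), QSqrt2(3, -1)
print(x * y)                  # -1 + 5*sqrt(2): stays inside Q[sqrt(2)]

# Rational bracketing of sqrt(2), as in the text:
lo, hi = Fraction(14142, 10000), Fraction(14143, 10000)
print(lo * lo < 2 < hi * hi)  # True
```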
This formal procedure—extending the notion of number together with the ACD rules so that such equations become solvable, all in a self-consistent manner—would not have satisfied Greek mathematicians. They only came to terms with the new numbers, which they named irrational, after they were able to make sense of them constructively. That is, by showing that they can be approximated by converging sequences of rational numbers. This procedure, attributed to Eudoxus, belongs properly to infinitesimal calculus.36 It was perfected much later by mathematicians working in the nineteenth century, such as Augustin-Louis Cauchy, Richard Dedekind, and Georg Cantor, who refined it to include all real numbers. Today, the standard definition of a real number is based either on Dedekind cuts or, better in my view, on equivalence classes of Cauchy sequences.37 Both procedures allow us to extend the operations of addition and multiplication from $\mathbb{Q}$ to all real numbers, such that the same laws of arithmetic manipulations still hold true. The set of all real numbers thus obtained, denoted by $\mathbb{R}$, is a commutative division ring just like $\mathbb{Q}$. This defines a natural38 completion of $\mathbb{Q}$ in which, in particular, any polynomial equation $P(x)=0$, where the associated function $P(x)$ takes both positive and negative values, must have a “real” solution.39
Complex Numbers
What about other quadratic equations, such as $x^2+1=0$? Can this be treated in the same way? Obviously not, since the square of any number, positive or negative, is always positive. Yet some enterprising Italian mathematicians during the sixteenth century found it quite useful to introduce a fictitious number, denoted $i=\sqrt{-1}$, as well as all symbolic expressions of the form $a+i b$, with $a$, $b$ real numbers. Perfectly conscious that these expressions are “imaginary,” they proceeded nevertheless to manipulate them as if they were real, by defining addition and multiplication based on the ACD rules together with the convention that every time $\sqrt{-1}$ is multiplied with itself we replace the product by $-1$, i.e., $i^2=i\times i=-1$. Thus,
$(a+\sqrt{-1}\, b)\times( c+\sqrt{-1} \, d) = ac- bd+\sqrt{-1} (ad +bc)$
$(a+\sqrt{-1} \, b)+( c+\sqrt{-1} \, d) = a+c+ \sqrt{-1}(b+d).$
Just as in the case of the irrational numbers $\mathbb{Q}[\sqrt{2}]$, this procedure defines the set of all complex numbers
$\mathbb{C}=\mathbb{R}[\sqrt{-1}] =\Big\{ a+i b \mid a, b\in \mathbb{R}\Big\}.$
It is easy to check that these extended operations verify the ACD rules. Moreover, $\mathbb{C}$ is closed under division by nonzero elements, i.e., it is a commutative division ring. Note that though the extension procedure here is very similar to the one for $\mathbb{Q}[\sqrt{2}]$, there is a hugely consequential difference. Once $i=\sqrt{-1}$ has been introduced, the equation $x^2=i$ can also be solved in $\mathbb{C}$, while the same is not true for the equation $x^2=\sqrt{2}$, which cannot be solved in $\mathbb{Q}[\sqrt{2}]$.40 This is a simple manifestation of an even more miraculous fact. It turns out that all polynomial equations with coefficients $a_0, a_1,\ldots, a_n$, of the form
$P(x)=a_0 + a_1 x+\cdots + a_n x^n =0$
are solvable in $\mathbb{C}$. This is the mighty fundamental theorem of algebra, first proved by Gauss.41 In the particular case of the general quadratic equation $a x^2+ bx +c =0$, the solutions can be written explicitly in the form
$x=\frac{- b\pm \sqrt{\Delta}}{2a},$ $\qquad \Delta= b^2 - 4 a c.$
In fact complex numbers were originally introduced just to keep track of such expressions when the quantity $\Delta$ is negative,42 and thus to give a unified description of all possible cases.
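The unification is visible in the smallest possible computation. In the sketch below (Python; the helper quadratic_roots and the sample coefficients are hypothetical, chosen for illustration), a single formula handles both signs of $\Delta$ once the square root is taken in $\mathbb{C}$:

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 via the classical formula;
    cmath.sqrt keeps the formula uniform even when the
    discriminant is negative."""
    delta = b * b - 4 * a * c
    s = cmath.sqrt(delta)
    return (-b + s) / (2 * a), (-b - s) / (2 * a)

print(quadratic_roots(1, -3, 2))  # ((2+0j), (1+0j)): real roots, delta > 0
print(quadratic_roots(1, 0, 1))   # (1j, -1j): the equation x**2 + 1 = 0
```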
There are many reasons why the introduction of complex numbers was more revolutionary than that of irrationals. To start with there was no obvious reason for their introduction. For, unlike the case of irrational numbers which were introduced because of the need to solve simple geometric problems, there is, of course, no meaningful word problem which would lead one to solve the equation $x^2=-1$. The magical “number” $i$ started its existence as nothing more than a modest convention.
Moreover, unlike irrationals which can be approximated by converging sequences of rational numbers, there is no such procedure for complex numbers. As such, ancient Greek mathematicians, even those after Eudoxus, would have found their use unacceptable. This also applies to many skeptical European mathematicians prior to Gauss and before complex numbers were given a geometric interpretation,43 the so-called complex plane interpretation, discovered by Jean-Robert Argand and Gauss. Mathematicians are not always so keen to “skirt the impermissible” after all.
To summarize: The need to solve nonlinear algebraic equations has forced mathematicians to extend their understanding of numbers beyond the rationals. The real numbers $\mathbb{R}$ are derived from $\mathbb{Q}$ by a well-defined process of mathematical completion.44 Yet the discovery of complex numbers $\mathbb{C}$ was entirely accidental. As such, it is the most striking example in the history of mathematics of a concept-revealing notation. It also provides the ultimate example of an algebraic extension, that is, an extension of $\mathbb{R}$ and its operations, such that:
- The same, or very similar, computational rules apply.45
- We can solve a much larger class of equations in the extended context, in this case all nonconstant polynomial equations of the form $P(x)=0$.
The Cartesian Revolution and Differential Calculus
The introduction of abstract notations and the discovery of functions and equations opened the way to other extraordinary advances. The first of these involved applying the new concepts of algebra to geometry. It was René Descartes who first realized that any basic geometric figure in the plane or space can be represented by algebraic equations or systems of equations, via his brilliant idea of introducing coordinates. At the same time, abstract algebraic equations could now be visualized by geometric figures. Thus, for example, the circle of radius 1, centered at the origin, corresponds exactly to the set of points in the plane of coordinates $(x, y)$ which verify the equation $x^2+ y^2=1$. On the other hand, any algebraic equation of the form
$ax^2+2 hxy+by^2 +2 gx +2f y +c=0,$
for any real coefficients $a, h, b, g, f, c$, represents a conic section, i.e., an ellipse, a parabola, or a hyperbola in the plane. In fact, any system of algebraic equations of the form $P_1=P_2=\cdots= P_m=0$, with the $P$’s polynomials as in (1) and $m< n$, generically has a geometric representation as an $(n-m)$-dimensional geometric object in $\mathbb{R}^n$.46
The unification Descartes achieved between geometry and algebra, two separate branches of mathematics, each with their own histories, must rank as high as any other major scientific revolution. It led, in relatively short order, straight to differential calculus which, in turn, made possible the new science of dynamics—the beginning of modern science. The ability to go back and forth between algebraic and geometric concepts continues to play a fundamental role in contemporary mathematics as well as in physics.47
The transition between Cartesian geometry and differential calculus was initiated by Fermat,48 who realized that, once you could represent geometric figures by equations, it was natural to also describe analytically the tangent direction to a curve at a given point, expressed in terms of the defining function of the curve.49 This leads straight to the definition of the derivative. It soon turned out that the same definition can also be used to define the instantaneous velocity of a particle whose position $x$ can be expressed as a function of time $x=f(t)$, that is,50 at $t=t_0$,
\[f'(t_0) =\frac{d}{dt} f(t_0) =\lim_{t\to t_0} \frac{f(t)- f(t_0)}{t-t_0}. \tag{3}\]
The definition of derivatives of functions led to the second major avalanche of formal inventions in the history of mathematics, after the introduction of algebra. Differential calculus operates with abstract functions instead of abstract numbers, by following new and specific rules. Thus, to be able to calculate derivatives of functions efficiently, it was useful to codify operations between functions in a similar way as was done earlier with numbers. Elementary functions,51 like polynomials, can be added and multiplied, verifying the ACD rules. In addition, elementary functions can be composed. That is, given two functions—$f, g$—one can define a third
$(f\circ g)(t)= f(g(t)).$
Differentiation introduces a fourth and crucial operation, which takes a function $f$ into its derivative $\frac{d}{dt} f$. There are three new simple laws connecting differentiation to addition, multiplication, and composition. The simplest, the linearity property, holds for all real numbers $\lambda, \mu$:
$\frac{d}{dt} (\lambda f+ \mu g) = \lambda \frac{d}{dt} f +\mu \frac{d}{dt} g.$
The rules involving multiplication and composition are
$\frac{d}{dt} (fg) = f \frac{d}{dt} g+ g \frac{d}{dt} f,$ $\qquad (f\circ g)'(t) = f'( g(t) )\, g'(t).$
These laws, which are easy to deduce from the abstract definition of derivatives, are sufficient to calculate the derivative of any elementary function. Indeed, any complicated elementary function can be decomposed into simple pieces by addition, multiplication, and the composition of functions. Knowing how to differentiate the simple pieces, such as $t$ and $\sin t$, we can differentiate the more complicated functions, such as $\sin^2 t$ or $\sin(\sin t)$. We can also define higher derivatives of a function $f=f(t)$ recursively,
$\frac{d^{n+1}}{dt^{n+1}} f= \frac{d}{dt} \big(\frac{d^n}{dt^n} f\big).$
Also, in the prime notation introduced by Joseph-Louis Lagrange,
$f''=\frac{d^{2}}{dt^{2}} f$, $\quad f'''=\frac{d^{3}}{dt^{3}} f.$
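The decomposition argument can be checked mechanically. A minimal sketch (Python with sympy; the sample functions are those mentioned above) differentiates $\sin^2 t$ and $\sin(\sin t)$ and verifies the product rule symbolically:

```python
import sympy as sp

t = sp.symbols('t')
f = sp.sin(t)**2         # built from sin by multiplication
g = sp.sin(sp.sin(t))    # built from sin by composition

# The product and chain rules reduce both to derivatives of sin:
print(sp.diff(f, t))     # 2*sin(t)*cos(t)
print(sp.diff(g, t))     # cos(t)*cos(sin(t))

# Symbolic check of the product rule for arbitrary functions F, G:
F, G = sp.Function('F')(t), sp.Function('G')(t)
assert sp.simplify(sp.diff(F*G, t)
                   - (F*sp.diff(G, t) + G*sp.diff(F, t))) == 0
```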
Integral Calculus
Integration theory has its origin in the straightforward and perfectly natural question:52 given a function $f=f(t)$, find a function $u=u(t)$ whose derivative is given by $f$,
\[\frac{d}{dt} u= f. \tag{4}\]
The corresponding solution, $u$,53 is termed a primitive of $f$, denoted $\int f$. The rules of differentiation, mentioned above, have simple counterparts in rules of integration. For example, the linearity rule of differentiation becomes
\[\int( \lambda f+ \mu g)= \lambda \int f +\mu \int g.\]
To find the integral of a given function $f$, one can try to implement a method similar to the ones used for calculating derivatives, i.e., try to decompose $f$ into simple pieces whose primitives we know how to compute. But this is just the formal aspect of integration theory. The major breakthrough, made independently by Newton and Gottfried Wilhelm Leibniz,54 was the realization that one can use this formal inverse derivative operation to calculate areas of complicated geometric figures.55 Thus, as the story has it,56 was begotten the glorious era of calculus!
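Both aspects—the formal inverse of differentiation and its use for computing areas—fit in a few lines. A sketch (Python with sympy; the integrand $\sin^2 t$ and the interval are arbitrary choices):

```python
import sympy as sp

t = sp.symbols('t')
f = sp.sin(t)**2

u = sp.integrate(f, t)                       # one primitive of f
print(u)
assert sp.simplify(sp.diff(u, t) - f) == 0   # d/dt u = f, as in (4)

# The Newton-Leibniz insight: primitives compute areas.
print(sp.integrate(f, (t, 0, sp.pi)))        # pi/2, the area on [0, pi]
```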
The relation between derivatives and integrals is less transparent for functions involving more variables. The integration theory of such general functions was perfected in the nineteenth and early twentieth centuries by mathematicians like Bernhard Riemann and Henri Lebesgue. It was later extended into the more abstract framework of measure theory with a vast number of applications,57 particularly in probability theory.
The Second Triad: Functions, Differential Operators and Differential Equations
A differential expression applied to a function $u=u(t)$ is a formal expression involving addition, multiplication, and derivatives of $u$ in the form
\[P[u]= a_0 u + a_1 \frac{d}{dt} u+ a_2\frac{d^2}{dt^2} u +\cdots + a_m \frac{d^m}{dt^m} u, \tag{5}\]
where, in the simplest case, $a_0, a_1, \ldots, a_m$ are either given constants or functions of $t$. In both cases, the operator $P$ is linear, i.e., for any constants $\lambda$, $\mu$,
$P[\lambda u+\mu v]= \lambda P[u] +\mu P[v].$
In the general and nonlinear case, the coefficients may also depend on $u$ and its derivatives. Note that $P$ is an operator, or functional, i.e., it operates on functions and produces functions. The differential equation attached to the operator $P$ is $P[u] =0$,58 whose solutions are functions.
To be more precise, solving the polynomial equation $P(x)=0$ meant finding a number $x$ such that $P$, understood as a function, vanishes when evaluated at $x$. To solve the differential equation $P[u]=0$ means, instead, to find a function $u$ such that $P$, understood as an operator, vanishes identically when evaluated at the function $u$. Though the words to describe the two situations are similar, the difference in terms of the potential applications is enormous. Indeed, functions $u=u(t)$ can be used to describe the paths of a particle in motion, while the operator $P[u]$ is a mathematical representation of a given law of motion. The first derivative $u'(t)$ can then be interpreted as instantaneous velocity and the second derivative $u''(t)$ as instantaneous acceleration. Given a physical law, prescribed by the second order operator $P$,59 the solutions $u$ of $P[u]=0$ are all possible trajectories of particles within the physical process governed by the law $P$. Thus, unlike algebraic equations $P(x)=0$, in one variable $x$, which cannot have more solutions than the degree of the corresponding polynomial $P$, the differential equation $P[u]=0$ associated to the differential operator (5) has infinitely many solutions. We know from experience that the trajectories of particles in Newtonian mechanics depend only on their original positions and velocities.60 This has a simple and very general mathematical formulation in terms of what is known as the Cauchy problem. More precisely, under very reasonable smoothness assumptions on the defining function $P=P(u)$ in (5), solutions of the equation of $P[u]=0$ are uniquely specified by the values of $u$ and its first $m-1$ derivatives at a fixed value of the parameter $t$. Since physical laws are given by second order operators we see indeed that physical intuition in this case aligns perfectly with the mathematical properties of second order differential equations.
The formalism can be extended to systems of particles described by vector functions $u(t):=\big(u^1(t), u^2(t),\ldots, u^n(t)\big)$. The corresponding physical law, describing interactions between the $n$ particles, will then be represented by operators $P_1(u),P_2(u),\ldots, P_n(u)$, and the trajectories followed by each particle are solutions of the system of differential equations
\[P_1[u]=P_2[u]=\ldots = P_n[u] =0. \tag{6}\]
In the particular case when the orders of all $P_i[u]$, $1\le i\le n$, are one, we can rewrite the system,61 under some simple non-degeneracy condition, in the more convenient form
\[\frac{d\textbf{u}}{dt} = f(\textbf{u}, t), \tag{7}\]
with $\textbf{u}=( u^1,\ldots, u^n)$ and $f$ an $\mathbb{R}^n$-valued function of $\textbf{u}$ and $t$. To solve the initial value problem for (7) means to find a solution $\textbf{u}$ taking a specified value $\textbf{u}(t_0)=\textbf{u}_0$ at some $t=t_0$. Typical systems are autonomous, i.e., $f=f(\textbf{u})$.
In the linear case, when $f$ is a linear function, i.e., $f(\textbf{u})=A\textbf{u}$ with $A$ an $n\times n$ matrix, the system takes the form
\[\frac{d\textbf{u}}{dt} = A\textbf{u}. \tag{8}\]
To solve the initial value problem for (8) means to find solutions $\textbf{u}$ such that $\textbf{u}(t_0)=\textbf{u}_0$ for an arbitrary vector $\textbf{u}_0\in \mathbb{R}^n$.
In the simplest case, when $n=1$ and $A$ is the constant $\lambda$, the equation
\[\frac{d u}{dt} = \lambda u, \qquad u(0)=u_0, \tag{9}\]
can be easily solved by the exponential function $u=u_0e^{\lambda t}$. More generally, the vector $\textbf{u}= \textbf{u}_0 e^{\lambda t }$ is a solution of the linear system (8) if and only if $\textbf{u}_0$ is an eigenvector of the matrix $A$ with eigenvalue $\lambda$, i.e.,
$A\textbf{u}_0= \lambda \textbf{u}_0.$
With a little bit more work, one can solve the general initial value problem for (8). In fact, that problem reduces to the problem of finding all eigenvectors of $A$ and the corresponding eigenvalues,62 that is, a problem of linear algebra.
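A minimal sketch of this reduction (in Python with NumPy; the matrix $A$ and the initial data are arbitrary choices) expands $\textbf{u}_0$ in the eigenbasis of $A$, evolves each mode by its exponential, and checks the result against (8) by a finite difference:

```python
import numpy as np

# Solving u' = A u by linear algebra: expand u0 in eigenvectors of A,
# then each mode evolves as exp(lambda * t).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])          # eigenvalues -1 and -2
lams, V = np.linalg.eig(A)            # eigenvalues, eigenvectors
u0 = np.array([1.0, 0.0])
c = np.linalg.solve(V, u0)            # coefficients of u0 in the eigenbasis

def u(t):
    # sum of modes c_j * exp(lam_j * t) * v_j
    return (V * np.exp(lams * t)) @ c

# Finite-difference check of the defining equation (8):
h = 1e-6
print((u(1.0 + h) - u(1.0)) / h)      # ~ A @ u(1.0)
print(A @ u(1.0))
```

The dynamical problem has thus been traded for the algebraic one of diagonalizing $A$.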
The simplest example of a second order differential equation describes the motion of the linear harmonic oscillator,63
\[\frac{d^2}{dt^2} u+ \omega^2 u=0. \tag{10}\]
Looking for solutions of the form $u= e^{\lambda t}$, we find that $\lambda^2+\omega^2=0$, a result that may seem discouraging because we are looking for real solutions. Yet this provides another powerful example of the usefulness of complex numbers. We find the complex solutions $e^{i \omega t}$ and $e^{-i\omega t}$. A general solution can then be written in the form
$u(t)= a e^{i \omega t}+ b e^{-i \omega t}.$
If the initial data for (10) are $u(0)= u_0, \partial_t u(0)= u_1$, with $u_0, u_1$ real, we can solve the system in $a, b$,
$u_0= a + b, \qquad u_1= ia \omega-i b\omega$
and, using Euler’s formula $e^{i\omega t}= \cos(\omega t)+ i\sin(\omega t)$, we find
$u(t) = \cos( t\omega) u_0+ \omega^{-1} \sin(t\omega) u_1.$
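A quick numerical check of this formula (Python; the values of $\omega$, $u_0$, $u_1$ are arbitrary) confirms that it satisfies (10) to finite-difference accuracy:

```python
import numpy as np

# The closed-form solution of u'' + omega^2 u = 0, assembled from the
# complex exponentials e^{i omega t} and e^{-i omega t} as in the text.
omega, u0, u1 = 2.0, 1.0, 0.5

def u(t):
    return np.cos(omega * t) * u0 + np.sin(omega * t) * u1 / omega

# Finite-difference check that u'' + omega^2 u vanishes:
t, h = 0.7, 1e-5
second = (u(t + h) - 2 * u(t) + u(t - h)) / h**2
print(second + omega**2 * u(t))   # ~ 0
```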
As was already the case for algebraic equations, nonlinear differential equations are rarely solvable in terms of specific formulas.64 This has forced the development of a powerful qualitative theory which allows us to infer various properties of solutions in the absence of specific representations. All qualitative studies of solutions start with the fundamental theorem of ODEs, according to which, under very broad assumptions on $f$ and for any initial data $u_0$, there exists a sufficiently small $\epsilon > 0$ and a unique solution $u : [t_0, t_0 + \epsilon)\rightarrow \mathbb{R}^n$ of the system (7) verifying $u(t_0) = u_0$.65
Complex Differentiation
Once we know what it means to take the formal derivative of a real function $f(t)$, i.e., a function defined from an interval $I\subset \mathbb{R}$ with values in $\mathbb{R}$, denoted by $f:I\longrightarrow \mathbb{R}$, it makes sense, by pure analogy, to ask if we can perform a similar operation for a complex function $f(z)$, i.e., a function defined from a domain $D\subset \mathbb{C}$ with values in $\mathbb{C}$, $f:D\longrightarrow \mathbb{C}$. We try to mimic the definition (3) as follows:
\[f'(z_0)=\lim_{z\to z_0} \frac{f(z)- f(z_0)}{z-z_0}. \tag{11}\]
Unlike the case of real functions, for which the limit makes sense whenever $f$ is smooth enough,66 this is absolutely not the case here. The limit makes sense for simple polynomial functions in $z=x+iy$, such as $z^n$, but not for other perfectly smooth functions, such as polynomials in $\overline{z}= x-iy$. Functions $f:D\longrightarrow \mathbb{C}$ for which the limit makes sense at all points of $D$ are called holomorphic. These functions are the object of study in one of the most beautiful and consequential branches of mathematics: complex analysis.
If we decompose a holomorphic function $f$ into its real and imaginary parts, i.e., $f=u+iv$, and interpret both $u$ and $v$ as functions in the variables $x, y$, we find that they must satisfy the following compatibility conditions:67
\[\frac{\partial}{\partial x } u =\frac{\partial}{\partial y} v, \qquad \frac{\partial}{\partial y} u =-\frac{\partial}{\partial x} v. \tag{12}\]
These are the so-called Cauchy-Riemann (CR) equations, the simplest classical system of partial differential equations.68 The restrictions imposed by the CR equations lead to an extraordinary number of remarkable properties; a small symbolic check of some of them follows the list below.
- The CR equations are not only linear, i.e., linear combinations of solutions are themselves solutions, but they also have the remarkable property that the product and composition of two CR maps, i.e., maps $(x, y)\longrightarrow (u, v)$, are again CR maps.69
- CR maps are conformal, i.e., they preserve the angle between planar curves. Moreover, according to the celebrated Riemann mapping theorem,70 any simply connected domain in $\mathbb{R}^2$,71 different from $\mathbb{R}^2$ itself, can be mapped by a CR map onto the unit disk of $\mathbb{R}^2$.
- Both components of a CR map are harmonic, i.e., they verify the Laplace equation,
\[\Delta u=\Delta v=0, \qquad \Delta= \frac{\partial^2}{\partial x^2}+ \frac{\partial^2}{\partial y^2}. \tag{13}\]
- The most useful property of holomorphic functions is the powerful Cauchy formula, which relates the value of a holomorphic function at a point to an integral along a closed curve around the point.73
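Here is the symbolic check promised above (Python with sympy; the holomorphic function $z^3$ is an arbitrary choice): the real and imaginary parts of $z^3$ satisfy both the CR equations (12) and the Laplace equation (13).

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
z = x + sp.I * y

# Split the holomorphic function f(z) = z**3 into real and
# imaginary parts u, v.
f = sp.expand(z**3)
u, v = sp.re(f), sp.im(f)

# The Cauchy-Riemann equations (12):
assert sp.simplify(sp.diff(u, x) - sp.diff(v, y)) == 0
assert sp.simplify(sp.diff(u, y) + sp.diff(v, x)) == 0

# Both components are harmonic, as in (13):
assert sp.simplify(sp.diff(u, x, 2) + sp.diff(u, y, 2)) == 0
assert sp.simplify(sp.diff(v, x, 2) + sp.diff(v, y, 2)) == 0
```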
What is remarkable about the theory of complex holomorphic functions, initiated by Cauchy in the first few decades of the nineteenth century,74 is that, unlike differential calculus, which was intimately tied to the contemporaneous development of physics, the work in this area began as a mathematical enterprise driven by pure mathematical considerations: intellectual curiosity, analogies, careful analysis. That the theory turned out to have such an extraordinary range of applications, including modern physics, is another brilliant illustration of the miracle described by Wigner.
Partial Differential Equations
In the same way that polynomials can be extended from one variable to many, differential equations can be extended to functions depending on many variables, $u=u(x^1, \ldots, x^n)$. Given such a function we can take partial derivatives $\partial_i u =\frac{\partial}{\partial x^i} u$, obtained by differentiating $u$ with respect to the variable $x^i$ while keeping all the others fixed. A higher order mixed partial derivative of $u$ can be written in the form $\partial^\alpha u=\partial_1^{\alpha_1}\partial_2^{\alpha_2}\cdots \partial_n^{\alpha_n} u$, with $\alpha=(\alpha_1, \alpha_2,\ldots, \alpha_n)$,75 where $|\alpha|=\alpha_1+\alpha_2+\cdots+ \alpha_n$ reflects the order of differentiation. Denoting by $\Lambda^k u$ the set of all partial derivatives of $u$ of order $\le k$, a general partial differential expression takes the form
\[P[u]= A\big(x,\Lambda^k u(x)\big), \tag{14}\]
where $A$ is a given specified function. We associate to the formal expression the corresponding differential operator $u\rightarrow P[u]$ and the partial differential equation
\[P[u]=0. \tag{15}\]
The equation is said to be linear if the operator $ P[u]$ is linear, i.e.,
$P[\lambda u +\mu v]=\lambda P[u]+\mu P[v].$
These operators $P$ can then be written in the form
\[P[u] = \sum_{|\alpha|\le k } a_{\alpha} \partial^\alpha u, \tag{16}\]
where the coefficients $a_{\alpha}$ are functions of the variables $x=(x_1,\ldots, x_n)$. One can study scalar equations, such as (15), or systems of equations where $u$ is itself a vector $u=(u_1, \ldots, u_k)$ verifying the multiple equations
$P_1[u]=P_2[u]=\cdots= P_m[u]=0.$
There is very little one can say about partial differential equations (PDEs) at this level of generality. Unlike ordinary differential equations (ODEs), for which we have a satisfactory general local existence and uniqueness result, no such result is known in the context of general PDEs.76 Even in the summary style of this essay it is impossible to describe anything remotely substantial about the enormous range of PDEs.77 I will restrict myself here to a few brief remarks.
- While ODEs provide the right mathematical framework for describing the motion of point particles, PDEs provide the perfect language to describe the motion of a continuum of particles, or the more modern physics concept of fields. Thus the fundamental equations in continuum mechanics, electrodynamics, and general relativity are all PDEs.
- The only general class of equations for which one can develop a sufficiently general theory are linear equations with constant coefficients (LCCE), i.e., equations of the form (16) for which the coefficients $a_\alpha$ are constant. In that case, the Fourier transform method, first developed by Joseph Fourier at the beginning of the nineteenth century during his studies of the heat equation, provides a very powerful and general tool; a minimal numerical illustration of the method follows this list. But even in that case, developing a general theory is prohibitively complicated, cumbersome, and not particularly illuminating.78 It is a lot more useful to concentrate instead on special equations using either Fourier methods or other methods based on the construction of a fundamental solution. Indeed, among the LCCE class there are a few equations that are ubiquitous throughout mathematics and physics, and disproportionately important in classifying and understanding larger classes of equations.79 Thus, for example, the Laplace operator
$\Delta =\partial_1^ 2 +\partial_2^2 +\partial_3^2$
is typical to the class of elliptic equations, while the D’Alembertian operator
$\square=-\partial_0^2+\partial_1^ 2 +\partial_2^2 +\partial_3^2$
is typical to wave equations. Other examples of second order scalar equations are the heat operator $\mathscr{H}=-\partial_t+\Delta$ and the Schrödinger operator ${\mathcal S}=-i\partial_t+\Delta$. At the level of systems of equations, the Cauchy-Riemann equations, see (12),
$\partial_2 u_1+ \partial_1 u_2=0,$ $\partial_1 u_1-\partial_2 u_2=0,$
and the Maxwell equations are by far the most conspicuous.
- The range of relevance for specific PDEs is phenomenal. Indeed, specific PDEs are at the heart of fully-fledged areas of physics. Thus one can argue that hydrodynamics is defined as the body of results, both theoretical and experimental, concerning the Navier-Stokes and incompressible Euler equations. In the same spirit, electrodynamics deals with the Maxwell equations, and general relativity is really the study of the Einstein field equations, a geometric PDE par excellence. Similar remarks can be made about magneto-hydrodynamics and non-relativistic quantum mechanics. Moreover, entire fields of mathematics, such as complex analysis, several complex variables, minimal surfaces, harmonic maps, connections on principal bundles, Kähler and Einstein geometry, and geometric flows, are also organized around specific PDEs or classes of PDEs.
- Since very few PDEs can be explicitly solved, mathematicians have been forced to develop indirect methods that allow them to describe the most important properties associated with solutions of the important equations.80 Thus, even though developing a meaningful general theory is a pipe dream, mathematicians have been able to develop an impressive body of methods and techniques which are applicable to various equations.81
- While the range of all possible equations is enormous, only a few special ones appear in physics. It is remarkable that the most important such equations can be derived using another unreasonably effective formal procedure known as the variational principle.82
- Despite the enormous progress made with PDEs during the last two centuries, there remain a large number of fundamental problems for which our understanding is very limited. The problem of turbulence, as it manifests itself in the simplest mathematical context of the Navier-Stokes equations, or the cosmic censorship conjecture in general relativity are but two of the most conspicuous examples.
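The Fourier method mentioned above can be sketched in a few lines (Python with NumPy; the periodic interval, grid size, and initial temperature profile are arbitrary choices): each Fourier mode of the heat equation $\partial_t u = \partial_x^2 u$ evolves independently, decaying like $e^{-k^2 t}$.

```python
import numpy as np

# Fourier method for a linear constant-coefficient equation: for
# u_t = u_xx on a periodic interval, mode k decays as exp(-k**2 * t).
n = 256
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
u0 = np.sign(np.sin(x))               # a rough initial temperature profile

k = np.fft.fftfreq(n, d=1.0 / n)      # integer wave numbers
def heat(t):
    return np.real(np.fft.ifft(np.exp(-k**2 * t) * np.fft.fft(u0)))

print(np.max(np.abs(heat(0.0) - u0)))  # ~ 0: t = 0 recovers the data
print(np.max(np.abs(heat(10.0))))      # ~ 0: heat smooths to the mean (zero)
```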
One of the most impressive mathematical results of the last hundred years is the solution to the Poincaré conjecture using Hamilton’s Ricci heat flow.83 This is a huge achievement and one in which the modern theory of PDEs played a crucial role. It is a dramatic demonstration of Wigner’s reflections on how ideas originating in specific areas of mathematics or physics percolate into other seemingly unrelated areas.84 The mystery can be stated as follows: how does a heat flow, originating in Joseph Fourier’s study of heat conduction, have anything to do with the Poincaré conjecture about the topological properties of the 3-dimensional sphere? The main results are due to Grigori Perelman, but his achievement ought to be viewed as the culmination of the immense progress made during the last century with elliptic and parabolic PDEs. The introduction of the Ricci flow itself,85 and the first important results based on it, are due to Richard Hamilton. The geometrization conjecture, which placed the Poincaré conjecture within a full classification of compact 3-manifolds, was introduced by William Thurston. The development of techniques for dealing with nonlinear parabolic and elliptic equations in order to analyze general solutions of the Ricci flow is due to great mathematicians such as Sergei Bernstein, Ennio de Giorgi, David Hilbert, Eberhard Hopf, Jürgen Moser, John Nash, Louis Nirenberg, Aleksei Pogorelov, Poincaré, Riemann, Juliusz Schauder, Sergei Sobolev, Hermann Weyl, and many others throughout the last century. The more recent blending of Riemannian geometry with PDEs was pioneered by mathematicians such as Thierry Aubin, Richard Schoen, Karen Uhlenbeck, and Shing-Tung Yau.
Manifolds and Tensor Calculus
In our subject of differential geometry, where you talk about manifolds, one difficulty is that the geometry is described by coordinates, but the coordinates do not have meaning. They are allowed to undergo transformation. And in order to handle this kind of situation, an important tool is the so-called tensor analysis, or Ricci calculus, which was new to mathematicians.
—S. S. Chern86
As part of the earlier discussion about the revolutionary fusion between geometry—with its lines, circles, and triangles—and algebra—with its abstract equations—the crucial contribution of Descartes was noted. Namely, his insight that geometric figures could be described by equations and vice versa. This is true, but there are, in fact, many ways to describe a given geometric object by equations. Depending on where a system of cartesian coordinates is centered, the standard sphere of radius 1, denoted $\mathbb{S}^2$, can be expressed by the equation
$x^2+ y^2 + z^2 =1,$
as well as
$(x-x_0)^2+ (y-y_0)^2 + (z-z_0)^2 =1.$
But there are also non-cartesian systems of coordinates, such as polar coordinates,
$x=r \sin\theta \cos\varphi, \quad y=r \sin\theta \sin\varphi, \quad z=r\cos \theta,$
in which case the sphere becomes simply $r=1$. The problem becomes far more acute if the usual calculus is extended to functions on the sphere. If $f$ is such a function, $f:\mathbb{S}^2\longrightarrow \mathbb{R}$, how are derivatives defined along $\mathbb{S}^2$? For this purpose it is necessary to parametrize the sphere. For example, near its north pole
$N=(x=0, y=0, z=1),$
we can use the parametric equations
$x=u, \quad y=v,\quad z=\sqrt{1-u^2-v^2}.$
The function $f$ on $\mathbb{S}^2$ can then be described as the composition
$(u, v) \longrightarrow f(u, v, \sqrt{1-u^2-v^2}),$
which can then be differentiated with respect to $u$, $v$ as many times as needed. The problem is that the same function can be represented in many other ways depending on which parametrization is used. Thus, for example, a polar coordinate $\theta$, $\varphi$ could also have been used, in which case the function $f$ would be represented by
$(\theta, \varphi)\longrightarrow f( \sin\theta \cos\varphi, \sin\theta \sin\varphi, \cos\theta).$
These are just two possible parametric representations of this function, but there are, in fact, infinitely many possible parametric representations of the sphere. Thus there are also infinitely many possible representations of the function and infinitely many ways to differentiate it. The problem is that the result of differentiation depends heavily on which parametrization is chosen and it is cumbersome to pass from one expression to another. As there is no a priori reason to prefer one over another, how should a parametrization be chosen? Is there a good definition of the derivatives of $f$ that makes it possible to pass easily from one parametrization to another?
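The difficulty is easy to exhibit concretely. In the sketch below (Python; the test function $f(x,y,z)=xz$ and the sample point are arbitrary choices), the two charts agree on the value of $f$ at a common point of the sphere, yet their naive partial derivatives differ:

```python
import numpy as np

# One function f on the sphere, seen through two parametrizations.
f = lambda x, y, z: x * z                    # an arbitrary test function

def graph(u, v):                             # chart near the north pole
    return f(u, v, np.sqrt(1 - u**2 - v**2))

def polar(theta, phi):                       # spherical-coordinate chart
    return f(np.sin(theta) * np.cos(phi),
             np.sin(theta) * np.sin(phi),
             np.cos(theta))

# The two charts agree at a common point of the sphere ...
u, v = 0.3, 0.2
theta, phi = np.arcsin(np.hypot(u, v)), np.arctan2(v, u)
print(graph(u, v), polar(theta, phi))        # same value of f

# ... but the naive partial derivatives do not:
h = 1e-6
print((graph(u + h, v) - graph(u, v)) / h)            # d/du, first chart
print((polar(theta + h, phi) - polar(theta, phi)) / h)  # d/dtheta, second
```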
Tensorial calculus was developed by mathematicians precisely in order to solve this problem. A proper definition of tensorial quantities starts first with an abstract definition of a manifold, completely removed from any particular visualization, such as that of the sphere.87 The concept, first envisioned by Riemann,88 is founded on the notion that smooth geometric objects can be described locally purely in terms of local parametrizations, also known as local coordinates, and transformation maps between them. Tensor-fields on a manifold are quantities that also transform according to simple transformation laws. To define a good notion of the differentiation of tensors, which transform as tensors, it is necessary to endow the manifold with an additional structure called a connection—an innovation introduced much later by Weyl. At the time of his own work, however, Riemann knew nothing of it.
In order to define a notion of distance between points, Riemann also endowed his manifold with a metric, which turns out to be itself a tensorial quantity, i.e., it transforms with respect to coordinate transformations. Starting with a given metric, he was then able to generalize the notion of curvature for embedded 2-surfaces, which was discovered by Gauss as part of his famous investigations of the geometry of surfaces.89 This turns out to be the most important and non-trivial tensorial quantity—the mighty Riemann curvature tensor.90 The metric also defines a unique and compatible connection, the Levi-Civita connection, with respect to which full-fledged tensorial calculus on a Riemannian manifold can then be performed. It is important to note that the curvature tensor depends on two derivatives of the metric, while the Levi-Civita connection, which is not a tensorial quantity itself, depends on one derivative of the metric. Moreover, the curvature tensor has a simple expression in terms of the connection and its first derivatives.
The curvature tensor is at the heart of Albert Einstein’s theory of general relativity, but not exactly the way Riemann defined it. In attempting to generalize Gauss’s theory of surfaces, Riemann naturally assumed his metric to be positive definite, while Einstein’s theory deals with so-called Lorentzian metrics.91 The passage from the Riemannian case to the Lorentzian goes through Hermann Minkowski, who was able to reformulate special relativity in terms of precisely such a metric, the simplest, known as the Minkowski metric. It turns out that special relativity can be described in its entirety by the Minkowski metric together with a version of tensor calculus restricted to the linear changes of coordinates which preserve the metric, i.e., Lorentz transformations. General relativity is an extension of special relativity where Minkowski space is replaced by a general Lorentzian manifold, Lorentz transformations are replaced by arbitrary changes of coordinates, and all physically relevant quantities are tensor-fields. This latter statement is the mathematical embodiment of Einstein’s equivalence principle.92 Finally, the relation between the metric and various matter fields acting on the manifold is expressed in terms of an equation, namely the Einstein field equations:
$G = 8\pi T$.
The tensor $G$ on the left, which Einstein referred to as being made of marble, depends only on the metric and its curvature, while $T$, the so-called energy momentum tensor of matter, depends on the particular type of matter carried by the spacetime. Einstein referred to the right hand side as being made of wood, i.e., as reflecting our contingent and imperfect understanding of it. Remarkably—yet another miracle—the Einstein field equations can themselves be derived from a variational principle.
Conclusions for this Section
Formal manipulation with abstract symbols led to the first fundamental triplet: an algebra of formal expressions, functions, and algebraic equations. In the same manner, formal manipulation with functions, including derivation, leads to the second fundamental triplet: differential calculus, differential operators and differential equations—including PDEs. It is important to note that both developments were intrinsic to mathematics, that is, they followed the inner logic of formal mathematical processes.93 The need to extend differential and integral calculus to manifolds has also led, by a similar process, to tensorial calculus, which has had a profound impact on modern physics. The fact that ODEs, respectively PDEs, and their modern tensorial reformulations, turned out to provide the perfect language for Newtonian mechanics, respectively the right formalism for continuum mechanics, electrodynamics and general relativity is, of course, part and parcel of Wigner’s mystery.
Limits of the Permissible
The great mathematician fully, almost ruthlessly, exploits the domain of permissible reasoning and skirts the impermissible. That his recklessness does not lead him into a morass of contradictions is a miracle in itself: certainly it is hard to believe that our reasoning power was brought, by Darwin’s process of natural selection, to the perfection which it seems to possess.
—Eugene Wigner94
As can be seen from the exposition in the earlier sections of this essay, the history of mathematics offers plenty of examples where, in the pursuit of specific problems, mathematicians are forced to transcend what is permissible by extending the objects and rules with which they operate.
The expansion of the concept of numbers from positive integers to rational, real, and complex numbers is an obvious example. An even more dramatic example occurred in connection to differential equations. To start with, even to define the derivative of a function is not at all obvious, since it requires taking a limit of fractions where the denominator converges to zero—meaning that one has to make sense of division by 0, a seemingly impermissible task. One has to understand what it means to take a limit—that is, making sense of an infinite process95—and give precise definitions of intuitive notions, such as continuity and various degrees of smoothness for functions.96
The need to clarify these issues was spurred by applications, yet the way mathematicians dealt with them is typical of the inner workings of mathematics: precise definitions, simple examples, generalizations, analogies, symmetry considerations, and a quest for completeness. That is, the search for the broadest setting in which various operations make sense. A striking example of the process of completeness, as we have seen, is the development of real and complex numbers. In order to solve specific examples of polynomial equations, mathematicians were led to the modern concepts of real and complex numbers.
A similar development has occurred in the theory of functions. To understand the broadest setting for which integration and differentiations makes sense, mathematicians developed Lebesgue integration theory, distributions, and various function spaces,97 such as the conspicuous Sobolev spaces.98 Most importantly, this led to the ability to formulate precise and general notions of solutions to differential equations. This is a truly remarkable development because, as was already apparent for algebraic equations,99 only a very small number of equations can be solved explicitly. To be able to show that solutions exist, without the ability to determine them explicitly in terms of elementary functions, is one of the most important achievements of mathematics during the last two centuries.
The driving idea behind this landmark development was that, despite the inability to represent solutions explicitly, it should still be possible to describe their essential properties. The starting point of such a process is the development of precise notions of general solutions, as mentioned above. Once such solutions are shown to exist, by an elaborate convergence process, one can then extract specific qualitative features of the solutions, such as uniqueness, continuous dependence on initial conditions, smoothness, specific bounds, asymptotic behavior, the presence of singularities, and so on.
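A standard instance of such a convergence process, sketched here only as an illustration, is Picard iteration for the initial value problem $y'=f(x,y)$, $y(x_0)=y_0$. One defines successive approximations
\[y_{n+1}(x)=y_0+\int_{x_0}^{x} f\big(t,y_n(t)\big)\,dt,\]
and shows, under mild assumptions on $f$, that the sequence converges to a solution, thereby establishing existence without any explicit formula for it. Uniqueness and continuous dependence on the initial data then follow from the same estimates.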
Wigner’s Great Mystery and Notions of Reality
The first point is that the enormous usefulness of mathematics in the natural sciences is something bordering on the mysterious and that there is no rational explanation for it.
—Eugene Wigner100
As I wrote in the introduction, I am not aware of a fully rational explanation for Wigner’s mystery, and I doubt that one can ever be found. It is, after all, a mystery as great as any of the other great puzzles confronting us, such as the fine tuning of our universe, the origins of life, or why there is something rather than nothing.101 Wigner’s mystery is at the heart of what we are and how we interact with the external world. The best one can hope for is to shed some light on the “how” rather than the “why.”
If one postulates that physical reality exists as an all-encompassing reality, perceptible through our senses, but independent of them,102 the mystery pointed out by Wigner not only lacks a credible rational explanation, but seems entirely incomprehensible. Indeed, according to this viewpoint, mathematics deals with abstractions of the mind, which are nothing but a function of the brain—that is to say, a highly organized and evolved part of the same physical reality we are trying to describe. Yet if the brain is a natural evolutionary machine, as claimed by some modern neurobiologists,103 how can these mathematical abstractions, developed within it by purely physiological processes, attain that marvelous effectiveness described by Wigner in his lecture? “It is hard to believe,” he observes, “that our reasoning power was brought, by Darwin’s process of natural selection, to the perfection which it seems to possess.”104 How, he marvels, is the natural selection mechanism capable of explaining the amazing ability of mathematicians to weave together thousands of abstract logical steps without falling into a “morass of contradictions”?105 In connection with the surprising presence of complex numbers at the heart of quantum mechanics,106 he writes:
It is difficult to avoid the impression that a miracle confronts us here, quite comparable in its striking nature to the miracle that the human mind can string a thousand arguments together without getting itself into contradictions, or to the two miracles of laws of nature and of the human mind’s capacity to divine them.107
Can that “natural evolutionary machine” concept of the mind explain all this? Can it explain why, as has often happened in the history of physics, abstractions pursued by mathematicians for seemingly esoteric reasons happen to be exactly what is needed for a new theory of the natural world?
There are other issues with the materialistic viewpoint. Not least among them is the fact that whatever we mean by the term physical reality is intimately tied to our own experiences of spacetime and causality. As general relativity made clear, these are concepts that can only be described precisely through mathematics. But more about this later.
Physical Reality
There is another view, brilliantly illustrated by Plato in his allegory of the cave, according to which physical reality is itself simply a shadow of a non-physical reality, one that can only be accessed by the mind. Plato’s insight is largely dismissed nowadays, both as a matter of principle, in favor of materialism, and because it leads to an artificial proliferation of rather vague ideal forms, such as that of a chair or a house, or highly elusive ones, such as justice or beauty.
Yet it is hard not to take Plato’s ideas seriously when it comes to mathematics. In a famous passage in his Republic, he points to the fact that mathematical objects, such as the circle, seem to have an objective reality independent of our own.108 We associate objectivity first with physical things and processes, because of our senses. We see a glass of wine; we can touch it, smell and taste the wine within it. We can drink the wine, or hear the sound made by breaking the glass. We can share our experience with friends and not be at all surprised that they have impressions identical to ours. We can also leave the glass on the table and discover a day later that it is still in the same place. It is this rigorous coherence and consistency between the various ways we experience the glass that gives it its sense of objectivity, that is, reality.
It makes sense to attempt to define the objectivity, or reality, of a physical object or process by the coherence and consistency of all our sensory experiences,109 including the exchange of impressions with other people. But what about things we consider real, but which cannot be experienced directly through our senses, such as viruses and bacteria, or stars invisible to the naked eye? Those can be brought within the same definition of reality with the help of instruments, such as a microscope or telescope, which vastly enhance our senses. But it is difficult to extend this definition to even smaller things like atoms, electrons, or quarks, for which microscopes are powerless, or massive things like black holes, which are intrinsically not directly observable.110
To account for their reality we need to extend our notion of objectivity by making a huge stretch. We consider them objective not because they are directly coherent and consistent with our senses, which they are not, but because they have a measurable effect, through an observation or experiment, within the framework of an accepted physical theory. These measurable effects may be quite remote from our senses; they need only be consistent, through logical inference, with all other known facts of the theory. But this is not enough. An acceptable physical theory must not only be consistent with all the measurable effects alluded to above, as well as all previously accepted physical facts, but also with itself—that is, with its entire logical framework. This is a mighty task and one that only mathematics is able to accomplish.111 Mathematics indeed provides an unambiguous and highly efficient language to tie together our various physical experiences into coherent physical laws and use them to make precise measurable predictions which can then be confronted by experiments.112
In his lecture, Wigner offers two examples of the fundamental role played by mathematics in the formulation of physical law: planetary motion and quantum mechanics. Special and general relativity provide equally striking examples. Geometry, that is, Euclidean geometry, is itself the first known example of a physical theory. But this is not all. Not only are physical theories formulated in the language of mathematics; even more remarkably, new physical theories are almost always first designed in the laboratory of mathematics, to explain facts unaccounted for by old theories and to make unexpected predictions which can then be tested experimentally.113 String theorists even argue, or at least have argued in the past, that the mathematical difficulties involved in reconciling quantum mechanics with general relativity are so formidable that only a unique theory—that elusive theory of everything—would be able to accomplish this feat.114 The obvious and paradoxical corollary of such a statement is that the mathematical design of the theory may suffice, without immediate regard for experimental verification. A few generations earlier, Einstein offered the equally striking observation that new physics would have to wait for revolutionary new progress in mathematics.115
Mathematical Reality
What about mathematics? Plato argues that mathematical objects, such as the circle, are not only themselves real, but that they are in fact more real than those we experience by our senses. For anyone other than a mathematician or a physicist, this may seem hard to swallow. Is not the circle just an unsubstantiated idealization of the real circles of our natural world? Plato argues, however, that it is the ideal circle that has true reality and that those we deem real are, in fact, but its imperfect embodiments. Leaving aside this claim of a more perfect reality, are mathematical objects real in the same way as our glass of wine? If one insists that reality has to be defined as the coherence and consistency of sensory experiences, the answer is no.116 But this definition is much too restrictive for a meaningful understanding of the physical world. If one accepts, however, the broader point of view that the reality of an object is defined by the consistency of our experiences with it, whether physical or mental, then mathematical objects, such as the circle, have a powerful claim to reality. Remarkably, one can spend years studying various properties of the circle, together with other geometrical concepts such as points, lines, triangles, ellipses, and parabolas, and never arrive at a contradiction. Or one can try, in ignorance, to prove a false statement about these or other more abstract mathematical objects, such as groups, manifolds, differential equations, and so on—only to realize that the incredible resistance encountered is harder than that of any rock. Or consider the extraordinary sense of satisfaction experienced when two completely different calculations arrive at the same result. Though people usually disagree on almost any issue not directly verifiable by their senses, a theorem like that of Pythagoras, proved more than two thousand years ago, is still recognized as valid today by anybody who cares to go through its proof. For those of us who have dedicated enough time to the pursuit of mathematics, there is no doubt that mathematics deals with real, self-consistent objects that are imperceptible to the senses but comprehensible to the mind.
An argument has been made above that an object is physically real only if it leads to observable and measurable effects consistent with all the other facts of an acceptable physical theory. This is, of course, a contingent definition; physical theories may change as new observable facts are brought to light. Mathematical reality, on the other hand, has only to be consistent within itself—that is, within the realm of its own definitions, concepts, and theorems. Now, and this may bring us closer to the essence of Wigner’s mystery, the acceptable physical theory, needed as a crucial ingredient to describe physical reality, is itself a mathematical object—that is, an object which has mathematical coherence, hence reality, independent of its relevance to physics.117 One can study classical geometry, celestial mechanics, quantum mechanics, or relativity as pure mathematical theories, without the need, if one so wishes,118 for any regard to their applications to physics.119

Moreover, while physical reality is naturally constrained by our intuitive representations of space, time, and causality, the mathematical world is free of any such considerations. Not only are mathematical objects causally unrelated; the very concept of spacetime is itself a mathematical object, or rather a family of objects. Mathematics is able to unambiguously define and study not one but many versions of spacetime, of which only one can claim physical reality. It is this freedom from our innate, sensory intuition of spacetime that made possible its revolutionary reinterpretation in special and general relativity. While this intuition led Newton and Immanuel Kant to postulate an absolute notion of space and time, independent of the physical objects they contain, the new relativistic understanding makes spacetime another physical object in active interaction with all other physical objects within it. This radical change of view would have been inconceivable without the mathematician’s ability to freely play with concepts and theories as objects of the mathematical world. Quantum mechanics offers an even more radical departure from sensory-based physical intuition. The duality between waves and particles, the uncertainty principle, and entanglement are incomprehensible outside the mathematical framework of the theory. Thus, in the words of Werner Heisenberg, “the smallest units of matter are not physical objects in the ordinary sense; they are forms, ideas.”120
Conclusions for this Section
Advances in theoretical physics are easier to fathom if we give up on any transcendental notions of reality,121 such as that of an eternal material world, independent of our perceptions of it and according to which the human mind is but one of its manifestations. This point of view is ultimately circular and, like the ether in pre-relativistic physics, more cumbersome than helpful. If, instead, reality is to be defined by the consistency and coherence of experiences, physical or mental, then mathematical objects have an equal or even better claim to it. While our senses can be illusory, logic applied to well-defined mathematical objects is infallible. Moreover, the physical objects of modern physics, such as electrons, quarks, strings, or black holes, are themselves mathematical objects, impossible to fathom outside their natural mathematical framework.122 Though it is prudent to keep insisting on a fundamental distinction between physical and mathematical reality, it is hard not to notice that the more advanced a physical theory is, the more elusive this distinction becomes. And here, maybe, lies, in a more outrageous form, the crux of the mystery: are the two so distinct after all?
In his remarkable book The Road to Reality, Roger Penrose gives an interesting illustration of the mysterious and paradoxical relations between the three worlds:123 the Platonic realm of mathematical forms, the physical, and the mental. A first mystery, the one pointed out by Wigner, is, in Penrose’s account, the fact that the physical world is entirely “illuminated” by a portion of the mathematical one.124 The second mystery is that the mental world is itself entirely “explained,” or determined, by a portion of the physical one, while the third holds that the mathematical world is entirely accessible to a limited portion of the mental one.
By anchoring the definition of reality to objectivity, meaning the coherence and consistency of representations both sensory and mental,125 I find myself strongly in favor of the notion that mathematics is a science,126 in that it deals with discoveries rather than inventions,127 or creations of the human mind,128 and that it proceeds by following its own version of the scientific method. Is there, however, a role for human creativity in mathematics? Of course there is. As is the case in any other science, faith in a certain outcome, the determination and persistence to pursue it, and the ability to change course when facts prove one wrong are part and parcel of mathematical research. But there is more, something unique to mathematics. Poincaré described it as “the feeling of mathematical beauty, of the harmony of numbers and forms, of geometric elegance.”129 It is, he added, “a genuinely aesthetic feeling which all mathematicians know.” According to Poincaré, it is this aesthetic sensibility that guides the mathematician to make inspired choices when faced with the myriad possible avenues for solving a problem.130 Similar aesthetic considerations are also at play when one chooses which problems to work on in the first place. Mathematics is thus both science and art: truth and beauty joined together in the most successful and inspiring saga of the human spirit.131
Closing Remarks
Modern physics leads to a conception of reality in which objectivity, measured by the consistency of our representations of it, is the ultimate arbiter. In that sense, mathematical objects are no less real than physical ones, although we still make an essential distinction. Physics starts with the raw notion of reality based on our direct experience of it, through our senses, and proceeds to extend its domain by incorporating any observable and measurable effects consistent with an accepted mathematical framework. At times, when new observations or experiments are found to be inconsistent with one or more of its laws, it reformulates them by adopting a new mathematical framework. Incompatibilities between theories used to describe different domains of physical reality,132 such as that between quantum mechanics and general relativity, are also powerful drivers in the pursuit of new mathematical theories in which the incompatibilities may be resolved. Mathematics, on the other hand, is constrained only by logical consistency.133 Its various branches are never incompatible with each other.134 This gives mathematics an enormous amount of freedom to explore and develop in many possible directions.
Upon closer inspection, however, mathematics does not deal with random abstract concepts; it began its development with the most primitive notions of numbers and shapes. Starting with numbers and the practical need to manipulate them, mathematicians were able to extract the simple ACD laws of addition and multiplication. As I have argued in this essay, algebra begins with a progressive awareness of these laws and the extraordinary convenience of expressing them using simple abstract symbols. A related development occurred in geometry.135 Though initially very different disciplines, algebra and geometry were brought together when Descartes and others realized that all the elementary shapes of geometry can be described using algebraic equations. It was this momentous discovery that made calculus, with its unlimited number of applications, possible. Once the notion of the derivative was introduced as a formal expression for the tangent to a curve, mathematics followed a pattern of discovery similar to that of algebra, leading to the second formal triplet mentioned in this essay: differential calculus, differential operators, and differential equations.136 The principal focus of both triplets is on equations, algebraic in the first case, differential in the second. I am not too far from the truth, I think, in saying that solving equations, algebraic, differential, and otherwise,137 is the primary business of mathematics.138 Solving equations is crucial to all applications of mathematics; essentially all word problems occurring in engineering, the physical sciences, statistics, biology, and economics can be translated into equations. And, of course, in classical physics,139 the basic laws are nothing but differential equations.140 It thus seems appropriate to update Pythagoras’s simple organizing belief that “All is number,” or Galileo’s “All is Geometry,”141 to the post-Newtonian “All is Equation.”
Acknowledgement: I am grateful to David Berlinski for his patient reading of a previous version of the text, constructive criticism, and numerous suggestions. I am also grateful to the editors of Inference for their assistance in preparing the essay.