Fiat recursio!

Lobina, David; Brenchley, Mark

Steven Weinberg has often noted the difficulties involved in explaining the physical sciences to the public. Those difficulties notwithstanding, physics seems generally well-rendered in the popular press. Not so the core linguistic ideas of Noam Chomsky. And yet, however technical his analyses may be, they are hardly more complex than those of the physicist.

Why the difference?

Emblematic of Chomsky’s popular difficulties has been his notion of recursion: more-or-less the idea that humans can do the same linguistic thing over and over again ad infinitum. So conceived, recursion was, only a few years back, the subject of considerable controversy thanks to two best-selling works by the linguist Daniel Everett, Don’t Sleep, There Are Snakes and Language: The Cultural Tool, in which Everett claimed to have discovered a language that seemingly lacked this crucial feature.¹ We now find the recursion controversy prominently recycled in Tom Wolfe’s The Kingdom of Speech, along with the many articles this book has generated in recent months. Most of these publications suffer from the same flaw: a misunderstanding of what Chomsky has claimed.

Still, for all its flaws, The Kingdom of Speech does at least have a characteristically striking tale to tell. Or, to be more precise, we should really say it has two of them, one dismissing Chomsky and the other, Charles Darwin. Both tales are apparently intended to prove Wolfe’s claim that language and evolution have nothing to do with each other, and that all of human civilization can be credited to the former. How seriously are we to take Wolfe’s grand narrative? It is hard to know. We simply note that it has been ably demolished elsewhere, and that there would be little point in rehashing such criticisms here.² Instead, we focus on a very specific claim within the book that is increasingly important to current debates about language and cognition, but which remains marked by a good deal of conceptual confusion.

Wolfe tells a tale that goes something like this. In 2002, Chomsky, along with Marc Hauser and W. Tecumseh Fitch, published a landmark paper in Science.³ Among other things, this paper posited recursion as the distinctive feature of human language, mankind being uniquely blessed with the capacity to put one sentence inside another in a potentially infinite series. For Chomsky, this feature came close to expressing a natural law, one present in every possible human language.

Fiat recursio!

Except not every linguist was keen to toe the Chomskyan line. Everett journeyed deep into the Amazon and found the Pirahã, an exotic-seeming tribe whose language—mirabile dictu!—displayed no recursion whatsoever. And that was that for Chomskyan linguistics, whether it be his controversial belief that human languages exhibit universal properties, or his even more controversial belief that we are all born with innate knowledge of language. Everett had landed the knockout blow.

This is more or less Wolfe’s version of events. It is the popular version, too. This would be fine, except for one small problem: it is not true. And this is not just in the empirical sense that Everett’s claims regarding the absence of recursion in Pirahã remain open, something Everett acknowledges but Wolfe ignores. It is also not true in a conceptual sense. Recursion may well mean the capacity, more or less, to do the same thing over and over again. But a more-or-less view of recursion does not quite cut it. To see why, we need to look at some of the technicalities involved.

In responding to the Science paper, scholars have generally taken recursion to mean the nesting of some piece of language of type X inside some other piece of language of type X, a phenomenon referred to as self-embedding. There are many different examples of this kind of self-embedding, found in just about every language under the sun. Perhaps the most commonly discussed, however, and the one used by Wolfe, is the nesting of one sentence inside another. Take the following sentence, for example: The woman was sad. The properties of English grammar are such that we may freely expand it by taking a similar sentence, such as The man was happy, and inserting it at the appropriate point: The woman was sad that the man was happy.⁴ Since we can do that, there is no grammatical reason why we cannot simply insert any number of additional sentences, thereby yielding The woman was sad that the man was happy that the girl was convinced and The woman was sad that the man was happy that the girl was convinced that the boy was frightened; and so on. Figure 1 is a graphic representation of the self-embedding in terms of the tree diagrams by which linguists represent grammatical structure, with each subsequent sentence, S, embedded within the previous S.

Figure 1.

A simplified syntactic representation for the sentence The woman was sad that the man was happy, where (…) indicates the possibility of further instances of self-embedding. The optional word “that” has been omitted from the tree for ease of exposition.

Linguists are quite fond of such tree diagrams. They are a useful way of representing the hierarchical nature of sentences whereby linguistic elements combine to form new units. This, Chomsky has long argued, is another universal property of language. The result is a nested structure of progressively interrelated units that exemplifies the core syntactic relationships within a sentence. In tree terms, such nestings are represented via lines that indicate which units are embedded inside which units, with the most deeply embedded elements towards the bottom and the least deeply embedded units towards the top.

Starting with the most deeply embedded elements of Figure 1, what we see is that an adjective (Adj) like sad can combine with a sentence (S) such as the man was happy to form an adjective phrase (AdjP) sad the man was happy. This AdjP can combine with the verb (V) was to form a verb phrase (VP) was sad the man was happy; and this VP can in turn combine with the noun phrase (NP) the woman to create the sentence (S) the woman was sad the man was happy. This tree also nicely captures the supposedly special nature of self-embedding whereby the grammar of a particular language allows for a particular unit, such as a sentence (S₁), to contain another such unit (S₂), which may in turn contain another such unit (S₃), in a potentially infinite sequence.

All reasonable enough. Critically, however, the Science paper never treats recursion and self-embedding as equivalent, and neither does Chomsky. Wolfe is simply mistaken when he states that Chomskyan recursion “consists … of putting one sentence, one thought, inside another in a series that, theoretically, could be endless.”⁵

Wolfe does not even have the basic history right, claiming 2002 as the year Chomsky announced recursion to the world. In reality, Chomsky can take credit for introducing his particular sense of recursion in the 1950s. He did so, moreover, in consonance with how it was then understood within the field of mathematical logic—a fact rarely noted—and he has been pretty consistent in his usage ever since.

What then is recursion for Chomsky and the logicians?⁶ Students of mathematics are often introduced to the idea via the recursive definition of the natural numbers:

0 is a natural number; and
if n is a natural number, then the successor of n (n + 1) is also a natural number.
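The two clauses translate almost directly into code. The following is a minimal Python sketch (the names `successor` and `naturals` are our own, chosen for illustration): the loop begins with the base clause and obtains each subsequent number by applying the successor step to the one before.

```python
from itertools import islice
from typing import Iterator


def successor(n: int) -> int:
    """The successor of n, that is, n + 1."""
    return n + 1


def naturals() -> Iterator[int]:
    """List the natural numbers using only the two clauses of the definition."""
    n = 0                 # clause 1: 0 is a natural number
    while True:
        yield n
        n = successor(n)  # clause 2: if n is a natural number, so is its successor


print(list(islice(naturals(), 5)))  # [0, 1, 2, 3, 4]
```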

Recursion is present in the statement’s form. Each natural number is defined in terms of the number that goes before, thereby allowing us to generate each number in turn, ad infinitum. This form is typically expressed as an equation in which the same symbol appears on both sides of the equals sign, so that the symbol is defined in terms of itself. Unfortunately, the recursive formula for the successor function is somewhat cumbersome for expository purposes. Instead, we can make do with the recursive definition of the factorials, where the factorial function f(n) represents the product of a number together with all the numbers that precede it:

f(1) = 1; and
f(n) = f(n – 1) × n, for n > 1.
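Written as a Python function (a sketch; the name `factorial` and the guard against n < 1 are our additions), the definition’s self-reference is plain to see: the body of `factorial` calls `factorial`, yet what it returns is a single number with nothing nested inside it.

```python
def factorial(n: int) -> int:
    """Factorial from the two clauses f(1) = 1 and f(n) = f(n - 1) * n."""
    if n < 1:
        raise ValueError("this definition starts at n = 1")
    if n == 1:
        return 1                    # base clause: f(1) = 1
    return factorial(n - 1) * n     # recursive clause: f is defined in terms of itself


print([factorial(n) for n in range(1, 6)])  # [1, 2, 6, 24, 120]
```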

Crucially, despite both definitions being recursive, neither exhibits any form of self-embedding. A given natural number, say 3, is not self-embedded inside its successor, 4. Nor is the factorial of the number 3 self-embedded inside the factorial of 4. Each element is simply defined in terms of the element which precedes it. It is self-reference then which is key—not self-embedding.

Recursive definitions were very important in the attempts by logicians to formalize the notion of an algorithm during the 1930s and 40s. The concept of an algorithm is certainly centuries old, but it had always been understood in a rather intuitive way, namely as a set of rules for solving a problem in a finite number of steps. It was here that recursive definitions proved key, allowing mathematicians to express, with finite resources, a simple formula for producing a potentially infinite set of outputs.

Logicians of the time were also able to determine that recursive definitions and algorithms combine in interesting ways. Here we have another important distinction, this time between a recursive set and a recursively enumerable set. This is more straightforward than it sounds; these terms simply refer to two ways of using an algorithm to recursively define a set of elements. The first way is via an algorithm that tells you whether or not a given element is a member of a set. If such an algorithm exists, the set to which it applies is recursive. The second way is via an algorithm that can actually list all the various members of a set. If such an algorithm exists, such a set is said to be recursively enumerable.⁷
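A small Python sketch may make the distinction concrete, using the perfect squares as an arbitrary example set (our choice, not one discussed by the logicians above). `is_square` is a decision procedure, which makes the set recursive; `enumerate_members` lists the members one by one, which makes it recursively enumerable. Building the lister out of the decider, as here, illustrates why every recursive set is also recursively enumerable.

```python
from itertools import count, islice
from math import isqrt
from typing import Callable, Iterator


def is_square(n: int) -> bool:
    """Decision procedure: answers yes or no for any natural number n."""
    return n >= 0 and isqrt(n) ** 2 == n


def enumerate_members(decide: Callable[[int], bool]) -> Iterator[int]:
    """Listing procedure built from a decider: test 0, 1, 2, ... in turn and yield each member."""
    for n in count():
        if decide(n):
            yield n


print(is_square(49), is_square(50))                    # True False
print(list(islice(enumerate_members(is_square), 5)))   # [0, 1, 4, 9, 16]
```

The converse does not hold in general: some sets can be listed by an algorithm but admit no decision procedure, the classic example being the set of programs that halt.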

The young Chomsky was certainly exposed to these ideas, which seemed relevant to the fact that human beings are able both to understand and to produce an unbounded number of sentences. No matter how many distinct sentences we have produced, we can always produce another; no matter how long the current sentence, we can always make it longer. Chomsky reasoned that our grammatical competence effectively licenses an infinite set of grammatically acceptable sentences. Since the human mind is clearly finite, it cannot store such a set in toto. He concluded that every human must possess some finite linguistic capacity for licensing an infinite set of sentences. Our linguistic capacity, Chomsky argued, must function like a recursively defined algorithm capable of enumerating all the possible sentences of a language. The linguist’s goal is to develop an explicit, formal characterisation of this self-referential capacity; such a characterisation is what Chomsky termed a generative grammar.

With this sense made clear, the inaccuracy of Wolfe’s version of events becomes apparent. Firstly, there is Chomsky’s core characterisation of the linguistic capacity in terms of a recursively-defined system as it was then understood within mathematical logic. Indeed, this is where his notion of generative grammar comes from, the word “generative” deriving from the work of Emil Post, one of the pioneering formalizers of recursively-defined techniques. In other words, we nowhere find Chomsky unpacking his notion of recursion in terms of the popular conception of self-embedding. Yet it is self-embedding to which Wolfe appeals when he talks of the “newly-discovered law of life on earth” that accounts “for man’s dominance among all the animals on the globe.”⁸

Secondly, contrary to what Wolfe imagines, Chomsky has insisted on this particular sense of recursion for more than 50 years. In 1963, Chomsky wrote that his grammatical theory expresses “the fact that … our [linguistic] process [is a] recursive specification”⁹ and, in 1968, that a “generative grammar recursively enumerates … sentences.”¹⁰ In 2000, he noted that our linguistic knowledge is “a procedure that enumerates an infinite class of expressions”¹¹ and, in 2014, that “we can think of recursion as [an] enumeration of a set of discrete objects by a computable finitary procedure.”¹²

Thirdly, what Chomsky is trying to capture is our special capacity for language. It is this capacity that he takes to be recursive and that makes possible all the languages that a human being can speak.

Combined, these three facts make clear why Wolfe’s attempt to marshal Everett’s claims about the supposed lack of recursion in Pirahã fails. It is simply irrelevant. For it takes Chomsky’s notion of recursion to be that of self-embedding; but these are distinct concepts. Wolfe and Everett are simply mistaken to claim otherwise. A generative grammar in Chomsky’s sense is a self-referential definition of the infinite set of grammatical sentences licensed by our capacity for language. That is just what it is. Whether human languages exhibit self-embedding is a separate question.

Moreover, Wolfe and Everett further conflate the properties of a particular language with the properties of the overall human capacity for language. Yet these are once again distinct things, and it is the latter which has interested Chomsky. So it may well be that a particular language, say Pirahã, does or does not display a certain feature, say the presence or absence of self-embedded sentences. That is an interesting question worth pursuing in its own right, but it is without import for Chomsky’s core arguments.

After all, we already know that the human capacity for language allows for self-embedding, because we already know of languages that clearly exhibit such a feature, English, for example. Hence, it is prima facie plausible that we are born with the capacity for self-embedding as part of our underlying linguistic knowledge. How else would a Pirahã baby just as competently converge on the grammar of English had English happened to be his language of birth? To claim otherwise would be a very peculiar form of nativism indeed. Chomsky’s fundamental goal has always been to determine the manner in which the various components of human language ability are purely linguistic, as opposed to being borrowed from other parts of the human mind, and the extent to which human beings are born with such linguistically unique features.

Why, then, are Chomsky’s claims so easily misrepresented? Wolfe, Everett, and their various reviewers are far from alone in this mistake. The same confusion has been made by many linguists sympathetic towards Chomsky’s work. The likely source of the confusion can be found in the mathematical formulation Chomsky originally chose to capture the recursive nature of linguistic knowledge. Post’s so-called production systems were typified by a procedure in which one symbol, or a string, is converted into another symbol, or string. This was a string-manipulating system that, as Post put it, “naturally lends itself to the generating of sets by the method of definition by induction,”¹³ “definition by induction” being simply the logician’s synonym for a recursive definition. Post’s system takes a string of symbols, call this string Φ, and expands it by one, that is, Φ + 1, in a fashion similar to the recursive definition of the natural numbers. Post used a rightwards arrow to mark how strings are converted into other strings: Φ → Φ + 1. It is this system that Chomsky and subsequent generations of grammarians used to capture the grammatical patterns of a language, formalizing, for example, the rule that an English sentence, S, may consist of a noun phrase, NP, such as the woman, and a verb phrase, VP, such as was sad: S → NP VP.
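The idea of generating a set by repeatedly rewriting strings can be sketched in a few lines of Python. This is a toy, assuming a single production that appends one symbol, in the spirit of Φ → Φ + 1; Post’s actual systems are considerably more general, and the names `generate` and `append_one` are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterator

Production = Callable[[str], str]


def append_one(phi: str) -> str:
    """Toy production in the spirit of Φ → Φ + 1: add one symbol to the end of the string."""
    return phi + "1"


def generate(axiom: str, production: Production) -> Iterator[str]:
    """Yield the starting string, then each string obtained by reapplying the production."""
    phi = axiom
    while True:
        yield phi
        phi = production(phi)


print(list(islice(generate("1", append_one), 4)))  # ['1', '11', '111', '1111']
```

Every output is a flat string of symbols, and the set of such strings mirrors the natural numbers written in unary.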

It was with this kind of formalism in mind that the post-Chomsky generation sought to capture the sentential self-embedding that was so clearly a feature of English grammar, and which we now find highlighted by Wolfe. This turned out to be remarkably easy, since all you needed were two kinds of rule: one with an S on the left-hand side of the arrow, which served to kick-start the process, and one with that same S on the right-hand side, so that the first rule could apply again to the output of the second. To handle the kind of self-embedded sentence exemplified in Figure 1, for example, the following rule set would do nicely:

S → NP VP
VP → V AdjP
AdjP → Adj S

The first rule states that an English sentence can consist of a noun phrase and a verb phrase; the second, that a verb phrase can consist of a verb and an adjective phrase; and the third, that an adjective phrase can contain both an adjective and another sentence. These rules may then recursively combine, accounting for the fact that a sentence in English can contain an adjective phrase containing a sentence: the woman was sad that the man was happy.
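The rule set can be run as a small Python program. Two assumptions should be flagged. First, as written, the three rules never stop: every AdjP must contain an S, and every S must contain an AdjP, so the option of not embedding, marked by (…) in Figure 1, is left implicit. The sketch therefore adds a hypothetical base case, AdjP → Adj. Second, the lexicon is invented for illustration, and, as in Figure 1, the optional word “that” is omitted.

```python
from typing import Dict, List

# The three rules from the text. The first AdjP option is a hypothetical base case
# (AdjP -> Adj) added so that derivations can stop; the rules in the text leave it implicit.
RULES: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "VP": [["V", "AdjP"]],
    "AdjP": [["Adj"], ["Adj", "S"]],
}

# Hypothetical lexicon: clause number i takes its noun phrase and adjective from position i.
NOUNS = ["the woman", "the man", "the girl", "the boy"]
ADJS = ["sad", "happy", "convinced", "frightened"]


def expand(symbol: str, clause: int, depth: int) -> List[str]:
    """Rewrite `symbol` into words; `clause` counts how deeply the current S is embedded."""
    if symbol == "NP":
        return [NOUNS[clause]]
    if symbol == "V":
        return ["was"]
    if symbol == "Adj":
        return [ADJS[clause]]
    if symbol == "AdjP":
        # Choose the recursive option AdjP -> Adj S until the requested depth is reached.
        rhs = RULES["AdjP"][1] if clause < depth else RULES["AdjP"][0]
    else:
        rhs = RULES[symbol][0]
    words: List[str] = []
    for child in rhs:
        # An S introduced on the right-hand side starts a new, more deeply embedded clause.
        words += expand(child, clause + 1 if child == "S" else clause, depth)
    return words


def sentence(depth: int) -> str:
    """Derive a sentence from S containing `depth` embedded sentences."""
    if not 0 <= depth < len(NOUNS):
        raise ValueError("the toy lexicon supports depths 0 to 3")
    return " ".join(expand("S", 0, depth))


for d in range(3):
    print(sentence(d))
```

The `depth` parameter controls how many times the recursive option AdjP → Adj S is chosen, so `sentence(3)` yields the woman was sad the man was happy the girl was convinced the boy was frightened. Note that the result is a flat sequence of words: the derivation itself leaves no record of which units were embedded inside which.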

As it happens, the kinds of rule that so easily captured the sentential self-embedding of English were also the clearest examples of concrete recursive rules as formulated within Post’s system. So it is perhaps unsurprising that recursion should have been so quickly conflated with the notion of self-embedding. But they are not the same thing. The kinds of rule just exemplified are instances of a recursive rule that can be formulated within Post’s system. They are made possible thanks to the underlying nature of the system. As such, there are two distinct senses in which linguists could think of Post’s formalism as recursive. In the first, it is the system itself which is inherently recursive. In the second, recursion is a property of a specific rule fleshed out according to the constraints of the system. Many linguists have concentrated exclusively on the second sense, overlooking the role of recursion in the overall system. No one is free of fault here, but Chomsky has always been sensitive to this distinction. In the end, dissatisfied with Post production systems, he replaced them with his own bespoke operation, which he termed Merge. Chomsky’s eventual decision to change formalisms was not some panicked reaction to the sort of devastating counter-evidence that might be found in the Brazilian jungle. On the contrary, it was a quite deliberate attempt to help resolve the sorts of confusions that Post’s system seemingly gave rise to, and which Wolfe exhibits in his book.

The small but critical technical flaw in Post production systems might have been of interest to Wolfe had he been sensitive to the context within which Chomsky introduced his concept of recursion. Post production systems cannot generate the sorts of hierarchical structures that typify human language. Technically speaking, they generate lists of symbols, little more than beads on a string. The early Chomskyans were aware of this and glossed over it by stipulating that strings could be mapped to structures. Unfortunately, stipulation easily becomes equivocation: one can end up talking of recursion and of self-embedding rules in the same breath. The conflation of recursion as self-reference with self-embedding was thus preserved through the generations.

It is exactly these ambiguities that Merge addresses. Firstly, Merge is recursive in the sense that Post’s system is recursive. Indeed, Chomsky has always been keen to describe Merge as an operation that generates sets of elements in the same sense in which the natural numbers are recursively defined. Furthermore, Merge operates by directly embedding all kinds of linguistic material into all kinds of other linguistic material. There is no need for a string and structure stipulation. Unlike the original mathematical system that Chomsky borrowed, Merge actually builds the requisite hierarchical structure. The manner of its operation is also such that it does away with the distinction between the kinds of rule that generate self-embedded sentences and those that generate non-self-embedded sentences. At its core, all that Merge does is take two units, whether these be words or phrases or sentences, and combine them to make new, bigger units. The process is the same in every case.

Box 1.

merge [sad] and [the man was happy] =	[sad the man was happy]
merge [was] and [sad the man was happy] =	[was sad the man was happy]
merge [the woman] and [was sad the man was happy] =	[the woman was sad the man was happy]

The operations of Merge. It takes two units and combines them into bigger units.
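The steps in Box 1 can be mimicked with a single Python function. This is a sketch under a simplifying assumption: Merge is usually formulated as forming an unordered set {X, Y}, whereas here it returns an ordered pair so that the output keeps English word order. Unlike Box 1, the inner clause is also built from individual words, by the very same operation.

```python
from typing import Tuple, Union

# A syntactic object is either a single word or the output of an earlier merge.
SyntacticObject = Union[str, Tuple["SyntacticObject", "SyntacticObject"]]


def merge(a: SyntacticObject, b: SyntacticObject) -> SyntacticObject:
    """Combine two units into a new, bigger unit (an ordered pair here, a set in Chomsky's formulation)."""
    return (a, b)


def brackets(so: SyntacticObject) -> str:
    """Show the hierarchy: one pair of brackets per application of merge."""
    if isinstance(so, str):
        return so
    left, right = so
    return f"[{brackets(left)} {brackets(right)}]"


def words(so: SyntacticObject) -> str:
    """Flatten the structure back into the plain sequence of words."""
    if isinstance(so, str):
        return so
    left, right = so
    return f"{words(left)} {words(right)}"


# The inner clause is built by exactly the same operation as everything else.
inner = merge(merge("the", "man"), merge("was", "happy"))
adjp = merge("sad", inner)            # Box 1, first step
vp = merge("was", adjp)               # Box 1, second step
s = merge(merge("the", "woman"), vp)  # Box 1, third step

print(brackets(s))  # [[the woman] [was [sad [[the man] [was happy]]]]]
print(words(s))     # the woman was sad the man was happy
```

The bracketed rendering shows the hierarchy that Merge builds directly; flattening it recovers the plain sentence.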

Embedding and self-embedding amount to the same thing under Merge, with neither type of hierarchy any more special than the other and each just as much a part of our underlying capacity for language as any other. While all languages Merge, no particular language need self-embed.
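Because Merge does not distinguish embedding from self-embedding, whether a structure self-embeds is a fact about what has been built rather than about which rule built it. The sketch below makes this concrete under an explicit assumption: each merged object is tagged by hand with a category label (S, NP, VP, AdjP), which is our illustrative device and not part of Merge itself. The same `merge` call builds both a sentence that self-embeds and one that does not.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Node:
    label: str  # hypothetical category label, attached by hand for illustration
    parts: Tuple["Item", "Item"]


Item = Union[str, Node]


def merge(label: str, a: Item, b: Item) -> Node:
    """One combining step, identical whatever the inputs; only the hand-given label varies."""
    return Node(label, (a, b))


def contains(item: Item, label: str) -> bool:
    """True if `item` is, or dominates, a node with this label."""
    if isinstance(item, str):
        return False
    return item.label == label or any(contains(p, label) for p in item.parts)


def self_embeds(item: Item, label: str) -> bool:
    """True if some node with `label` dominates another node with the same label."""
    if isinstance(item, str):
        return False
    if item.label == label and any(contains(p, label) for p in item.parts):
        return True
    return any(self_embeds(p, label) for p in item.parts)


simple = merge("S", merge("NP", "the", "woman"), merge("VP", "was", "sad"))
inner = merge("S", merge("NP", "the", "man"), merge("VP", "was", "happy"))
nested = merge("S", merge("NP", "the", "woman"),
               merge("VP", "was", merge("AdjP", "sad", inner)))

print(self_embeds(simple, "S"))  # False
print(self_embeds(nested, "S"))  # True
```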

In other words, it is not so much that Chomsky jettisoned his previous sense of recursion. Rather, he simply did away with a particular system that looked reasonable at the time, and replaced it with something that looks even more reasonable now. Unfortunately, both inside and outside the field, many simply didn’t notice and missed a step. The result has been a story of conflations culminating in Wolfe’s ill-advised foray into a topic he clearly knows little about.

While Chomsky’s claims may be somewhat technical, their essence is not hard to identify. In fact, they often lie in plain sight. At one point in The Kingdom of Speech, Wolfe recounts an encounter between Everett and Fitch, one of the co-authors of Chomsky’s Science article. The scene is deep in the Amazon, where Fitch has ventured in order to probe whether the Pirahã can actually do recursion when tested under experimental conditions. Says Fitch, understandably flustered by his new environs but suddenly finding a moment’s self-reflection, “it’s not recursion; I’ve got to stop saying that. I mean embedding.”¹⁴

Fiat recursio!
