To the editors:
No one seems to understand anyone anymore. I do not understand Avram Noam Chomsky. And Chomsky and his supporters do not understand me. “You do not understand me” is a common (and often valid, of course) claim in science. It is rife in the literature on the recursion debate that Tom Wolfe discusses in his book, The Kingdom of Speech. I first noticed the prevalence of misunderstanding in 1981 when I was working on my PhD in Brazil. At the time I was sharing my office for four months with University of California at Berkeley philosopher John Searle, who was a visiting professor in our department at the Universidade Estadual de Campinas. I was reading a passage in Chomsky’s 1980 book Rules and Representations in which Chomsky criticizes Searle at length. I turned to John and asked “John, can I get your reaction to a section of Chomsky’s book where he criticizes you?” “Sure,” Searle replied. I read it. I was a firm follower of Chomsky at the time (I introduced Chomsky’s previous theory, Government and Binding, to Brazil in my PhD dissertation, including original translations of the technical terms of that theory that gained wide currency in Brazil). After reading the passage, I asked John for his reaction. “Well, Dan,” John said with a grin,
I agree with Noam! The things he attributed to me make no sense and he is correct to criticize them. Of course, I never said any such thing. I think Noam knows that too. You see, Dan, Noam and I have an understanding: I never understand anything he writes and he never understands anything I write.
Every criticism of me over the past twelve years, since my ire-inducing article on the general effect of culture on grammar in human languages (in which mention of recursion was minimal), has been built on one of three ideas: Everett is wrong about Pirahã; Everett is lying about Pirahã; or Everett does not understand Chomsky and therefore Everett’s criticisms are irrelevant. “Fiat recursio!” focuses on the third idea. But David Lobina and Mark Brenchley (L&B) not only misunderstand me, ironically, they also misunderstand Chomsky. And these misunderstandings lead to a greater misunderstanding of the debate they are trying to comment on.
In their review of Tom Wolfe’s Kingdom of Speech, the authors present a clear, concise and seemingly convincing response to Wolfe, arguing that Wolfe (and I) have misrepresented or failed to understand Chomsky’s work on recursion. For example, they reject soundly the idea that recursion is simply building tree structures, attributing this idea to me, to Wolfe, and to others who have caricatured or failed to comprehend Chomsky’s views over the years. They argue that Chomsky has worked on recursion not since 2002, but since the 1950s. Moreover, they claim that Wolfe and I have not only misunderstood what Chomsky means by recursion, but that we also fail to distinguish between having a capacity and displaying that capacity. That is, in their view of the issue, Chomsky never claimed that all languages have recursion. He claimed instead that all humans have the capacity for recursion.
Therefore, they claim, Wolfe and I misunderstand Chomsky, since for those who really understand recursion what he meant is clear: Wolfe and I have failed to distinguish capacity from the implementation of capacity.
Unfortunately, L&B’s depiction of the issues and the debate, of Chomsky’s views versus Wolfe’s, and of my supposed misunderstandings, is false. It is also unoriginal. Many people have raised the same points in defense of Chomsky. In order to defend Chomsky, it seems, one must misrepresent his critics. In my experience, the most common way this is done is to claim that his critics are making a category mistake: thinking, for example, that a single phenotype (what a given language is like) contradicts the broader human genotype (the capacity of the language or its speakers). This would indeed be a fundamental error, as I take pains to point out in my forthcoming book, How Language Began: The Story of Humanity’s Greatest Invention. Fortunately, this error is committed neither by me nor by Wolfe, who based his claims on my work.
In this response, I am going to point out several misunderstandings in L&B’s review. First, contrary to their account, Chomsky’s special preoccupation with recursion did not begin in the 1950s but, just as Wolfe has it, with the Hauser, Chomsky, and Fitch (HCF) paper published by the journal Science in 2002.1 Second, by recursion Chomsky means Merge, a binary-branching, endocentric operation constructing embedded structures of exactly the kind that the authors say he does not mean. I will not repeat the definition of Merge here. Not only have I discussed it at length in several technical publications, but the authors themselves give an accurate definition of it in their review, though they fail to emphasize that it builds embedded structures naturally; and though such structures are not themselves crucial, they are the very kinds of examples that HCF use in their 2002 paper to illustrate recursion. Next, I am going to explain again why Pirahã is exactly the counterexample that the authors claim it is not. Finally, I am going to argue that Chomsky’s claims about recursion lack empirical significance.
Following the publication of my 2005 paper in the journal Current Anthropology, in which I mentioned the absence of recursion in Pirahã, I have had about twelve years of constant criticisms of exactly the kind that L&B raise. The first way that I addressed this was to organize in 2007 the first-ever international conference on recursion, co-sponsored by the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany and Illinois State University, where I was then employed. The conference was interesting on various levels but in particular because of the many versions of recursion presented by mathematicians, computer scientists, psychologists, and linguists.2 The presenters, however disparate their own views might have been on formal definitions and the appearance of recursive structures in human languages, agreed that when HCF discussed recursion in their 2002 article, they meant Merge—a fact I have discussed at length in many papers. Moreover, the claim of HCF was that the capacity for recursion seemed fundamental because languages appear to build structures recursively (exactly as I described in my 2009 article, “Pirahã culture and grammar: a reply to some critics,” in Language, the journal of the Linguistic Society of America, and in my 2012 article, “What does Pirahã Have to Teach Us About Human Language and the Mind?,” among many others).3
To see where L&B have gone astray, we need briefly to review the history of Chomsky’s work in syntax, since they claim that his concern with recursion begins in the 1950s, then get to Chomsky’s 2002 claim with his colleagues W. Tecumseh Fitch and Marc Hauser.
Chomsky was a student of the linguist Zellig Harris, himself heavily influenced by both Leonard Bloomfield and Edward Sapir. Sapir and Bloomfield were also strong influences on Chomsky. Harris and Bloomfield were interested in the mathematical foundations of linguistics, while Sapir was interested in the cultural and mental aspects of language. To Sapir (and here he influences me as well), both linguistics and psychology were sub-branches of anthropology. Chomsky seems to have absorbed Sapir’s view that the mind is a legitimate object of study (quite novel for American psychologists in the 1950s), but he kept his focus on the structure and formal properties of language, ignoring Sapir’s focus on culture. Chomsky was influenced most by Harris, borrowing from him concepts that would prove very important in his own theoretical work, such as “transformations” and a version of the theory of phrase structure that came to be known as “X-bar theory.”
Because of Chomsky’s focus on formal and mathematical concepts and limitations of structure in natural languages, he has indeed, as the authors of the review point out, long been concerned with recursion, among many other formal properties of languages. For him it was an important component of the syntax of human languages—so much so that from his very first days, whenever Chomsky mentions language, he means syntax (see, among many others, Chris Knight’s recent book, Decoding Chomsky). But never during any of those times, even as he wrote in his ambitious tome, Logical Structure of Linguistic Theory, did he claim that recursion was the single capacity that made human language possible. Nor did recursion in earlier years take on the particular form of Merge that was subsequently claimed by Chomsky (p.c.) to be at the center of the 2002 article and most of Chomsky’s linguistics work since 1995.
In 1995, with his book The Minimalist Program, Chomsky’s view of recursion shifted radically. From this point on, he was concerned with a special case of recursion, Merge, which he sought to make the basic operation of human language. This was never the case in earlier versions of his theory. Here is what is crucial regarding the recursion claims he made with Fitch and Hauser in 2002:
We hypothesize that FLN only includes recursion and is the only uniquely human component of the faculty of language.
We assume, putting aside the precise mechanisms, that a key component of FLN is a computational system (narrow syntax) that generates internal representations and maps them into the sensory-motor interface by the phonological system, and into the conceptual-intentional interface by the (formal) semantic system; adopting alternatives that have been proposed would not materially modify the ensuing discussion. All approaches agree that a core property of FLN is recursion, attributed to narrow syntax in the conception just outlined. At a minimum, then, FLN includes the capacity of recursion.
In fact, we propose in this hypothesis that FLN comprises only the core computational mechanisms of recursion as they appear in narrow syntax and the mappings to the interfaces. If FLN is indeed this restricted, this hypothesis has the interesting effect of nullifying the argument from design, and thus rendering the status of FLN as an adaptation open to question.4
The last paragraph is interesting because it serves as the basis for Chomsky’s bizarre claim that human language cannot be explained by natural selection, but instead popped into being via a Prometheus born with a gene or set of genes that provided a capacity for recursion. All of this is refuted in detail in How Language Began. It is mentioned here because Chomsky’s work on recursion is vital for his larger program of understanding language origins as not subject to Darwinian theory. What is crucial here is not his continued interest in recursion, which Chomsky believes all languages have, an assumption also made by all structuralist theories of language preceding Chomsky’s, such as in the work of Leonard Bloomfield, Kenneth L. Pike, Zellig Harris, Rulon Wells, and many others. It is Chomsky’s focus on Merge, not his long-standing interest in recursion, that sets him apart from other linguists.
Interestingly, when interviewed on camera for the documentary about my research among the Pirahãs, “The Grammar of Happiness” (also known as “The Amazon Code”), Chomsky claimed baldly that “There is no question that the language is built on a recursive process.”5 In other words, he did not claim that my work was irrelevant, but that it was unbelievable; he prefaced his remarks by saying “There is also a question whether any of it is true [referring to my work], but that’s another matter.” Thus in one answer he was able to suggest that I might or might not be a liar, but was certainly wrong. He did not say that I had misunderstood him, because from our long correspondence over these issues he knew I had not. This is crucial here because it reveals that it is meaningless to claim that there can be a capacity for language that need not be used in any actual language.
Just to emphasize, however, Pirahã does appear to lack recursion (whether the general version or the Merge version). There is no strong evidence that recursion plays any role in the grammar of Pirahã (although I have argued in detail that the Pirahãs do use recursion in the construction of their stories and conversation, not expressed in the grammar but in the way they keep track of themes cognitively). More recently, a careful examination of Pirahã texts by a team from the Department of Brain and Cognitive Sciences at MIT has concluded that, based on the available data, not only does Pirahã seem to lack recursion, but that the data examined are perhaps best described via a linear, non-recursive grammar.6 In any case, let us turn to consider the significance of Chomsky’s claim that the Narrow Faculty of Language (FLN) is a capacity for recursion.
Never in my writing have I been unaware of the fact that Chomsky’s claim is about a capacity. In fact, to the contrary, I have always been careful to offer evidence that the Pirahãs do indeed have the capacity for recursion. But not only does this not weaken my argument, it strengthens it.
As I have repeated in many outlets, refereed and popular, if recursion is merely a component of human intelligence generally, i.e., a capacity, then even if it were the central fact of human intelligence, this would not establish a direct tie to human languages, as is claimed in the idea of the Narrow Faculty of Language. It would be just a component of human intelligence. This is the view I have always espoused. Had the authors read more of my technical work, as Wolfe in fact did, they would have seen that my work is not directed at the fact that a single language lacks recursion, but at the claim that, because a language can lack recursion, there can be no empirically meaningful claim that there is a language-specific capacity for recursion, such as the FLN.
In fact, there is no strong evidence that humans are even the only creatures that can think recursively. Even HCF comment that other animals could have recursion, just not in language (which borders on circularity if one is saying that recursion is the basis of human language).7 The human brain is bigger and more complex than other brains, and I have written in many places that it is the greater abilities and computational power of the human brain that most likely underwrite human language, without any need to appeal to the concept of a faculty of language, either narrow or broad.8
In short, the question is not whether humans can think recursively. The question is whether anyone can demonstrate that this ability is linked specifically to language, rather than to human cognitive accomplishments more generally. Chomsky has never shown this. And I have argued that the Pirahã data are important precisely because they render the capacity claim nearly impossible to demonstrate.
If I am correct about Pirahã, then no sentential grammar of a human language need be constructed recursively. If one language can lack recursive structures, any or all can. No language must have them. People may all think recursively yet lack recursion in their grammars. What I have shown is that, precisely because the Pirahãs can think recursively, if their language lacks recursion, then recursion is not fundamental to human language but is rather a component of human cognition more generally. To claim otherwise, again, is to claim that all the world’s languages could lack recursion while recursion alone still constitutes the Narrow Faculty of Language. And that is empty prattle. If there is anything innate and specific to the human capacity for language, the Pirahã data show that recursion is not part of it.
Related to these confusions is another regarding my work. It is often claimed that my arguments against the idea that there is a FLN, as HCF put it, are mistakenly construed by me to mean that there is no Universal Grammar. Now, I do not in fact believe that there is anything like a Universal Grammar.9 But I do not believe, nor have I intended to claim, that my recursion work shows there is no Universal Grammar (UG)—not unless Chomsky wishes to conflate FLN with UG. The arguments against UG are independent and quite strong. In fact, I made these arguments years ago in a series of papers in a debate in The Journal of Linguistics with David Lightfoot and Stephen Anderson, two proponents of UG, as seen especially in the book of theirs that I reviewed, The Language Organ (in which I argued that the very notion of a language organ of the type they were advocating was incoherent).
Once again, it is puzzling—though it partially explains their misunderstandings—that the authors of this review mention only a couple of general-audience books I have written but they fail to cite my many peer-reviewed publications on the topic. This is perhaps what leads them to misrepresent my work, since the latter publications are necessarily more technical.
In recent years, I have claimed not only that not all languages need recursion, but that non-recursive, linear grammars, in which words are arranged without tree structures, like beads on a string, are not merely possible but were likely the earliest grammars of humans. My claim in How Language Began is that Homo erectus had such grammars nearly two million years ago. If this is correct, then the capacity for recursion was not likely even relevant to the evolution of language. Moreover, others have argued for similar points, such as in recent work by Ray Jackendoff and Eva Wittenberg.10
Whatever one thinks of Wolfe’s The Kingdom of Speech, neither it nor my own work is in error for failing to distinguish between the presence of a feature of language versus capacity for a feature. The authors of the review discussed here have seriously erred in failing to recognize that showing that no such distinction can be maintained was precisely the point of my work.11
Daniel Everett
David Lobina and Mark Brenchley reply:
In his reply, Daniel Everett seems to repeat the same misunderstandings that we had hoped to clarify in our review. Before addressing them, however, we must first note two minor but substantive errors.
The first is Everett’s claim that Chomsky’s “special preoccupation” with recursion only began in the famous 2002 Science paper co-authored with Hauser and Fitch.12 In fact, in our review we supplied clear quotations, spanning some fifty-odd years, showing recursion-as-self-reference to have long been part of Chomsky’s core characterisation of the human capacity for language, quotes that are remarkably consistent in their terminology. Indeed, Everett’s own reply acknowledges that Chomsky has “long been concerned with recursion.” Perhaps Everett means a special preoccupation with Merge. But as we show below, this is no help.
The second such error is Everett’s claim that we failed to “emphasize that it [i.e., Merge] builds embedded structures naturally.” But we explicitly stated that Merge “operates by directly embedding all kinds of linguistic material into all kinds of other linguistic material,” with this operation such that “embedding and self-embedding amount to the same thing.” All of which seems a pretty straightforward way of saying that Merge naturally builds embedded structures.
These minor points noted, we will argue that Everett still misconstrues what recursion actually means for Chomsky as well as what it means to capture the specifics of the human capacity for syntax.
At its heart, our review made the following points:
- Recursion in mathematical logic is defined in terms of self-reference, not self-embedding.
- This has always been Chomsky’s sense, and he has been remarkably consistent in his usage throughout his career.
- While Chomsky has indeed progressed to a new grammatical formalism, Merge, both this and his original formalism (i.e., Post’s production systems) are recursive in precisely this self-referential sense.
- This is so despite the fact that they obviously differ in terms of the details of how they operate (e.g., Merge is a structure-building binary operator; production systems are string-rewriting systems).
- The way in which linguists adapted the earlier production systems gave rise to two different senses of recursion: one defined in terms of self-reference, which is a global property of the system from which particular re-writing rules may be constructed; the other defined in terms of self-embedding, which is a local property of the particular re-writing rules that actually get constructed.
- Most of the field has ended up focusing on the second sense, thereby missing Chomsky’s core sense.
- Once Chomsky’s core sense is understood, it is clearly immaterial to argue against “Chomskyan” recursion on the basis of this or that language happening to lack self-embedding, for the very simple reason that self-embedding is not what Chomsky means by recursion.
If the original review was not quite clear enough on these points, it is because the inevitable constraints of space meant we could not offer as detailed an account as we would have liked. We will try to do better here.
In his more technical work, Everett generally defines recursion in two steps: first, by speaking of it as an operation that applies over its own output, and then by further unpacking it in terms of self-embedding.13 Unfortunately, he never offers an explicit argument for this second step. That will not do, for the simple reason that claims to recursion do not automatically entail claims to embedding. Rather, a specific argument always has to be made to justify that further unpacking. For instance, while it is trivially true that recursive operations apply over their own output, this is just as true of iteration, which for most people does not involve embedding at all.
Any textbook in mathematical logic, such as Rózsa Péter’s 1967 book Recursive Functions, a classic in the field, will offer a description of what recursion means for the logician (and for Chomsky):
[Recursions are] definitions which give values at certain initial places and prescribe how the remaining function values are to be determined from the function values at preceding places.14
Indeed, our review provided a definition of the factorial class in these very terms. Thus, specifying the value of the factorial of 4 requires specifying the factorial of 3 (a “preceding place”), then the factorial of 2 (another “preceding place”), until reaching the factorial of 1 (an “initial place”), whose value is established to be 1 ab initio. What the self-reference of a recursively-defined function does, then, is call upon another recursively-defined function; no more, no less. In the case at hand, the factorial of 4 calls upon the factorial of 3, and so on, but nowhere is there any “embedding” of functions inside other functions. The functions are technically independent, with each calculating its own particular value.
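Péter’s definition is easy to make concrete. The following Python sketch is ours, not drawn from either letter; it shows the self-reference at issue: `factorial(4)` calls `factorial(3)`, and so on down to the initial place, yet no function is embedded inside another, and each call independently computes its own value.

```python
def factorial(n: int) -> int:
    """Recursive definition: a fixed value at the initial place, plus a
    rule giving each value in terms of the value at the preceding place."""
    if n == 1:                       # the "initial place", fixed ab initio
        return 1
    return n * factorial(n - 1)      # value at n defined via the preceding place
```

The self-reference here is nothing more than the definition calling upon itself by name; nothing in it requires one structure to be nested inside another.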
As stressed in our review, such recursive definitions are a central feature of Post’s production systems. And it is for this reason that pre-Merge Chomsky originally used them to capture our linguistic knowledge, a point to which Chomsky has been sensitive ab initio. Thus, in a 1963 paper with George Miller, we find the following:
[B]y a grammar we mean a set of rules that...recursively specify the sentences of a language. In general each of the rules we need will be of the form
Φ1, … , Φn → Φn+1
where each of the Φi is a structure of some sort and where the → relation is to be interpreted as expressing the fact that if our process of recursive specification generates the structures Φ1, …, Φn then it also generates the structure Φn+1 [emphasis original].15
This seems pretty clear to us. Not only is a Chomskyan grammar simply a recursively-defined system, it is recursive in exactly the mathematical sense we highlighted. There is no reason to take our particular word for it. One need only turn to the work of Geoffrey Pullum, certainly no Chomsky acolyte, for a clear appraisal of the underlying continuity. Thus, in a 2007 paper, Pullum states that a generative grammar, whether modelled using production systems or Merge, “is a recursive definition of a specific set, ipso facto computably enumerable.” And by computably enumerable, he is using the contemporary name for what we referred to as recursively enumerable in our review, the term we noted to be the core concept underlying Chomsky’s theory of grammar.16
In other words, though the production systems used by the early generative grammarians were eventually replaced with Merge, the key mathematical sense of recursion-as-self-reference persists. Thus, we find Chomsky, the linguist, describing Merge as a set-theoretic operation in which repeated applications over one element yield a potentially-infinite set of structures, drawing a rather apt analogy with the successor function. And thus we find George Boolos, the mathematical logician, putting together a conception of set formation in which sets are said to be “recursively generated,” by which he meant the “repeated application of the successor function,” much as in current descriptions of Merge.17
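The analogy can be sketched in a few lines of Python. This is an illustration of ours: the rendering of Merge as unordered set formation is the usual textbook gloss, not a quotation from Chomsky, and the lexical items are invented.

```python
def merge(x, y):
    # Merge rendered as set formation: two syntactic objects
    # combine into the unordered set {x, y}.
    return frozenset([x, y])

def successor(n):
    # The successor function invoked by the analogy: S(n) = n U {n}.
    return n | frozenset([n])

# Both proceed by repeated application over their own output:
dp = merge("the", "dog")        # {the, dog}
vp = merge("saw", dp)           # {saw, {the, dog}}
zero = frozenset()
one = successor(zero)           # {0}
two = successor(one)            # {0, {0}}
```

In both cases the recursion is the reapplication of one operation to its own prior output, which is the self-referential sense at issue.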
Crucially, therefore, while Everett is correct to note that Merge is a recursive operation that builds binary-branching structures, what his reply fails to appreciate is that the recursive bit and the binary-branching bit are distinct aspects of Merge. As a result, Everett is at best muddying the waters when he claims that all recursion means for Chomsky nowadays is Merge and that all Merge involves is binary branching. The overall picture is much more nuanced.
Indeed, Everett should know this. In the co-authored PLOS article to which Everett directly refers in his reply, Everett et al. discuss critical work by Nevins, Pesetsky, and Rodrigues.18 They note that, while Nevins et al. do draw a close connection between Merge and the property of recursion, they do so precisely on the basis of Merge’s capacity to reapply over its own output, not for any reasons to do with its being an embedding or binary-branching operation. But then it is not at all clear to us why Everett insists that he has critically undermined the Chomskyan case for recursion-as-self-reference by arguing against the universality of recursion-as-self-embedding, since his own published work acknowledges these to be distinct things.19
This leaves Everett’s second misconstrual; namely, the manner in which he locates the particular properties of Pirahã within an overarching theory of grammar, as expressed in his claim that “because a language lacks recursion, there can be no empirically meaningful claim that there is a language-specific capacity for recursion.”
Before addressing this claim, however, and at the risk of once more repeating ourselves, let us stress that how this particular issue pans out has no bearing on the role of recursion-as-self-reference in Chomsky’s theory of language. In our sense, recursion-as-self-reference just is what Chomsky means by recursion. And it is in this sense that recursion is taken to be a central feature of generative grammar as a framework for modelling the human capacity for language, not recursion as self-embedding. As such, claims to the contrary based on arguments for or against the universality of self-embedding are entirely incidental. And, indeed, it is especially unhelpful to continue to speak of recursion without properly distinguishing between these distinct senses. To do otherwise is merely a recipe for misunderstanding, language seen through darkened glass.
That said, our take on the distinction between our underlying capacity for language and the language-specific implementations of this capacity is, we think, pretty straightforward. Suppose the overarching purpose of linguistic theory is to characterize the nature of the human capacity for language. Then any attested syntactic feature must somehow be reflected in our underlying theory. Naturally, were the particular feature of interest pervasive, then we would reasonably expect such a feature to be rather central. And yet, were a particular feature only ever found to be present in a single language, then our theory must still be such as to license its existence, even if this is only as an artefact of some other (perhaps more abstract) feature. For how can a feature be linguistically implemented unless it is a component of our underlying capacity for language, however latent that feature may ultimately be?
So framed, we fail to see the force behind the claim that the supposed lack of Pirahã self-embedding prevents there being any “empirically meaningful” arguments for a special linguistic capacity for self-embedding. After all, extensive empirical investigation has shown self-embedding to be a rather pervasive feature of the world’s languages. So much so, in fact, that self-embedding is more intuitively construed as the default setting: a feature that is naturally expressed unless something else gets in the way.
More foundationally, however, such an argument neglects the obvious fact that a capacity is exactly that. It is what makes something possible, not inevitable. As such, all you really need to establish an “empirically meaningful” claim that a particular syntactic feature reflects a particular capacity for that feature is an attested instance of that feature. This would not of itself be an especially compelling claim, of course, and there would be much analytic and argumentative work yet to do. But it would still be an empirically meaningful claim.
And, indeed, Everett’s own technical work shows self-embedding to be a particularly instructive case in point here. Take the aforementioned PLOS study. This involved a corpus-based analysis of various Pirahã texts, concluding that there was no unambiguous evidence of syntactic embedding (a highly tentative result, the authors themselves stress).
According to Everett’s reply, however, another result of this study is that the grammar of Pirahã is “perhaps best described via [sic] a linear, non-recursive grammar,” where by “linear” Everett seemingly means “non-binary” branching and by “non-recursive” he presumably means “non-self-embedding.” But this is not some sort of knock-out blow.
To see this, we must first note that, in the study of formal languages, the expressive power of a language is usually determined by employing the vocabulary of rewriting rules (that is, production systems). The nature of these rules is such that they allow a hierarchy of different grammars and languages to be devised, by further specifying the form of the grammatical rules of each class—the so-called Chomsky Hierarchy.20
What the PLOS study actually shows is that, from the perspective of formal language theory, the expressive power of Pirahã would appear to place it in what is called the “regular” class of formal languages. Under the Chomsky Hierarchy such a language would be generated by what is called a “regular grammar.” Now, Everett is right to say that, unlike the “context-free” grammars of the Hierarchy, regular grammars make no use of the sorts of re-writing rules that the early generativists drew on in order to capture the phenomenon of syntactic self-embedding. Rules, that is, of the following sort:
1. S → NP VP
2. NP → NP S
But by “linear,” formal language theorists do not mean “non-binary.” Instead, the “rules are called linear because only one non-terminal can appear on the right side” of a production rule. The word “linear” simply refers to the form of the production rules exhibited by this class.21 That is, unlike the rules exhibited in (1) and (2), which in principle allow for any number of non-terminal (i.e., rewritable) symbols, normally represented using capital letters, the formal grammar supposedly necessary for capturing the syntax of Pirahã does not. Hence, the possibility of (3) relative to such a grammar but the impossibility of (4):
3. S → John VP
4. *S → NP VP
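The restriction is mechanical enough to be checked by a few lines of code. Here is a sketch of our own devising (assuming the stricter right-linear form, in which the single permitted non-terminal must also come last):

```python
def is_right_linear(grammar, nonterminals):
    """Check that every rule has at most one non-terminal on its
    right-hand side, and that it appears last (right-linear form)."""
    for lhs, expansions in grammar.items():
        for rhs in expansions:
            positions = [i for i, sym in enumerate(rhs) if sym in nonterminals]
            if len(positions) > 1:                 # e.g. S -> NP VP is ruled out
                return False
            if positions and positions[0] != len(rhs) - 1:
                return False
    return True
```

On this check, a grammar containing only rules like (3) passes, while one containing a rule like (4), with its two rewritable symbols, fails.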
Secondly, despite not countenancing the same range of phenomena as a context-free grammar, regular grammars still embody exactly the kind of self-referential property that Chomsky has always had in mind when he speaks of the recursive nature of our linguistic knowledge. This is true of the concrete rules licensed by such a grammar, whereby we can still have the same symbol on both sides of a rule, as in (5):
5. S → … S …
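A rule of this shape is at work in even the simplest regular grammar. As a toy example of our own (the rules S → a S and S → b are hypothetical, chosen only for illustration), self-reference yields unboundedly many strings without any self-embedding in the output:

```python
def derive(n):
    """Apply the self-referential regular rule S -> 'a' S  n times,
    then terminate with S -> 'b'. The symbol S recurs on both sides
    of a rule, yet the resulting string is flat: recursion-as-
    self-reference without recursion-as-self-embedding."""
    if n == 0:
        return ["b"]              # terminating rule S -> b
    return ["a"] + derive(n - 1)  # rule S -> a S reuses S
```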
It is also true in the much more fundamental sense that every class of languages within the Chomsky Hierarchy exhibits recursion-as-self-reference, since each such class is merely a specific type of production system. Each class thus exhibits the core property of such a system: it proceeds on the basis of definition by induction. In other words, and to allude to the Chomsky and Miller paper again, the rules are all of the form Φ₁, …, Φₙ → Φₙ₊₁. As such, even if Merge turns out not to be the right characterization, Everett’s own analysis shows recursion-as-self-reference to be requisite for capturing the human capacity for language, self-embedding be damned!
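The inductive character of a production system can be sketched generically. In this illustration of our own, a hypothetical arithmetic rule stands in for grammatical ones: a set is defined by axioms plus the repeated application of rules to what has already been derived.

```python
def closure(axioms, rules, steps):
    """Definition by induction: start from axioms and repeatedly apply
    rules (each mapping an already-derived item to a new one) for a
    bounded number of steps."""
    derived = set(axioms)
    for _ in range(steps):
        new = {rule(x) for x in derived for rule in rules}
        derived |= new
    return derived

# Hypothetical example: the even numbers, from the axiom 0 and the
# single rule n -> n + 2.
evens = closure({0}, [lambda n: n + 2], 3)
```

Whatever the rules happen to be, the outputs of earlier applications feed later ones; that feeding relation is the self-reference in question.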
Finally, there is the fact that, within the Chomsky Hierarchy, the class of languages generated by “regular” grammars is situated below the “context-free” class: the class generated by the sorts of rules exemplified in (1) and (2) above, to which the early generativists had recourse when capturing syntactic self-embedding. This is clear, and it undermines the logic of Everett’s argument in two ways.
First, the Hierarchy is such that the higher classes of language generally contain the lower classes, with each higher class able to capture the phenomena captured by those classes situated lower in the Hierarchy. As such, the kind of formal grammar required to model the properties of a self-embedding language such as English can easily be used to model a non-self-embedding language such as Pirahã. Contra Everett, therefore, it is entirely consistent with the empirical evidence that human beings have a specific capacity for syntactic self-embedding. It is simply that there is no obligation to draw on such a capacity, making it essentially unsurprising that we should eventually come across a language such as Pirahã.
Second, and even more critically, the containment relationships within the Hierarchy are also such that the lower classes cannot capture the range of phenomena captured by the higher classes. It is literally beyond their representational capacity. And so, while the grammatical formalism required for modeling a self-embedding language such as English can straightforwardly model a non-self-embedding language such as Pirahã, the converse does not hold. So, again, the empirical evidence actually favors the more Chomskyan take, with a self-embedding language such as English demonstrating an empirical lower bound on the grammatical theory required to properly model our specific capacity for language. And this empirical fact would still be the case even if English were the only language ever found to exhibit self-embedding.
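The standard textbook illustration of this asymmetry, which we sketch here with formal-language examples rather than natural-language data, is the language of strings aⁿbⁿ. A context-free rule of the form S → a S b recognizes it easily, because the derivation keeps count; a regular device, with its bounded memory, can only approximate the string’s shape.

```python
import re

def anbn(s):
    """Recognize a^n b^n (n >= 0) via the context-free rule
    S -> a S b | epsilon, implemented as a recursive match;
    the recursion depth acts as an unbounded counter."""
    if s == "":
        return True
    return s.startswith("a") and s.endswith("b") and anbn(s[1:-1])

# A regular approximation can only check the *shape* a...b..., not the
# matching counts, so it wrongly accepts strings like "aab".
REGULAR_APPROX = re.compile(r"a*b*$")
```

The context-free recognizer also accepts everything the regular pattern legitimately should, but not vice versa: the higher class contains the lower, never the reverse.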
Recursion-as-self-embedding, then, turns out to be an especially apt case in point against Everett’s argument that it is “meaningless to claim that there can be a capacity for language that need not be used in any actual language.” On the contrary, some empirical properties are such that their numerical distribution across the world’s languages is basically irrelevant. All you need is one language which empirically demonstrates our grammatical know-how to be pitched at a “higher” representational capacity than that required for every other language. Or so it seems to us.
Of course, the rather basic truism that you can only represent what you can represent is hardly a new point to make. Jerry Fodor has been banging on about it since at least 1975.22 Old though it may be, however, it remains a core insight that all theories of language must grapple with, even if the force of this insight is not always recognized to the extent it should be.
Nevertheless, by way of emphasizing the importance of this point, it is worth noting a striking property of the various grammatical theories that have been shown to be minimally necessary for capturing the criterial features of our remarkable capacity for language (e.g., Minimalism, Combinatory Categorial Grammar, and Tree-Adjoining Grammar).
Specifically, while none of these theories is technically a rewriting system, they can still be productively modeled as such in order to get a better sense of their expressive power. So modeled, and despite these being distinct frameworks which disagree in their means for capturing our grammatical capacities, we find them converging on what is termed the “mildly context-sensitive” class of languages.23 Crucially, this is a class that locates our capacity for language higher up the Chomsky Hierarchy than Everett’s “linear non-recursive” formalism. It thus imposes a lower bound upon a viable grammatical framework, one that Everett’s “beads on a string” grammar intrinsically fails to meet.24 Simply put, it is just not possible for such a grammar to adequately model the sort of grammatical knowledge exhibited across the world’s languages.
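A textbook marker of this mildly context-sensitive territory is the language aⁿbⁿcⁿ, whose three-way cross-serial dependency lies beyond the reach of context-free grammars, let alone regular ones. A sketch of a recognizer, again our own illustration rather than anything drawn from the frameworks just named:

```python
def anbncn(s):
    """Recognize a^n b^n c^n, a standard example of a language beyond
    the context-free class but within the mildly context-sensitive
    one: three counts must all agree, which a single pushdown stack
    cannot enforce."""
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n
```

Natural-language analogues of such cross-serial dependencies (famously, in Swiss German) are part of what motivates pitching the human capacity at this level.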
But if that is so, then Everett’s work on Pirahã self-embedding effectively reduces to one of three claims:
1. The Pirahã capacity for language is such that they are literally incapable of speaking English;
2. Human beings, a particular evolutionary species, somehow suddenly transformed their representational capacities while remaining the same species;
3. The human capacity for language is an artefact of some wider property of human cognition that has been differentially drawn on over the course of linguistic history.
Clearly, only (3) is even remotely palatable, and it is unsurprisingly the space that Everett has sought to open up, both in his reply and elsewhere.25 As far as we can see, however, this space is only ever posited rhetorically, with Everett never providing the sort of extensive, technical demonstration that he surely must for his claim to go through (or, at least, not in the works we are aware of): namely, a demonstration that the (self-)embedding which we find in basically every human language is precisely the sort of (self-)embedding that we supposedly find in non-syntactic domains.
Instead, and in notable contrast to the level of detail that we find in Everett’s linguistic analyses, when it comes to this critical juncture, what we find is an unpersuasive reliance on vague analogies and generalities. And that simply is not enough, especially given that, when others have taken the trouble to make an explicit comparison, we generally find syntactic and non-syntactic self-embedding to be quite different phenomena.26 They really just do not seem to be the same thing when looked at closely.
Absent such a demonstrable similarity, therefore, your average Chomskyan, for want of a better term, currently finds himself or herself with the following argument. Every human language except (maybe) Pirahã and (maybe) a few others displays a particular kind of self-embedding. Accordingly, this kind of self-embedding must be part of the human capacity for language. While, in principle, this sort of self-embedding could be calqued from a kind of self-embedding found in some other, non-syntactic domain, no analysis of such domains has ever come close to showing this to be true. And until it does, it is the Chomskyans who seem to us to be in far the better shape.
At the outset of his reply, Everett notes that, while it is all too easy to level claims of misunderstanding, such claims can nevertheless be perfectly valid. We agree with both points, and so stand by our original review. The “anti-recursion” work weaponized by Wolfe in his ill-conceived book remains marked by fundamental misconstruals regarding both what Chomsky has been arguing and why he has been arguing it.27
As such, we can only conclude on a rather blunt note, in the hope of exiting the perennial loop in which the present debate is seemingly stuck. In no sense is Pirahã self-embedding the counterexample that Wolfe and Everett take it to be. It is not a counterexample to the Chomskyan sense of recursion, because recursion-as-self-embedding is not recursion-as-self-reference. And it is not a counterexample to Merge, or even to any wider claims regarding a sui generis capacity for language, because it is fully consistent with such an operation and such a capacity.