In 1866, the Linguistic Society of Paris issued a stern injunction: “The Society does not accept any communication concerning either the origin of the language or the creation of a universal language.”1 One can easily imagine why. The late eighteenth and early nineteenth centuries, as Giorgio Graffi observed, marked the blossoming of modern comparative linguistics.2 William Jones, a British judge in India, and Jacob Grimm, the author of a collection of morbid German fairy tales, were among the pioneering linguists studying Indo-European languages. They aimed collectively to discover historical connections among languages and to reconstruct their origins in an Indo-European Ursprache. But their work focused on the external features of individual languages, rather than on the origin of language as a cognitive faculty; and it was conducted, as Sylvain Auroux has emphasized, against a backdrop of evolutionary and phylogenetic thought.3 Linguists told themselves many stories about the evolution of language, and so did evolutionary biologists; but stories, as Richard Lewontin rightly notes, are not hypotheses, a term that should be “reserved for assertions that can be tested.”4
The human language faculty is a species-specific property, with no known group differences and little variation. There are no significant analogues or homologues to the human language faculty in other species.5 The notion of a species-specific biological trait is itself unremarkable. Species-specific traits are essential to the very definition of a species, at least for multicellular animals requiring reproductive isolation, and species specificity is both widespread and expected according to conventional evolutionary theory. Still, an expectation, it is important to stress, is not yet an explanation.
Why only us? Why indeed.
That the language faculty comprises a distinctive human phenotype was not recognized in nineteenth-century historical and philological research, nor in twentieth-century structural linguistics. This changed with the development of modern theories of generative grammar. In the early 1950s, Noam Chomsky, Morris Halle, and Eric Lenneberg began reading both Niko Tinbergen’s The Study of Instinct and Konrad Lorenz’s articles from the previous decade.6 Tinbergen and Lorenz thought in terms of complex innate behavior, which, they assumed, was governed by a species-specific genetic apparatus, with evolution playing a large role in explaining why species differ in their behavior. Sticklebacks do not, after all, react like baby grey geese to maternal geese models. Each species comes equipped with a sophisticated, distinctive, developmental and behavioral repertoire, which is released by environmental triggers.
It was in 1967 that Lenneberg published his important treatise, Biological Foundations of Language. He focused a biological lens on nearly the whole of contemporary linguistics.7 His chapter on the evolution of language is a model of nuanced evolutionary thinking.8 “[T]he development of language in children,” Lenneberg argued, “can best be understood in the context of developmental biology.”9 He was careful to observe that while “[t]he endowment [for language] has a genetic foundation,” it does not follow that “there are ‘genes for language’ or that the environment is of no importance.”10 Lenneberg refused to indulge in storytelling. His conclusion regarding species specificity anticipates our own:
Contemporary species are discontinuous groups (except for those in the process of branching) with discontinuous communication behavior. Therefore, historical continuity need not lead to continuity between contemporary communication systems, many of which (including man’s) constitute unique developments.11
Lenneberg anticipated the difficulties inherent in any evolutionary explanation of human language:
Another recent practice is to give speculative accounts of just how, why, and when human language developed … Most speculations on the nature of the most primitive sounds, on the first discovery of their usefulness, on the reasons for the hypertrophy of the brain, or the consequences of a narrow pelvis are in vain. We can no longer reconstruct what the selection pressures were or in what order they came, because we know too little that is securely established by hard evidence about the ecological and social conditions of fossil man. Moreover, we do not even know what the targets of actual selection were. This is particularly troublesome because every genetic alteration brings about several changes at once, some of which must be quite incidental to the selective process.12
Lenneberg’s scruples were catholic. He observed that current generative grammars were not yet sophisticated enough to serve as the foundation for any evolutionary explanation of human language: “Linguists, particularly those developing generative grammar, aim at a formal description of the machine’s behavior; they search mathematics for a calculus to describe it adequately … A totally adequate calculus has not yet been discovered.”13
For these reasons, no sensible evolutionary analysis of the origin of language could be carried out until the 1990s. By then, the minimalist program had provided evidence that far-reaching and often surprising universal linguistic principles could be derived from very simple assumptions. Every human language is a finite computational system generating an infinite array of hierarchically structured expressions. This is the basic property (BP) of language. Every structured expression has a definite semantic interpretation and can be expressed by some sensory modality—speech when possible, gesture when not. The BP is best explained, we argued, as the expression of an underlying computational system, an example of those innate repertoires to which Tinbergen, Lorenz, and Lenneberg called attention. The advent of sophisticated machine learning techniques has only served to justify their point of view. And ours. Lacking such repertoires, machine learning requires an enormous number of training examples.14 Ian Goodfellow, Yoshua Bengio, and Aaron Courville thus remark that
As of 2016, a rough rule of thumb is that a supervised deep learning algorithm will generally achieve acceptable performance with around 5,000 labeled examples per category and will match or exceed human performance when trained with a dataset containing at least 10 million labeled examples.15
Whatever else children may be doing when acquiring their native language, they are not consulting ten million labeled examples.
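The basic property stated above, a finite system yielding an unbounded array of hierarchically structured expressions, can be made concrete with a small sketch. It assumes the common textbook rendering of Merge as unordered binary set formation, Merge(X, Y) = {X, Y}. The names `merge` and `depth`, and the toy lexical items, are illustrative choices and not part of the argument in the text.

```python
from typing import FrozenSet, Union

# A syntactic object is either a lexical item (a string) or a set
# built by an earlier application of merge. (Assumed representation.)
SyntacticObject = Union[str, FrozenSet["SyntacticObject"]]


def merge(x: SyntacticObject, y: SyntacticObject) -> SyntacticObject:
    """Combine two objects into one unordered set {x, y}."""
    return frozenset({x, y})


def depth(obj: SyntacticObject) -> int:
    """Count the levels of embedding in a syntactic object."""
    if isinstance(obj, str):
        return 0
    return 1 + max(depth(part) for part in obj)


# One operation, reapplied to its own output, yields ever deeper
# hierarchical structure from a finite stock of items.
the_book = merge("the", "book")
verb_phrase = merge("read", the_book)
clause = merge("will", verb_phrase)

print(depth(the_book))  # 1
print(depth(clause))    # 3
```

Nothing in the sketch encodes linear order: `merge("the", "book")` and `merge("book", "the")` are the same object. Order enters only when a structure is externalized.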
Discoveries in genomics and cognitive biology have served to refine and buttress our conclusions. Much of the speculation about the timing and biology of language evolution has focused on the FOXP2 gene and its related genomic network. In 2002, Wolfgang Enard et al. found that the human variant of FOXP2 was under positive selection, and so provided strong evidence for its evolution by natural selection.16 In Why Only Us, we argued against the thesis that FOXP2 is the gene for language. FOXP2 functions as part of the system for externalizing language to the sensory-motor interface, and many aspects of externalization are not specific to human beings. Citing comparative avian work by Andreas Pfenning et al., we demonstrated that many of the systems for vocal learning and production must have been in place before the emergence of language.17 This follows the typical evolutionary pattern. By the same token, Elizabeth Atkinson et al. carefully reexamined FOXP2 together with the intronic regions that might have been involved in a selective sweep.18 They found that human-specific DNA and amino acid variations matched those of Neanderthals or Denisovans but not other non-human primates. They found no evidence for a recent selective sweep, as suggested by Enard et al.19 Nor did they find evidence of an ancient selective sweep at any proposed regulatory control regions associated with FOXP2. The regions themselves, they noted, did not “appear to be related to language.”20 Atkinson et al. accept “the extensive functional evidence supporting FOXP2’s important role in the neurological processes related to language production.”21
Language production is a matter of externalization.
How far back does language go? There is no evidence of significant symbolic activity before the appearance of anatomically modern humans 200 thousand years ago (kya).22 The South African Blombos cave site contains abstract patterns drawn in ochre crayon on silcrete. These have been dated to approximately 80 kya.23 There is no doubt that these patterns, which represent the earliest known drawings, were executed by anatomically modern humans. In 2018, Dirk Hoffmann et al. claimed to have found cave art in Spain dating to roughly 65 kya and thus predating the earliest known arrival of modern humans in Europe.24 Those dates have since been corrected to approximately 47 kya, the time at which human beings appeared in Europe. According to Ludovic Slimak et al., this is a date “much more consistent with the archaeological background in hand.”25
Recent genomic work has refined our claims about symbolic activity. The emergence of language occurred earlier than we thought, and certainly earlier than we suggested. The relevant research is drawn from the detailed genomic sequencing of human subpopulations, and establishes that between 200 kya and 125 kya, the San people in Southern Africa became genomically separated from other human populations.26 The San are alive today; their ancestors presumably shared the human language faculty. The BP must have emerged sometime between 300 and 200 kya. The final events leading to the BP must have been simple. This is a conclusion in agreement with the minimalist program. Riny Huybregts has shown that San languages have a unique feature, a rich repertoire of clicks.27 He concludes that the language faculty emerged with Homo sapiens, or shortly thereafter, but externalization in one form or another must have been a later development, and quite possibly involved little or no evolutionary change.
Not everyone has welcomed these conclusions. Yet in his review of Why Only Us, Cedric Boeckx, one of our critics, sometimes finds himself on our side when he very much wishes to be elsewhere. “According to [Berwick and Chomsky],” Boeckx writes, “human language can be reduced to a single unique trait.”28 He is referring to the BP, what Marc Hauser, Chomsky, and W. Tecumseh Fitch termed recursion.29 Boeckx is mistaken. We described the BP as a basic property, one among others. Quite in addition, Boeckx scruples at our “failure to engage” with the research community. We are disconnected, he writes, from “fields whose primary focus is … brain rhythms or protein–protein interactions,” and from “the many serious attempts to bridge the gap between mind and brain, and, in particular, [from] ongoing work with animal models.” Birdsong has also escaped our notice, along with “ongoing work in neurogenetics,” and a “vast amount of data … gathered … in genomic[s].”30 Boeckx is especially concerned lest we miss the importance of the interactome—the complete set of molecular interactions within a given cell. His concerns are misplaced. Why Only Us discussed the idea of an interactome extensively.31 At the first Evolution of Language conference in Edinburgh in 1996, a meeting that Boeckx hails as the nexus of the research community, Robert Berwick presented a detailed, multidisciplinary, neurobiological hypothesis for the appearances of both the BP and Merge; it was a hypothesis linked to details about avian neurobiology and FOXP2.
Recent work continues to point to the role played by FOXP2 in the sequential ordering of motor gestures, but without identifying its adaptive role in the origin of the BP. Sequential ordering competence for auditory processing appears widespread throughout vertebrates, accounting for strong evolutionary conservation between regions of the frontal cortex in macaques and humans. This is the suggestion made by Christopher Petkov’s research group, among several others.32 From this standpoint, the involvement of FOXP2 in axon guidance is unsurprising.
For all that, the chasm between phenotype, algorithm, and neural implementation remains just that: a chasm. We do not yet understand the space of algorithms that might inform, or guide, the BP. This is not to render language a mystery. Physicists have not yet completely derived the properties of quark confinement from the Standard Model either. “A description couched in neural terms is needed,” Boeckx observes, and this description “must then be related to genes.”33 This is unassailable. Such a description is needed. While it is true that there is no direct link between the genome and Merge, it is also true that we have no direct link between the genome and any complex phenotype, such as the link between genes and walking. This remains one of the great scientific challenges, one more thing that we cannot yet puzzle out.34
Birds sing and humans speak; it is irresistibly tempting to see a connection. “This is an area,” Boeckx remarks, “where work linking genetics and neuroscience has been progressing rapidly.”35 It is an area, he goes on to argue, to which we have paid little attention. On the contrary. We have been nothing if not attentive:
Thanks in part to comparative and neurophysiological and genomic studies of songbirds, the biological basis for vocal learning is well on the way to being understood as an evolutionarily convergent system: identically but independently evolved in birds and us. It may well be that vocal learning—the ability to learn distinctive, ordered sounds—can be bootstrapped from perhaps 100–200 genes (Pfenning et al. 2014).36
Why Only Us cites a long list of recent research results, including well-regarded surveys and analyses written by Berwick and various birdsong experts, who concluded that birdsong is similar to human speech. Birdsong and speech follow linear order rather than hierarchical structure and, for this reason, they are remote from the BP.37 There is a common, conserved genetic toolkit for building vocal learners, one aligned with neurological wiring. To have understood this is surely progress. With the externalization apparatus for language in place, the rapid emergence of language itself is far easier to explain. Once this part of the story is complete, we will understand in some detail how the printer for human language works and how it evolved.
On this, Boeckx agrees with us.
In their survey article, Hayley Mountford and Dianne Newbury conclude that
The theory that the presence of [the] ‘humanised’ FOXP2 gene in Neanderthals drove language ability is naive and overly simplistic. FOXP2 clearly plays an important role in speech evolution and production … We may be able to build a far clearer picture of how language evolved once we increase our understanding of the neuromolecular pathways involved [in] language development in modern humans.38
Rob DeSalle and Ian Tattersall concur:
Both the neural capacity for language, and the anatomical apparatus needed to express it, result from some profound changes in major developmental pathways in the immediate ancestor of Homo sapiens that are unlikely to be simply related to any of the gene changes yet fingered.39
In his own contribution, Boeckx, along with Pedro Tiago Martins and Maties Marí, observes that SRGAP2C is involved in axon guidance and is found only in Homo sapiens, Neanderthals, and Denisovans.40
On this, we agree with Boeckx.
Gene duplication of this sort, according to Boeckx and his colleagues, “may have contributed to the establishment of a critical aspect of the vocal learning circuit.”41 The authors take this to be an argument for the gradual evolution of human language, but their conclusions, if true, confirm our judgment that the antecedents for language were in place 300 million years ago.
It remains true that we do not have a soup-to-nuts, or gene-to-neural-circuit-to-phenotype account for any trait of interest, let alone the BP.42 Even in the case of FOXP2, we do not know the mapping from genomic to phenotypic expression, except in the most general terms. Some scientists have looked at non-coding mutations in order to account for the differences between human and nonhuman primates. This makes sense. Known protein differences between humans and nonhuman primates are small. A few years ago, J. Lomax Boyd and his colleagues engineered a mouse version of some of the enhancers that alter the development of the human neocortex.43 Lucía Franchini and Katherine Pollard noted that while Boyd et al. succeeded in their basic goal of demonstrating that humanized mice did, indeed, display increased neocortical size when compared to chimpanzeed mice, “[w]hat has not yet been done is to show the molecular pathways or developmental processes through which the genetic differences are expressed.”44 Thus, “linking human-specific genetic changes to unique cognitive traits” remains “a long and twisted road.”45
We would be the first to welcome progress on this front. Absent a more complete, concrete understanding of the space of genomic, developmental, and neurological possibilities, it is difficult to go beyond the phrase that we have so often adopted: the BP emerged by means of a slight rewiring of the brain.46 This phrase, although promissory in part, is not entirely so. A slight rewiring can sometimes result in a large transition. This is a point that Boeckx dismisses, although there is considerable evidence for major transitions in evolution, a point long stressed by evolutionary biologists such as John Thompson, John Maynard Smith, and Eörs Szathmáry.
The ability to process sequential information is shared across many vertebrate species—perhaps all. A slight alteration in the wiring of a simple sequential processor is sufficient to endow it with a push-down stack. This makes for a significant improvement in its computational power. It is a point of some significance: a push-down stack is needed to process hierarchical structures. In our example, which is entirely notional, we assume that sequential processing is realized via a shift register, where information flows in from the left and is stored in the individual registers that hold data. These are depicted as the four square boxes labeled FF0 through FF3.
Figure 1.
As data are input on the left, data that were in FF0 are transferred to the immediately succeeding register, FF1, and so forth for further registers.47 Suppose that we now rewire the same system by adding a line running along the top, along with a sequence of four NAND (not–and) gates. These are drawn as a unit of three bell-shaped components. The NAND gates serve to move the data to the left as well as to the right, popping the data off as well as pushing them down. We add wiring above this new diagram that symmetrically matches the wiring below the shift register diagram.
Figure 2.
The resulting circuit operates as a push-down stack. The moral of this example is evident: it does not take much to rewire a sequential processor. Detailed knowledge of axon guidance might prove useful in the future, but for the moment, we do not know whether or how shift registers are implemented in the brain, nor do we know much about the phenotypic changes that produce such changes in neuronal structure.
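The rewiring argument can also be sketched in software. What follows is a notional model, not a gate-level rendering of the figures. It assumes a four-cell register matching FF0 through FF3. The class names `ShiftRegister` and `StackRegister` and the bracket-checking function `is_nested` are illustrative inventions. The only difference between the two classes is one added path that lets data move back to the left.

```python
from typing import List, Optional


class ShiftRegister:
    """Cells FF0..FF3: data enter at FF0 and move one cell to the right."""

    def __init__(self, size: int = 4) -> None:
        self.cells: List[Optional[str]] = [None] * size

    def shift_in(self, item: str) -> Optional[str]:
        """Insert item at FF0; return whatever falls off the far end."""
        dropped = self.cells[-1]
        self.cells = [item] + self.cells[:-1]
        return dropped


class StackRegister(ShiftRegister):
    """The same cells, plus one extra path moving data back to the left."""

    def push(self, item: str) -> None:
        if self.cells[-1] is not None:
            raise OverflowError("register is full")
        self.shift_in(item)

    def pop(self) -> str:
        top = self.cells[0]
        if top is None:
            raise IndexError("register is empty")
        # The added wiring: every cell hands its content to its left neighbor.
        self.cells = self.cells[1:] + [None]
        return top


def is_nested(text: str, size: int = 4) -> bool:
    """Check bracket nesting, a minimal stand-in for hierarchical structure."""
    closers = {")": "(", "]": "["}
    stack = StackRegister(size)
    for ch in text:
        if ch in "([":
            stack.push(ch)
        elif ch in closers:
            try:
                if stack.pop() != closers[ch]:
                    return False
            except IndexError:
                return False
    return stack.cells[0] is None


print(is_nested("([])"))  # True
print(is_nested("([)]"))  # False
```

With only four cells, the model handles nesting no deeper than four levels; a deeper input raises `OverflowError`. The sketch shows the kind of computation the extra path makes available and makes no claim about how such a device might be realized neurally.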
There is no evidence that great apes, however sophisticated, have any of the crucial distinguishing features of language and ample evidence that they do not.48 Claims made in favor of their semantic powers, we might observe, are wrong. Recent research reveals that the semantic properties of even the simplest words are radically different from anything in animal symbolic systems.49 As for pragmatics, there are of course numerous similarities between us and other primates, for example, regarding turn taking and communication. Dog owners are quite familiar with their pet’s ability to attract their attention by some repetitive behavior; one does not have to turn to recent research for such examples. These examples have no bearing on the crucial distinguishing properties of human language.
One more word before we go. The basic computational operation of Merge, Boeckx argues, “is more complex than it sounds,” because it can be analyzed into as-yet-unknown building blocks (cellular structures of some kind, for example).50 He does not spell out what these more primitive elements might be or why they are of evolutionary importance. Evolution itself fares no better. “[Charles] Darwin’s explanatory logic,” Boeckx asserts, “conflicts with assertions that some species are unique.”51 It does nothing of the sort. The thesis that the BP is unique to humans, along with other core properties of language, is entirely consistent with Darwin’s theory of evolution, just as the waggle dance is unique to the genus Apis (honeybees) and not found in other insects. The conventional definition of animal species since Theodosius Dobzhansky and Hermann Muller requires the uniqueness of species-differential traits that work to ensure reproductive isolation.52 Boeckx has succumbed to the view that evolutionary change must necessarily be gradual: this is the infinitesimal continuity hypothesis favored by Darwin and Sir Ronald Fisher. For Fisher, this led to a micromutational view. But this is only an empirical hypothesis.53 There is no necessary condition that evolutionary change be continuous and infinitesimal, except in the sense that viability is preserved at each step. Evolution need not always proceed at a snail’s pace. Allen Orr and other evolutionary biologists have shown that the first step in the adaptive walk of a single gene changing over time might actually have the largest phenotypic effect of all.54 Jordi Bascompte’s review of Thompson’s Relentless Evolution, which we did not quote, bears quoting now. Bascompte opens by noting,
The book’s contents will strike many readers as novel. That is because in the past few years we have largely changed our views about the tempo and nature of adaptive evolution. For example, whereas a couple of decades ago almost everyone would claim that ecological and evolutionary timescales were uncoupled, we now know that evolution can proceed very rapidly.55
He then backs this viewpoint with a story: Thompson’s effort to describe Peter and Rosemary Grant’s pioneering work on Darwin’s Galapagos finches.
Thompson notes that when he contacted the Grants for permission to use a modified version of a graph they had published a few years earlier (which showed the evolution of beak size in one of Darwin’s finches over 30 years), they told him about an updated version that showed another shift in the direction of change. Thompson felt he was racing to produce a timely account of our current understanding of adaptive evolution. Is there still anyone who thinks that evolution always proceeds slowly?56
Why only us? Why only us? Why only us? One question has three parts. They are precisely the appropriate ones to ask regarding the evolutionary origin of language.
We were not, of course, the first to ask them. We echo in modern terms the Cartesian philosophers Antoine Arnauld and Claude Lancelot, seventeenth-century authors of the Port-Royal Grammar, for whom language with its infinite combinatorial capacity wrought from a finite inventory of sounds was uniquely human and the very foundation of thought. It is subtle enough to express all that we can conceive, down to the innermost and “diverse movements of our souls.”
It remains for us to consider what is, in fact, one of the great spiritual advantages of human beings compared to other animals, and which is one of the most significant proofs of reason: that is, the method by which we are able to express our thoughts, the marvelous invention by which using twenty five or thirty sounds we can create the infinite variety of words, which having nothing themselves in common with what is passing in our minds nonetheless permit us to express all our secrets, and which allow us to understand what is not present to consciousness, in effect, everything that we can conceive and the most diverse movements of our soul.57