To the editors:
In her essay, Iris Berent rightly points out that language should not be identified with speech. We now know that sign languages have all of the essential characteristics of true language, without reference to sound. Berent also suggests that Noam Chomsky’s viewpoint is now a minority one. Yet Chomsky does recognize that language can be signed as well as spoken, and there are aspects of his treatment of language that may still command respect.
In his more recent writing, Chomsky argues that what he rather confusingly calls I-language is not really what we understand as language at all.1 Instead, it is a mode of thought, made up of abstract symbols. These are combinable by means of a recursive operation known as merge, which creates mental structures with any degree of hierarchical complexity. Most if not all of the intricacies of the different languages as actually spoken or signed come from the process of externalization. This is the mapping of internal mental structures onto external devices, allowing us to share our thoughts with others. “It is a familiar fact,” Chomsky writes, “that the complexity and variety of language appears to be localized overwhelmingly—and perhaps completely—in externalization.”2
Chomsky suggests that the internal structures themselves are “little understood.”3 Nonetheless, he regards them as uniquely human, and the basis of what he terms universal grammar. An alternative possibility is that I-language may not be the mysterious, uniquely human structure envisaged by Chomsky, but may derive simply from imagination, the ability to generate scenarios involving the entities of our thoughts—people, places, things, times, ideas.4 Our mental structures do seem to include a property of merge, with endless possible combinations as manifest in episodic memories, plans, storytelling, and invention. In the words of the Israeli linguist Daniel Dor, language might then be described as “the instruction of imagination.”5
The question then is how these internal structures are externalized so they may be shared with others. This requires an intentional system providing sufficient specificity of signals, along with a means of shaping and combining them to map onto the internal structures. In evolutionary terms, the most obvious medium of externalization is the body. We evolved from tree-dwelling primates, with precise and intentional control of the limbs, and especially of the hands. This enables precise grasping, as well as specialized activities such as plucking, grooming, and bringing items of food to the mouth. This flexibility of action can be adapted to provide visible representations of what is in our minds.
Among great apes, bodily movements and gestures are more intentional and language-like than are their vocal calls, which tend to be more fixed, and often instinctive rather than learned. In the early stages of language evolution, gesture could therefore have provided a largely ready-made system for the externalization of mental content. Although attempts to teach apes to talk have been notoriously unsuccessful, the apes themselves have been at least moderately successful in acquiring intentional communication through a form of sign language or by pointing to symbols on a keyboard.6 These capabilities probably became more flexible and complex following the split from ape-like species to the hominins, and especially with the emergence of obligate bipedalism in the genus Homo, dating from around 3 million years ago. Bipedalism freed the hands from locomotion, allowing them to adapt to new activities, such as throwing, tool making, and communicating. Communication of this kind may have begun as a form of pantomime during the early Pleistocene. Indeed, modern sign language still has many pantomimic elements.
Nonhuman primates also exert intentional control over facial movements, including actions associated with eating—biting, chewing, and swallowing—and the visible expression of emotion. It seems likely that facial movements were incorporated into gestural communication. They are also an important component of modern sign languages. Language itself may have evolved from gestures rather than from vocal calls. Gestures also alleviate much of what has been termed the grounding problem, since pantomime provides a fairly direct physical matching between the signal and what it represents.
So why speech? One scenario involves facial gestures gradually assuming prominence, freeing the hands for other activities. Facial gestures are also more efficient, requiring much less effort than movements of the arms, and focus attention on a smaller region. In the case of vocalized communication, many facial movements are contained within the mouth, and are therefore in large part not visible to the recipient. The solution seems to have been to bring vocalization itself under better intentional control, so that the shapes of internal gestures are represented in phonological patterns of sound, an idea captured in the motor theory of speech perception. The retreat of vocal gestures into the mouth is an early example of miniaturization, although speech is normally still accompanied by manual gestures and other facial movements.
Speech itself, then, must have involved new adaptations, including cortical control of vocalization. Many of the mouth gestures themselves may go further back in evolution. Great apes, for example, can communicate to some degree using voiceless gestures like lip smacks, tongue smacks, and teeth chattering. But although the addition of voicing required a biological adjustment, the choice of the actual vocal gestures themselves is largely the product of cultural learning. This is why there are some 7,000 different spoken languages in the world, which serve as much to exclude outsiders as to provide communication within groups and cultures.
Whether we sign or talk, the mapping of signals onto internal structures must be very rapid, usually to the point of automaticity, in order to produce fluent speech or signs. The thought of any particular concept, such as an apple, must immediately produce the word—and vice versa if we are to understand others. That is why talk is cheap. The flow of internal thought itself may be intentional and often laboured, but the mapping must be close to instantaneous—although there are of course some exceptional occasions when we grope for words. The rapid mapping between internal concepts and words, whether spoken or signed, may be the reason for the mistaken view that thought is simply internal talk.
The largely abstract quality of words raises the grounding problem, because speech does not have the iconic or pictorial properties of visible gesture. The problem may be partly solved by incorporating manual gestures in the early acquisition of speech. Infants learn to point before they learn to speak; “pointing,” it has been said, “is the royal road to language.”7 But the young brain is capable of making rapid associations and quickly attaches arbitrary words to external objects. This appears to be true even of some nonhuman animals, with reports of domestic dogs able to respond meaningfully to large numbers of spoken words.8
Although gesture may play a role in the early acquisition of speech, it is speech that becomes embedded as the primary communication system for the vast majority of people. Speech shapes early phonology, making it difficult to learn another spoken language without an accent. It also shapes linguistic structure, not because of the inborn nature of language itself, but because of the process of externalization that governs the way thought is mapped onto words.
Speech is indeed the default channel in hearing individuals, but it is far from fixed—the 7,000 spoken languages of the world differ vastly in phonology, morphology, and grammar, and their commonalities may have to do with human understanding of the world we live in. This common understanding includes such aspects as the serial nature of events, notions of causality and agency, theory of mind, and a general understanding of how the world works. Although evolution has shaped the brain to expedite speech as the preferred mode for communicating our thoughts, sign language functions equally well from a purely linguistic perspective.
In my view, Chomsky was right in suggesting that the generativity of language derives from the generativity of thought. It is the manner in which generative thought is mapped onto output systems that accounts for the vast differences between individual languages, whether spoken, signed, or for that matter, written. We have been shaped through culture and evolution to give precedence to vocal output, but little is lost by using overt bodily movements, as in modern sign languages, and indeed a gestural system may have preceded a vocal one in evolution. Chomsky may have been too parsimonious in his conception of I-language as the vehicle of thought and in supposing that it is unique to humans, but that is another story.9
Michael Corballis
Iris Berent replies:
In their letters, Carol Padden and Michael Corballis raise interesting questions regarding the equipotentiality of the language faculty with respect to its channels—speech and sign—and their roles in language evolution. In this brief commentary, I will not engage with their arguments but only offer some clarifications concerning the target piece.
To set the record straight: my reference to Chomsky’s position as a minority view is strictly a descriptive observation. Much to my chagrin, Chomsky’s account of language is not broadly subscribed to in cognitive science. But scientific merit is not a popularity contest—or certainly not a short-term one. Over the long run, it has been Chomsky’s research program that has founded cognitive science and shaped its research agenda ever since. In my mind, there is no question that, just as Corballis suggests, Chomsky commands the utmost respect.
While Chomsky’s own discussion of I-language has indeed focused mostly on syntax, nothing in his framework precludes the possibility that some aspects of phonology could figure in I-language, including Universal Grammar. The algebraic amodal view of phonology I endorse is entirely in line with Chomsky’s generative tradition.
Whether this hypothesis is correct is, of course, an open empirical question. Considerations of language evolution can inform this discussion. But it is difficult to evaluate evolutionary proposals, such as the one advocated by Corballis, unless linguists can accurately characterize the present state of the language faculty. If we don’t know what language is, how can we tell how it came to be?
Addressing this question requires that we combine insights from linguistic analysis, experimental investigations, and computational work. And as Padden suggests, our approach ought to be broad and inclusive. The various individuals who have responded to my piece—Padden, Corballis, and Mark Aronoff et al.—are known experts in these disciplines, and I am grateful for their comments. I hope this discussion revitalizes research efforts into the design of the language faculty in all its forms.