The Galilean Challenge

Chomsky, Noam

How can a system such as human language arise in the mind/brain, or for that matter in the organic world, in which one seems not to find anything like the basic properties of human language?
—The Minimalist Program¹

In the early days of the modern scientific revolution, the great founders of modern science expressed their awe and wonder at the facts of language. “But surpassing all stupendous inventions,” Galileo remarked,

what sublimity of mind was his who dreamed of finding means to communicate his deepest thoughts to any other person, though distant by mighty intervals of place and time! Of talking with those who are in India; of speaking to those who are not yet born and will not be born for a thousand or ten thousand years; and with what facility, by the different arrangements of twenty characters upon a page!²

In their influential Grammaire générale, Antoine Arnauld and Claude Lancelot expressed the same sense of awe and wonder:

It remains for us to consider what is, in fact, one of the great spiritual advantages of human beings compared to other animals, and which is one of the most significant proofs of reason: that is, the method by which we are able to express our thoughts, the marvelous invention by which using twenty five or thirty sounds we can create the infinite variety of words, which having nothing themselves in common with what is passing in our minds nonetheless permit us to express all our secrets, and which allow us to understand what is not present to consciousness, in effect, everything that we can conceive and the most diverse movements of our soul.³

René Descartes was one of the most important influences on the Grammaire générale. In his Fourth Meditation, he argued that a new creative principle was required to explain the human capacity for the unbounded and appropriate use of language:

This is because the will simply consists in our ability to do or not do something (that is, to affirm or deny, to pursue or avoid); or rather, it consists simply in the fact that when the intellect puts something forward for affirmation or denial or for pursuit or avoidance, our inclinations are such that we do not feel we are determined by any external force.⁴

In the Discourse on Method, Part V, Descartes argued that the creative use of language marked the distinction between human beings and other animals, and between human beings and machines. A machine may be impelled to act in a certain way, but it cannot be inclined; with human beings, it is often the reverse.⁵ Explaining why this is so is the Galilean challenge.

In the modern era, the challenge, although occasionally expressed, was also widely ignored. Wilhelm von Humboldt is an especially suggestive case to the contrary:

The processes of language must provide for the possibility of producing an undefinable set of phenomena, defined by the conditions imposed upon it by thought. … It must, therefore, make infinite use of finite means… [emphasis added]⁶

The capacity for language is species-specific, something shared by humans and unique to them. It is the most striking feature of this curious organism, and a foundation for its remarkable achievements. This is, in its full generality, the Galilean challenge. The challenge is very real, and should, I think, be recognized as one of the deepest questions in the rich two-thousand-five-hundred-year history of linguistic thought.

Until the twentieth century, there was never much to say about the Galilean challenge beyond a few phrases. There is a good reason why inquiry languished. Intellectual tools were not available for formulating the problem in a way clear enough to be seriously addressed. That changed thanks to the work of Alonzo Church, Kurt Gödel, Emil Post, and Alan Turing, who established the general theory of computability. Their work demonstrated how a finite object like the brain could generate an infinite variety of expressions. It became possible, for the first time, to address part of the Galilean challenge directly, even though the earlier history remained unknown.

With these intellectual tools available, it becomes possible to formulate what we may call the Basic Property of human language. The language faculty of the human brain provides the means to construct a digitally infinite array of structured expressions; each is semantically interpreted as expressing a thought, and each can be externalized by some sensory modality, such as speech. The infinite set of semantically interpreted objects constitutes what has sometimes been called a language of thought.⁷ It is a system that is then capable of linguistic expression, and that enters into reflection, inference, planning, and other mental processes. When externalized, it can be used for social interactions, although this is only a first approximation to what may properly be called thought. There is much more to say about this challenging topic.

There is good reason to suppose that the language faculty is common to the human species. There are no known group differences in language capacity, and individual variation is at the margins. Although speech is the usual form of sensorimotor externalization, we now know that signing, or even touching, does just as well. These are discoveries that require a slight reformulation of the Galilean challenge. A more fundamental qualification has to do with the way the challenge is formulated; traditional formulations were in terms of the production of expressions. In these terms, the challenge overlooked some basic issues. Like perception, production accesses an internal language, but cannot be identified with it. We must distinguish the internalized system of knowledge from the processes that access it. The theory of computability enables us to establish the distinction, which is an important one, familiar in other domains.

In studying human arithmetical competence, we distinguish the internal system of knowledge from the actions that access it. When multiplying numbers in our heads, we depend on many factors beyond our intrinsic knowledge of arithmetic. Constraints of memory are the obvious example. The same is true of language. Production and perception access the internal language, but involve other factors as well, including short-term memory. When the Galilean challenge was addressed in the 1950s, these were matters that began to be studied with some care.

There has been considerable progress in understanding the nature of the internal language, but its free creative use remains a mystery. This should come as no surprise. In a recent review of far simpler cases of voluntary action, neuroscientists Emilio Bizzi and Robert Ajemian remark, in the case of something so simple as raising one’s arm, that

the detail of this complicated process, which critically involves coordinate and variable transformations from spatial movement goals to muscle activations, needs to be elaborated further. Phrased more fancifully, we have some idea as to the intricate design of the puppet and the puppet strings, but we lack insight into the mind of the puppeteer.⁸

The normal creative use of language is an even more dramatic example. This is the unique human capacity that so impressed the founders of modern science.

The fundamental task of inquiry into language is to determine the nature of the Basic Property, and so the genetic endowment that underlies the faculty of language. To the extent that its properties are understood, we can investigate particular internal languages, each an instantiation of the Basic Property, much as each individual visual system is an instantiation of the human faculty of vision. The Biolinguistic Program involves an investigation into how internal languages are acquired and used, and how they function in the human brain. It is also committed to studying the evolution of the language faculty and its basis in human genetics. Universal Grammar is the theory of the genetically based language faculty; Generative Grammar, the theory of each individual language.

Languages appear to be extremely complex, varying radically from one another. A standard belief among professional linguists sixty years ago was that languages can vary in arbitrary ways; each must be studied without preconceptions. Biologists held similar views about organisms. As recently as 1984, Gunther Stent, in a review of J. M. W. Slack’s From Egg to Embryo, remarked that, “…we cannot expect to discover a general theory of development; rather we are faced with a near infinitude of particulars, which have to be sorted out case by case.”⁹

When understanding is thin, we expect to see extreme variety and complexity.

A great deal has been learned since then. It is now recognized that the variety of life is very limited, so much so that the hypothesis of a universal genome has been seriously advanced.¹⁰ My own feeling is that linguistics has undergone a similar development.

The Basic Property takes language to be a computational system, which we therefore expect to respect general conditions on computational efficiency. A computational system consists of a set of atomic elements, and rules to construct more complex structures from them. In the generation of the language of thought, the atomic elements are word-like, though not exactly words; for each language, the set of these elements is its lexicon.¹¹ Lexical items are commonly regarded as cultural products, varying widely with experience, and linked to extra-mental entities. The latter assumption is expressed in the titles of standard works, such as W. V. O. Quine’s influential study, Word and Object.¹² Closer examination of even the simplest words reveals a very different picture, one that poses many mysteries.

In the analysis of the Basic Property, we are bound to seek the simplest computational procedure consistent with the data of language. Simplicity is implicit in the basic goals of scientific inquiry. It has long been recognized that only simple theories can attain a rich explanatory depth. “Nature never doth that by many things, which may be done by a few,” Galileo remarked, and this maxim has guided the sciences since their modern origins.¹³ It is the task of the scientist to demonstrate this, from the motion of the planets, to an eagle’s flight, to the inner workings of a cell, to the growth of language in the mind of a child. Linguistics seeks the simplest theory for an additional reason: it must face the problem of evolvability. Not a great deal is known about the evolution of modern humans. The few facts that are well established, and others that have recently been coming to light, are rather suggestive. They conform to the conclusion that the language faculty is very simple; it may, perhaps, even be computationally optimal, precisely what is suggested on methodological grounds.

One fact appears to be well established. The faculty of language is a true species property, invariant among human groups, and unique to humans in its essential properties. It follows that there has been little or no evolution of the faculty since human groups separated from one another. Recent genomic studies place this date not very long after the appearance of anatomically modern humans about two hundred thousand years ago.¹⁴ It was at this time that the San group in Africa separated from other groups. There is little evidence of anything like human language, or symbolic behavior altogether, before the emergence of modern humans. That leads us to expect that the faculty of language emerged along with modern humans or not long after, a very brief moment in evolutionary time. It follows, then, that the Basic Property should indeed be very simple. The conclusion conforms to what has been discovered in recent years about the nature of language, a welcome convergence.

Discoveries about the early separation of the San people are highly suggestive. Although Khoisan speakers appear to possess the general human language capacity, their languages are all and only those with phonetic clicks, with corresponding adaptations in the vocal tract. The most likely explanation for these facts, developed in detail by the Dutch linguist Riny Huijbregts, is that their possession of an internal language preceded their separation from other groups; this in turn preceded the externalization of their language.¹⁵ Other groups proceeded in somewhat different ways. Externalization seems to be associated with the first signs of symbolic behavior in the archaeological record. We may be reaching a stage of understanding where the account of language evolution can be fleshed out in ways that were unimaginable until recently.

The genetic endowment for the computational system of language appears to be quite simple. It is a major challenge for research to show how the facts of language are explained in terms of a simple formulation of the Basic Property. Whatever the formulation, it must appeal to the interaction of the Basic Property with specific experiences and language-independent principles. These will, no doubt, include principles of computational efficiency. In this regard, the child’s own experiences have only limited relevance, whether in acquiring the meaning of the simplest words, or the syntactic structures and the semantic properties of the language of thought.

Universal properties of the language faculty came to light as soon as serious efforts were undertaken to construct generative grammars. These included simple properties that had never before been noticed, and that remain quite puzzling. One such property is structure-dependence. The rules that yield the language of thought appeal only to structural properties, ignoring properties of the externalized signal, even such simple properties as linear order. Take, say, the sentence “The boy and the girl are here.” With marginal exceptions, no one is tempted to say “is here,” even though the closest local relation is “girl + copula.” The bigram frequency, which measures how often one word immediately follows another, is far higher for phrases of the form “girl is” than “girl are.” Bigram frequency is a common measure used in computational cognitive science and Big Data analysis. Without instruction, we rely on structure, not linearity, taking the phrase and not the local noun to determine agreement. Or take the sentence “He saw the man with the telescope,” which is ambiguous, depending on what we take to be the phrases, although the pronunciation and linear order do not change under either interpretation.
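The contrast between the two strategies for agreement can be put in a minimal sketch. The corpus below is invented purely for illustration and is not real frequency data; the function names and the crude model of a subject as a list of conjuncts are likewise assumptions made here, not a proposal about how agreement is actually computed.

```python
from collections import Counter

# Invented toy corpus, for illustration only (not real frequency data).
corpus = [
    "the girl is here",
    "the girl is tall",
    "the boy is here",
    "the boy and the girl are here",
]

# Count adjacent word pairs (bigrams) across the corpus.
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    bigrams.update(zip(tokens, tokens[1:]))


def copula_by_bigram(previous_word):
    """Linear strategy: pick the copula seen most often right after the word."""
    return max(("is", "are"), key=lambda c: bigrams[(previous_word, c)])


def copula_by_structure(subject_conjuncts):
    """Structural strategy: agreement looks at the whole subject phrase.

    Hypothetical simplification: the subject is a list of singular
    conjuncts, and a coordination of two or more counts as plural.
    """
    return "are" if len(subject_conjuncts) > 1 else "is"


print(copula_by_bigram("girl"))
print(copula_by_structure(["the boy", "the girl"]))
```

In this toy setting the linear strategy picks the copula favored by the adjacent noun, while the structural strategy consults the coordinated phrase as a whole, which is the choice speakers actually make.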

To take a subtler example, consider the ambiguous sentence “Birds that fly instinctively swim.” The adverb “instinctively” can be associated with the preceding verb (“fly instinctively”), or the following one (“instinctively swim”). Suppose now that we extract the adverb from the sentence, forming “Instinctively, birds that fly swim.” The ambiguity is now resolved. The adverb is interpreted only with the linearly more remote but structurally closer verb “swim,” not the linearly closer but structurally more remote verb “fly.” The only possible interpretation (birds swim) is unnatural. That doesn’t matter. The rules apply rigidly, independent of meaning and fact, ignoring the simple computation of linear distance, and keeping to the far more complex computation of structural distance.
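The two notions of distance can be made concrete in a small sketch. The bracketing used here is a deliberately simplified assumption, and the nested tuples are ordered only so that linear positions can be read off them; structural distance is approximated, again as an assumption, by depth of embedding.

```python
# Simplified bracketing (an assumption for illustration):
# [instinctively [[birds [that fly]] swim]]
TREE = ("instinctively", (("birds", ("that", "fly")), "swim"))


def leaves(node):
    """Return the words of the tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [word for child in node for word in leaves(child)]


def depth_of(node, word, depth=0):
    """Return how many brackets enclose `word`, or None if it is absent."""
    if isinstance(node, str):
        return depth if node == word else None
    for child in node:
        found = depth_of(child, word, depth + 1)
        if found is not None:
            return found
    return None


words = leaves(TREE)
adverb_position = words.index("instinctively")

# Linear distance: number of word positions between the adverb and each verb.
linear = {v: abs(words.index(v) - adverb_position) for v in ("fly", "swim")}

# Structural distance, approximated here by depth of embedding.
structural = {v: depth_of(TREE, v) for v in ("fly", "swim")}

print("closest by linear order:", min(linear, key=linear.get))
print("closest by structure:", min(structural, key=structural.get))
```

Under these assumptions, linear order selects “fly” and structure selects “swim,” matching the interpretation that speakers in fact assign.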

The property of structure-dependence holds for all constructions in all languages, and it is, indeed, puzzling.

Furthermore, it is known without relevant evidence, as is clear in cases like the one I just gave, and innumerable others. Experiments show that quite without instruction children understand structure-dependence by about the age of three.¹⁶ We can be quite confident that structure-dependence follows from principles of universal grammar deeply rooted in the human language faculty.

Structure-dependence is one of the few non-trivial properties of language that usage-based approaches to language have sought to accommodate. The attempts have been reviewed in detail elsewhere.¹⁷ All fail. Totally. Few of them even ask the right question. Why does this property hold for all languages, and all constructions? Other cases fare no better.

Other sources support the conclusion that structure-dependence is a true linguistic universal, deeply rooted in language design. Research conducted in Milan a decade ago, initiated by Andrea Moro, showed that invented languages keeping to the principle of structure-dependence elicit normal activation in the language areas of the brain. Simpler systems using linear order in violation of these principles yield diffuse activation, implying that experimental subjects are treating them as a puzzle, not a language.¹⁸ Neil Smith and Ianthi-Maria Tsimpli had found similar results in their investigation of a cognitively deficient but linguistically gifted subject.¹⁹ They also made the interesting observation that normal subjects can solve the problem if it is presented to them as a puzzle, but not if it is presented as a language. If presented as a language, the language faculty, although presumably activated, would be unable to make sense of the data.

Structure-dependence must be an innate property of the language faculty.

Why should this be so? There is only one known answer. It is the answer we seek for general reasons. The computational operations of language are the simplest possible. This is the outcome that we hope to reach on methodological grounds, and that is to be expected in the light of evidence about the evolution of language.

To see why this is the case, consider the simplest recursive operation, embedded in one or another way in all others. This operation takes two objects already constructed, say X and Y, and forms a new object Z, without modifying either X or Y, or adding any further structure to them. Z can be taken to be just the set {X, Y}. In current work, the operation is called Merge. Since Merge imposes no order, the objects constructed, however complex, will be hierarchically structured, but unordered; operations on them will necessarily keep to structural distance, ignoring linear distance. The linguistic operations yielding the language of thought must be structure-dependent, as indeed is the case. An appeal to simplicity appears to answer the question why all human languages must exhibit structure-dependence.
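The operation just described is simple enough to sketch directly. What follows is only an illustration under stated assumptions: lexical items are stood in for by plain strings, the function names `merge` and `depth` are chosen here for convenience, and the toy derivations are not a claim about any particular analysis.

```python
def merge(x, y):
    """Form the new object Z = {X, Y}, leaving X and Y unmodified.

    A frozenset is used because it is unordered and immutable, so the
    result carries hierarchy but no linear order.
    """
    return frozenset({x, y})


def depth(obj):
    """Number of Merge layers above the most deeply embedded atom."""
    if isinstance(obj, str):
        return 0
    return 1 + max(depth(part) for part in obj)


# Toy derivation (the bracketing is an illustrative assumption).
dp = merge("the", "girl")
vp = merge("saw", dp)

# No order is imposed: merging in either direction yields the same object.
print(merge("the", "girl") == merge("girl", "the"))

# The operation applies to its own output, so a finite lexicon supports
# structures of unbounded depth.
obj = "swim"
for _ in range(5):
    obj = merge("think", obj)
print(depth(obj))
```

Because the objects are sets, the only relations available to later operations are containment relations, which is one way to see why such operations keep to structural rather than linear distance.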

The externalization of language maps internal structures into some sensorimotor modality, usually speech. The sensorimotor system requires linear order; we cannot speak in parallel. But the language of thought keeps to structural relations. Externalization seems to be a peripheral aspect of language. It does not enter into the core function of providing a language of thought. This is at odds with the traditional formulation of the Galilean challenge itself.

Perception yields further evidence in support of this conclusion. The auditory systems of apes are quite similar to those of humans, even attuned to the phonetic features that are used in language. But the shared auditory-perceptual systems leave apes without anything remotely like the human faculty of language. Apes can, of course, gesture, but even with arduous training cannot use their gestural systems with even the most elementary properties of language, though as is now known, sign languages are developed spontaneously by humans even without any linguistic input.²⁰ Similarly, recent research with dogs has found that they are attuned to the phonological and intonational features of human language.²¹ They may even have similar hemispheric specialization, but, of course, that provides them with no step at all towards the acquisition of language. Many such results support the conclusion that our internal language is independent of externalization, and that it evolved quite independently. It is language design that provides the most powerful evidence for this thesis. The linguistic universal of structure-dependence follows from the null hypothesis that the computational system is optimal. It is for this reason indifferent to linear order, which is, of course, the most elementary feature of externalization.

Not long ago it would have seemed absurd to propose that the operations of human language could be reduced to Merge, along with language-independent principles of computational efficiency. Work of the past few years has shown that quite intricate properties of language follow directly from such assumptions.

Displacement is a ubiquitous and puzzling property of language. Phrases are heard in one position but interpreted in two, both in their original position and in some other position that is silent, but grammatically possible. The sentence, “Which book will you read?” means roughly, “For which book x, you will read the book x,” with the nominal phrase “book” heard in one position but interpreted in two. Displacement is never built into artificial symbolic systems for metamathematics, programming, or other purposes. It had long been assumed to be a peculiar and puzzling imperfection of natural language. Merge automatically yields displacement with copies: in this case, two copies of “which book.” The correct semantic interpretation follows directly. Far from being an imperfection, displacement with copies is a predicted property of the simplest system. Displacement is, in some respects, even simpler than Merge, since it calls on far more limited computational resources.
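How copies arise without any additional machinery can be shown in a short sketch. It assumes, for illustration only, that Merge is set formation over strings standing in for lexical items, and it uses a simplified structure that ignores details such as the inversion of “will” and “you”; the helper `occurrences` is a name introduced here.

```python
def merge(x, y):
    """Form the unordered object {X, Y} without modifying X or Y."""
    return frozenset({x, y})


def occurrences(obj, target):
    """Count how many times `target` occurs as a term within `obj`."""
    if obj == target:
        return 1
    if isinstance(obj, str):
        return 0
    return sum(occurrences(part, target) for part in obj)


# Build "you will read which book" bottom-up (simplified structure).
wh_phrase = merge("which", "book")
vp = merge("read", wh_phrase)
tp = merge("will", vp)
clause = merge("you", tp)

# Merge the clause with one of its own terms. Nothing is moved or deleted:
# the phrase simply ends up occurring in two positions.
question = merge(wh_phrase, clause)

print(occurrences(clause, wh_phrase))
print(occurrences(question, wh_phrase))
```

The same `merge` function is used throughout; the second occurrence of the phrase follows from applying it to an object and one of its own parts, with no separate operation for displacement.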

The same processes provide intricate semantic interpretations for such properties as referential dependence and quantifier-variable interaction. They also have further implications about the nature of language. Consider the sentence “the boys expect to see each other,” where “each other” refers to the boys, thus obeying an obvious locality condition of referential dependency. Consider now the sentence, “Which girls do the boys expect to see each other?” The phrase “each other” does not refer back to the closest antecedent, “the boys,” as such phrases universally do; rather, it refers back to the more remote antecedent, “which girls.” The sentence means “For which girls the boys expect those girls to see each other?” That is what reaches the mind under Merge-based computation with automatic copies, although not what reaches the ear. What reaches the ear violates the locality condition of referential dependency.

Deletion of the copy in externalization causes processing problems. Such filler-gap problems, as they are called, can become quite severe, and are among the major problems of automatic parsing and perception. In the sentence, for example, “Who do you think ____ left the show early?” the gap marks the place from which the interrogative has been moved, creating a long-distance dependency between the interrogative and the finite verb. If the interrogative copy were not deleted, the problem would be much reduced. Why is it deleted? The principles of efficient computation restrict what is computed to the minimum. At least one copy must appear or there is no evidence that displacement took place at all. In English and languages like English, that copy must be structurally the most prominent one. The result is to leave gaps that must be filled by the hearer. This is a matter that can become quite intricate.

These examples illustrate a significant general phenomenon. Language design appears to maximize computational efficiency, but disregards communicative efficiency. In every known case in which computational and communicative efficiency conflict, communicative efficiency is ignored. These facts run counter to the common belief, often virtual dogma, that communication is the basic function of language. They also further undermine the assumption that human language evolved continuously from animal communication. And they provide further evidence that externalization, which is necessary for communication, is a peripheral aspect of language.

There are methodological and evolutionary reasons to expect that the basic design of language will be quite simple, perhaps even close to optimal. With regard to externalization, the same methodological arguments hold, as they always do, but the evolutionary arguments do not apply. The externalization of language may not, in fact, involve evolution at all. The sensorimotor systems were in place long before the appearance of language. Mapping the language of thought to some sensorimotor system is a hard cognitive problem, one that involves coordinating a computationally efficient internal system and an unrelated sensory modality. The variety, complexity, and easy mutability of observed languages might lie primarily in externalization. It seems increasingly clear that this is the case—something that should be expected. Children know the principles of the internal language without evidence; as, indeed, they know a great deal more about language, including almost all semantic and most syntactic properties. This is a matter of contention, but solidly established, I think.

There is by now some reason to hope that the emerging science of neurolinguistics might identify the brain circuits that underlie the computational system. The neuroscientist Angela Friederici reviews a great deal of promising work in a forthcoming book.²² Publication is scheduled to coincide with the fiftieth anniversary of Eric Lenneberg’s classic study, Biological Foundations of Language.²³ Friederici’s own work leads to some bold and challenging proposals. She provides evidence that a crucial element in linguistic computation is a white matter dorsal fiber tract that connects a specific region in Broca’s area, part of Brodmann area 44, to the posterior temporal cortex. She suggests that this pathway might be “the missing link which has to evolve in order to make the full language capacity possible.” Evidence indicates that this dorsal pathway is very weak in macaques and chimpanzees, and weak and poorly myelinated in newborns, but strong in adult humans with language mastery. The increasing strength of this pathway, Friederici remarks, “correlates directly with the increasing ability to process complex syntactic structures.” A variety of experimental results suggest that “[t]his fiber tract may thus be one of the reasons for the difference in the language ability in human adults compared to the prelinguistic infant and the monkey.” These structures, Friederici suggests, appear to “have evolved to subserve the human capacity to process syntax, which is at the core of the human language faculty.” Quite intriguing discoveries might be forthcoming in these domains.

Let us return to the second component of a computational system, its atomic elements. In the case of language, these will be its lexical items. The conventional view is that these are cultural products, and that the basic ones are associated with extra-mental entities. This representationalist doctrine has been almost universally adopted in the modern period. The doctrine appears to hold for animal communication. Monkey calls are associated with specific physical events. The doctrine is radically false for human language, something recognized in classical Greece.

How can we cross the same river twice, asked Heraclitus? Why are two appearances understood to be two stages of the same river? When we look into the question, puzzles abound. Suppose that the flow of the river has been reversed. It is still the same river. Suppose that what is flowing becomes ninety-five percent arsenic because of discharges from an upstream plant. It is still the same river. The same is true of other quite radical changes.

On the other hand, with very slight changes it will no longer be a river at all. If its sides are lined with fixed barriers and it is used for oil tankers, it is a canal, not a river. If its surface undergoes a slight phase change and is hardened, a line is painted down the middle, and it is used to commute to town, then it is a highway, no longer a river.²⁴ Exploring the matter further, we discover that what counts as a river depends on mental acts and constructions. The same is true of even the most elementary concepts: tree, water, house, person, London, or, in fact, any of the basic words of human language. Human language and thought systematically violate the representationalist doctrine.²⁵

Our intricate knowledge of what even the simplest words mean is acquired virtually without experience. At peak periods of language acquisition, children acquire about a word an hour, often on one presentation.²⁶ The rich meaning of even the most elementary words must be substantially innate.

The evolutionary origin of such concepts is a complete mystery.

The Galilean challenge must be reformulated to distinguish language from speech, and to distinguish production from internal knowledge. Our internal computational system yields a language of thought, a system that might be remarkably simple, conforming to what the evolutionary record suggests. Secondary processes map the structures of language to one or another sensorimotor system for externalization. These processes appear to be the locus of the complexity and variety of linguistic behavior, and its mutability over time.

The origins of computational atoms remain a complete mystery. So does the Cartesian question of how language can be used in its normal creative way, in a manner appropriate to situations, but not caused by them, incited and inclined, but not compelled. The mystery holds for even the simplest forms of voluntary motion.

A great deal has been learned about language since the Biolinguistic Program was initiated. It is fair to say, I think, that more has been learned about the nature of language, and about a very wide variety of typologically different languages, than in the entire two-thousand-five-hundred-year prior history of inquiry into language. New questions have arisen, some quite puzzling. Some surprising answers lead us to revise what has long been believed about language, and mental processes generally. The more we learn, the more we discover what we do not know.

And the more puzzling it often seems.

This essay is based on a talk originally given by the author at the Bibliothèque nationale de France on November 24, 2016.²⁷
