This essay is the first part of a series on classic texts that have come to be seen as landmark achievements in their fields.
Noam Chomsky published Aspects of the Theory of Syntax in 1965.1 The publication of Syntactic Structures in 1957 had already sounded like the roll of distant thunder. A natural language could be studied at the level of explicitness and rigor common in mathematical logic.2 A revolution was in prospect.3 Having heard thunder, linguists were eager to see lightning. They were not disappointed. Aspects consolidated the revolution. Old-fashioned linguists and behavioral psychologists were scattered into exile.
In undertaking a revolution, Chomsky did what revolutionaries often do. He created his own predecessors, Plato and René Descartes among them. Reviving the notion of Universal Grammar from the seventeenth-century Port-Royal grammarians, Chomsky argued that since every human child could learn any human language, a single abstract grammatical system must be the common property of the human race. Syntactic Structures had offered linguists a theory in the sense understood by the serious sciences. In Aspects, the offer was carried forward and justified. Writing almost thirty years later, David Pesetsky struck just the right note:
The linguistic capacity of every human being is an intricate system [emphasis added], full of surprises but clearly law-governed [emphasis added], in ways that we can discern by scientific investigation [emphasis added]. Though we still have much to learn about this system, a great deal has been discovered already.4
These are ideas that, in Aspects, Chomsky compelled some linguists to accept: that many have accepted them is a measure of the book’s importance.
An Acquisition of the Species
A discussion of human creativity typically proceeds from a handful of examples: Aristotle, William Shakespeare, Isaac Newton, Wolfgang Amadeus Mozart, Albert Einstein. Whatever the list, and no matter its length, it embodies the assumption that human creativity is in short supply. All honor to the geniuses, if only because they are rare. Noam Chomsky’s very greatest contribution to thought has involved turning this assumption on its head. Human creativity is an acquisition of the species, the common property of the human race. By virtue of having mastered a natural language—Pesetsky’s intricate system—every human being is in possession of a rich, complex, and creative system of thought.
In Syntactic Structures, Chomsky identified creativity with the recursive structure of a natural language. The human faculty of language is unbounded in precisely the way that the natural number system is unbounded. It is always possible to extend a sentence, as when the cat is on the mat is enlarged to encompass John believes that the cat is on the mat, and it is possible to do this without obvious limit. In making this possibility the gravamen of his concerns, Chomsky revived Wilhelm von Humboldt’s view that language “must make infinite use of finite means.”5 If this is what language does, until the development of the theory of recursive functions in the first four decades of the twentieth century, no one knew how it was done. Chomsky had read and studied the masters: Kurt Gödel, Alan Turing, Alonzo Church, and, above all, Emil Post.6 They gave him a theory, and in Syntactic Structures he made use of it.
He was the first linguist to do so.
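The point can be made concrete with a small sketch, not anything Chomsky wrote: a sentence is extended without limit by embedding it under a further clause of belief. The function, the speakers, and the cutoff depth are illustrative.

```python
# A sketch of the unboundedness described above: a sentence can always be
# extended by embedding it under a further clause, without limit. The verbs,
# the speakers, and the cutoff depth are illustrative, not part of the theory.

def embed(sentence, speakers, depth):
    """Wrap `sentence` in `depth` layers of '<speaker> believes that ...'."""
    for i in range(depth):
        sentence = f"{speakers[i % len(speakers)]} believes that {sentence}"
    return sentence

print(embed("the cat is on the mat", ["John", "Mary"], 3))
# John believes that Mary believes that John believes that the cat is on the mat
```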
Following the publication of Syntactic Structures, Chomsky enlarged this idea of linguistic creativity by appealing to his Cartesian camouflage: “[O]ne fundamental contribution of what we have been calling ‘Cartesian linguistics,’” he wrote,
is the observation that human language, in its normal use, is free from the control of independently identifiable external stimuli or internal states and is not restricted to any practical communicative function, in contrast, for example, to the pseudo language of animals.7
This is a large and dramatic claim because it assigns to the ordinary use of language an aspect of human freedom. Thoughts and their expression in language are inclined by circumstances but they are not impelled by them: they are free both from “the control of independently identifiable external stimuli” and “internal states.” If this is a claim with overwhelming intuitive plausibility, its radical nature should not be underestimated. It exalts human creativity, but in doing so, places it beyond the scope of the physical sciences as they are now understood. About this kingdom, as Chomsky recognized, modern science has virtually nothing to say.
Competence and Performance
The true and proper object of linguistic theory, Chomsky argued in Aspects, is the competence of a native speaker—what he knows and not what he says.8
Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance.9
A speaker’s performance is compromised by limitations of memory, hesitations, repetitions, and any number of throat clearings or verbal tics. The object of linguistic theory is the generative system that accounts for a native speaker’s competence, and not the use of that system in parsing and production. This, at once, raised a profound and difficult question: if the performance of a native speaker—what he says—is compromised in various ways, how might he have acquired the underlying system of rules that makes his performance possible? Linguists find the task of describing that system very difficult, and it is, even today, by no means complete for any natural language. Children can hardly be performing a routine inductive feat, presented as they are with data that are compromised and thus degenerate, and under circumstances characterized by what Chomsky, with his gift for memorable formulations, called the poverty of the stimulus. Having posed the problem, Chomsky also proposed its solution:
The problem for the linguist, as well as for the child learning the language, is to determine from the data of performance the underlying system of rules that has been mastered by the speaker-hearer and that he puts to use in actual performance … The grammar of a particular language, then, is to be supplemented by a universal grammar that accommodates the creative aspect of language use and expresses the deep-seated regularities which, being universal, are omitted from the grammar itself.10
The goal of linguistic theory is to provide a theory rich enough to describe any human language by principles general enough to apply to every one of them. Unless such a theory exists, there can be no accounting for the fact that human languages are all learnable.
A generative grammar is a system of rules that assigns structural descriptions to sentences.11 There is no end to sentences and no end to their structural descriptions. The generative grammar represents the linguist’s theory, but it also represents the adult speaker’s tacit linguistic knowledge.
It represents both.
The Standard Theory
Aspects presented linguists with what, at once, became the Standard Theory. Syntactic Structures had already offered the essentials. A grammar of a natural language comprises phrase structure and transformational rules. Phrase structure rules break sentences into constituents, the process ultimately yielding a terminal string in which constituents no longer contain constituents. These rules generate hierarchical structures or phrase markers—tree diagrams, in fact. Transformational rules, on the other hand, map phrase markers onto phrase markers. Transformational rules had, in fact, been introduced by Chomsky’s mentor, Zellig Harris, but in Syntactic Structures they were, for the first time, embedded in a purely formal context.
In Aspects, the ideas found first in Syntactic Structures found themselves amplified. The Standard Theory is a computational system. Rules are formal because they are explicitly specified: there is no appeal to meaning. The grammar consists of syntactic, semantic, and phonological components; and in addition it contains, or makes use of, a lexicon, something like a formal dictionary.12 Syntax is under the control of phrase structure and transformational rules. Phrase-structure rules are formulated as context-free rewriting rules.13 A category symbol A, where A might designate S (for sentence), is dissected into a string Z of one or more symbols: A → Z/ X_Y, where the context afforded by X and Y is null. The symbols themselves may represent lexical categories, such as noun (N) or verb (V); syntactic categories such as sentence (S); and syntactic constituents such as noun and verb phrases (NP and VP).
The grammar also contains context-sensitive rules: A → Z/ X_Y, where X or Y are not null. These rules serve to insert lexical items into phrase markers.14 It matters a great deal where they are inserted. Morris plays lapta is fine: not so Lapta plays Morris. The appeal to context is ineliminable. Context-free and context-sensitive rules generate the phrase marker underlying sentences: [S [NP [N]] [VP V [NP [N]]]] is an example drawn down to the level of syntactic categories; and on lexical insertion, there is [S [NP Morris] [VP plays [NP lapta]]]. From these phrase markers, it is possible to recover old-fashioned grammatical functions—the fact that Morris is the subject of the sentence in which he is playing lapta. Functions are treated as two-place relations: x is the subject of y. These functional relationships may be seen in plain sight on the phrase marker itself, with one node marking the subject of a sentence, and another, its object. The result is what Aspects, in a phrase now famous, called deep structure. Transformational rules then map deep structures onto surface structures—those structures ready to enter the gabble of communication.
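A minimal sketch, and only a sketch, may help fix ideas: toy context-free rules are expanded top-down to produce the phrase marker for Morris plays lapta, with lexical insertion reduced to a simple membership check standing in for the context-sensitive rules just described. The rule table and the lexicon are illustrative inventions, not the formalism of Aspects.

```python
# A minimal sketch: context-free rewriting rules expanded top-down to yield a
# bracketed phrase marker for "Morris plays lapta." The rules and the toy
# lexicon are illustrative; lexical insertion is reduced to a membership check.

RULES = {
    "S":  ["NP", "VP"],     # S  -> NP VP
    "NP": ["N"],            # NP -> N
    "VP": ["V", "NP"],      # VP -> V NP
}

LEXICON = {
    "N": {"Morris", "lapta"},
    "V": {"plays"},
}

def expand(symbol, words):
    """Rewrite `symbol` top-down; lexical categories consume the next word."""
    if symbol in LEXICON:
        word = words.pop(0)
        assert word in LEXICON[symbol], f"{word} cannot be inserted under {symbol}"
        return f"[{symbol} {word}]"
    children = " ".join(expand(child, words) for child in RULES[symbol])
    return f"[{symbol} {children}]"

print(expand("S", ["Morris", "plays", "lapta"]))
# [S [NP [N Morris]] [VP [V plays] [NP [N lapta]]]]
```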
Chomsky electrified the community of linguists by arguing persuasively that the surface structures of a natural language are no reliable guide to its deep structures; indeed, the distinction between deep and surface structures was widely appreciated as one of the theory’s greatest insights. In insisting on the distinction, and its importance, Chomsky appealed to brilliantly chosen examples. In Syntactic Structures, he had introduced the now famous sentence Colorless green ideas sleep furiously in order to demonstrate that there exist perfectly grammatical English sentences that don’t mean a thing. It followed that syntax and semantics were independent; a large conclusion derived from a small example. In Aspects, examples multiplied. The sentences John is easy to please and John is eager to please are on their surface very similar, differing as they do in only one word and otherwise conforming to the same grammatical pattern: NP Cop Adj to VP. Appearances are misleading. These sentences are not at all similar. From John is easy to please it follows that it is easy to please John, but nothing like this follows from John is eager to please. On the other hand, John’s eagerness to please follows from the fact that John is eager to please, but there is nothing like John’s easiness to please, even though it is easy to please John. These two sentences are radically different. It is on the level of deep structure that these differences are evident. In arguing in this way with respect to a great many examples, Chomsky was making specific points, but he was also doing more. He was introducing linguists to a new style of argument.
Recursion Redux
Recursion figured prominently in Syntactic Structures. Syntactic rules can refer back to themselves and thus may apply to their own outputs. In Aspects, sentences themselves became objects of recursive embedding, and this recursion in the base replaced certain transformational rules. This was a major technical development. A sentence (S) may be dissected into a noun phrase and a verb phrase
S → NP VP.
Well and good. A noun phrase may now be dissected into a noun phrase and a sentence
NP → NP S.
A verb phrase may then be dissected into a verb and a noun phrase
VP → V NP.
And, in view of NP → NP S, into a verb, a noun phrase, and a sentence. This makes possible the generation of structures such as
[S John [S who met Mary] knows Sue],
as well as
[S the linguist [S that met the mathematician [S that knows the student [S that …]]]].
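A brief sketch, under the obvious simplifications, shows how the rule NP → NP S can reapply to its own output, each noun phrase sprouting a relative clause that contains a further noun phrase. The nouns and the fixed relative clause are illustrative.

```python
# A sketch of sentential recursion: the rule NP -> NP S reapplies to its own
# output, so relative clauses can be nested without limit. The nouns and the
# fixed relative clause "that met ..." are illustrative.

NOUNS = ["the linguist", "the mathematician", "the student"]

def np(depth):
    """Bracketed NP; each level reapplies NP -> NP S with a relative clause."""
    noun = NOUNS[min(depth, len(NOUNS) - 1)]
    if depth == 0:
        return f"[NP {noun}]"
    # NP -> NP S : the relative clause S itself contains another NP
    return f"[NP [NP {noun}] [S that met {np(depth - 1)}]]"

print(np(2))
# [NP [NP the student] [S that met [NP [NP the mathematician]
#   [S that met [NP the linguist]]]]]
```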
The introduction of sentential recursion, with S hanging on for dear life from both sides of a phrase marker, brought a notable economy to the Standard Theory. Syntactic Structures had handled the matter by hand, inserting sentential phrase markers in other sentential phrase markers. Fewer symbols were now required, the derivation of complex clauses simplified, the theory streamlined.
With recursion, there is, in Aspects, a return to the creativity of language:
The infinite generative capacity of the grammar arises from a particular formal property of these categorical rules, namely that they may introduce the initial symbol S into a line of a derivation. In this way, the rewriting rules can, in effect, insert base Phrase-markers in other base Phrase-markers, this process being iterable without limit.15
Ineliminable Transformations
The Standard Theory offered linguists a formal structure with two quite different kinds of rules—phrase-structure and transformational. Recursion got rid of some transformations, but not all. The resulting structure is, if not inelegant, then, at least, somewhat clumsy. Why two? Empirical justifications for transformational rules arose from the mismatches between deep and surface structures. The passive voice is an example. In a passive sentence, the logical object of a verbal predicate occurs in the subject position. John was convinced by Bill to leave consists of two sentences
(S): [S John was convinced by Bill [S _ to leave]].
John is the grammatical subject of the main sentence, but not its logical subject, which is Bill. On the other hand, Bill is not the logical subject of the embedded sentence; that role falls to John.
Transformational rules apply from the most deeply embedded constituent of a sentence outward to its outermost constituent. They can insert, erase, substitute, and reorder linguistic constituents. The passive transformation is again an example:
NP1 V NP2 ⇒ NP2 be + V-ed by + NP1.
This transformation applies to a phrase marker consisting of a nominal constituent NP1 followed by a verb (V), itself followed by a second, distinct nominal constituent NP2. The transformation specifies the result of this operation: NP1 and NP2 are reordered, the auxiliary be is added to V, as is the passive morphology –ed, and the preposition by is added to the postposed NP1.
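Treated as an operation on whole constituents, the transformation can be sketched in a few lines; the tuple encoding of phrase markers and the crude morphology are illustrative, not the notation of Aspects.

```python
# A sketch of the passive transformation as a mapping on phrase markers:
# [S NP1 [VP V NP2]] goes over to [S NP2 [VP be V-ed [PP by NP1]]].
# The tuple encoding and the crude "-ed" morphology are illustrative.

def passivize(phrase_marker):
    """Reorder NP1 and NP2, add the auxiliary, passive morphology, and 'by'."""
    _, np1, vp = phrase_marker
    _, verb, np2 = vp
    return ("S", np2, ("VP", "was", verb + "ed", ("PP", "by", np1)))

active = ("S", ("NP", "Bill"), ("VP", "frighten", ("NP", "John")))
print(passivize(active))
# ('S', ('NP', 'John'), ('VP', 'was', 'frightened', ('PP', 'by', ('NP', 'Bill'))))
```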
None of this can be handled by phrase structure rules, unless the phrase-structure rules are themselves allowed to increase without limit. If transformational rules are ineliminable within the context of phrase-structure grammars, they seemed, nevertheless, to carry something of the arbitrary. It is therefore one of the ironies of intellectual history that, far from transformational rules being purged from theoretical syntax, it has been the other way around: phrase structure rules themselves have dwindled in favor of transformational operations in the minimalist program.
An Old-Fashioned Analysis
Beyond its obvious contribution to syntactical theory, Aspects offered linguists a rich and subtle analysis of old-fashioned grammatical categories—noun, verb, adjective, adverb, and the like. Although obviously answering to something, these categories were never clearly defined. A noun was traditionally defined as an expression designating a person, place, or thing. The definition is obviously inadequate. In the sentence Luck is a great virtue, “luck” is a noun but not one designating a person, place, or thing. There are many other examples. Making use of a technique first introduced by Roman Jakobson, Chomsky purged these didactic definitions in favor of a scheme in which each syntactic category was flagged by a finite set of binary-valued features. The word dog thus enters the lexicon marked as [+N] for noun; the word barks, as [+V] for verb. Neither [+N] nor [+V] receives any further definition, but they do determine how lexical categories behave.16 Their meaning is in their use, as Ludwig Wittgenstein remarked, and their use is governed by their rules, the rules in turn governed by their features. These features serve to discriminate transitive verbs such as frighten from intransitive verbs such as sleep. Both frighten and sleep are specified with an inherent [+V] feature: they are both verbs; but frighten, unlike sleep, is specified by a trailing [+N]. It takes an object. The introduction of categorical selection rules—what goes where—ensures that verbs like frighten are inserted in a phrase marker in the context of a nominal constituent ([+N]), while verbs like think are not. The professor frightens the boy is grammatical. The professor thinks the boy is not.
Chomsky also proposed to distinguish between categorical and semantic selectional features. A verb like frighten requires a [+ animate] object; not so, a verb such as praise. The sentence The professor frightens sincerity is grammatical, even though it is semantically deviant, whereas The professor praises sincerity is grammatical and otherwise just fine.17 The introduction of contextual selection rules ensures that frighten is inserted in the phrase marker in the structural context of a [+N] [+animate] object.
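These distinctions lend themselves to a small sketch in which lexical entries carry binary features and a checker reports categorical and selectional violations. The feature names and the entries are invented for illustration; they are not the feature system of Aspects.

```python
# A sketch of binary-valued features with categorical and selectional
# restrictions. The feature names and lexical entries are illustrative.

LEXICON = {
    "frighten":  {"+V": True, "takes_object": True,  "object_animate": True},
    "praise":    {"+V": True, "takes_object": True,  "object_animate": False},
    "sleep":     {"+V": True, "takes_object": False, "object_animate": False},
    "boy":       {"+N": True, "+animate": True},
    "sincerity": {"+N": True, "+animate": False},
}

def check(verb, obj=None):
    """Report categorical and selectional violations for a verb-object pair."""
    v = LEXICON[verb]
    if obj is None:
        return "ungrammatical: object required" if v["takes_object"] else "fine"
    o = LEXICON[obj]
    if not v["takes_object"]:
        return "ungrammatical: no object allowed"      # categorical violation
    if v["object_animate"] and not o["+animate"]:
        return "grammatical but semantically deviant"  # selectional violation
    return "fine"

print(check("frighten", "boy"))        # fine
print(check("sleep", "boy"))           # ungrammatical: no object allowed
print(check("frighten", "sincerity"))  # grammatical but semantically deviant
print(check("praise", "sincerity"))    # fine
```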
In developing his theory of syntactic features, Chomsky was heeding methodological constraints: he was responding to the imperative to keep his theory simple. Context-sensitive rules could well be used to settle the distinctions between frighten and sleep, but only by adding complexity to the grammar. The introduction of syntactic features is one of the most important contributions of Aspects.18 It leads to one of Chomsky’s boldest and most dramatic conclusions. The lexicon of a natural language, with its constituents flagged by various syntactic, semantic, and phonological features, is the very place where one language is unlike another. Beyond the lexicon, every human language is governed by the same structures of universal grammar, and in this sense, Chomsky argued, there is only one human language.
One human language! This is surely among the most provocative and dramatic claims of the last half century.
First Principles
Linguistic theory aims to derive linguistic facts from first principles, an ultimate goal it shares with the other sciences. What would these principles be for language? We point to one universal principle stemming from the Standard Theory: the structure dependency of syntactic rules. Thus S goes over to NP and VP. NP and VP are sister nodes, both structural dependents of S. Ditto for NP → Det N and VP → V NP. The top-down application of the rewriting rules generates structural dependencies between syntactic constituents. The rule governing relative clauses rewrites an NP into an [NP NP S]. This rule ignores the linear position of the NP. Relative clauses can be generated both in subject position, as in [S The student of physics [S who met your advisor] is in my class], and in object position, as in [S I know the student of physics [S who met your advisor]]. A relative clause modifies an NP and not the embedded nominal constituent within that NP. The relative clause [S who met your advisor] does not modify the nominal constituent [physics], even though this nominal constituent immediately precedes it. Structural dependency is a first principle of the language faculty. Linear order is not.
Transformational rules, as defined in the Standard Theory, are structure dependent: they apply to the structural description of a sentence, specify the structural changes, and derive the resulting transformed structure. Transformations may also be associated with conditions on their application. For example, certain transformations apply to main clauses but not to embedded clauses. This is the case for yes-or-no questions. The question transformation applies to the underlying structure of sentences such as [S John is here] and yields [S Is John here]. Even though such examples seem to indicate that the transformation relies on surface linearity, inverting the auxiliary and the immediately preceding nominal constituent, an example with a more complex subject, [S [NP The professor of John] is here], illustrates that the transformation is in fact structure dependent. If it were not, the transformation could apply to the auxiliary and the immediately preceding nominal constituent John, yielding [The professor of is John here]. Instead, the transformation applies to the full NP structure and yields [S Is [NP the professor of John] here]. It might very well be the case that the structure dependency of syntax is rooted in language design and so a first principle of the language faculty.
Why? No one knows.
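The contrast just described can be put in a toy computation, with no pretense of capturing the actual transformation: a structure-dependent rule fronts the auxiliary that follows the whole subject NP, while a purely linear rule that merely inverts the auxiliary with the word before it misfires on the complex subject. The encodings are illustrative.

```python
# A toy contrast, not the transformation itself: a structure-dependent rule
# fronts the auxiliary that follows the whole subject NP, while a purely
# linear rule inverts the auxiliary with whatever word precedes it. The
# (subject, auxiliary, remainder) encoding is illustrative.

def question_structural(subject_np, aux, rest):
    """Front the auxiliary that follows the full subject NP."""
    return " ".join([aux.capitalize(), subject_np, rest]) + "?"

def question_linear(sentence):
    """Swap the auxiliary with the word immediately before it (the wrong rule)."""
    words = sentence.split()
    i = words.index("is")
    words[i - 1], words[i] = words[i], words[i - 1]
    return " ".join(words) + "?"

print(question_structural("the professor of John", "is", "here"))
# Is the professor of John here?
print(question_linear("the professor of John is here"))
# the professor of is John here?  -- the linear rule misfires
```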
Open Questions
Aspects left several questions open for further inquiry. Alternative hypotheses are considered with respect to the relevant levels of representation, the properties of the syntactic rules, and the principles of Universal Grammar.19 These questions have been investigated in the course of the development of generative grammar. The discovery that syntactic rules apply across categories led to the elimination of the multiple rewriting rules postulated in Aspects in favor of general rule schemata in Government and Binding theory. Transformational rules were reduced to two general operations: move NP (displacing nominal constituents), and move wh- (displacing operators such as who, what, where, and when in open question formation). In the minimalist program,20 syntactic operations are reduced to Merge (x, y), where x and y are two syntactic objects. Current work investigates the consequences of distinguishing Set Merge, a symmetrical operation deriving unordered sets of constituents, from Pair Merge, an asymmetrical operation deriving ordered pairs of constituents.
Another interesting question left open in Aspects is whether syntactic rules yield the linear order of syntactic constituents, as in the Standard Theory, where John eats flies is generated in just that order, or whether they leave the constituents they combine unordered, as in the set {John, eats, flies}, which is on set-theoretical grounds identical to {eats, John, flies}. The minimalist program investigates, and, indeed, champions, the second hypothesis. On this view, the linearization of syntactic constituents is handled by the phonological component of the grammar. The very deepest operations of the human mind are indifferent to what might appear to be the most fundamental fact about human language—that words follow one another in a particular order. In all of these arguments, a greater, grander argument is always at work. Universal Grammar must account for the rapid emergence of language in the species, and it must account for its rapid acquisition in the individual. Nothing less than radical simplicity can serve either goal.
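A final sketch, assuming only the informal description above and not the minimalist formalism itself, renders Set Merge as unordered set formation and Pair Merge as ordered pairing; the frozenset and tuple encodings are illustrative.

```python
# A sketch of Merge as set formation: Set Merge is symmetrical and yields an
# unordered set; Pair Merge is asymmetrical and yields an ordered pair. The
# frozenset and tuple encodings are illustrative.

def set_merge(x, y):
    """Symmetrical Merge: {x, y}, with no linear order."""
    return frozenset({x, y})

def pair_merge(x, y):
    """Asymmetrical Merge: the ordered pair <x, y>."""
    return (x, y)

vp = set_merge("eats", "flies")          # {eats, flies}
s = set_merge("John", vp)                # {John, {eats, flies}}, still unordered

print(set_merge("John", "eats") == set_merge("eats", "John"))    # True: order-free
print(pair_merge("John", "eats") == pair_merge("eats", "John"))  # False: ordered
```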
Influence Beyond Linguistics
By defining its object of inquiry as internal to the mind, linguistic theory led to the creation of a new interdisciplinary field devoted to the study of the biological basis of language, the so-called Biolinguistic Program.21 Recent research confirms the importance of generative grammar for an understanding of the language faculty as a specifically human trait.22 The language faculty, like other biological systems, is genetically rooted. Under normal conditions, it develops very early in the child without conscious effort or extensive training. Animals cannot learn a human language, much to their regret and ours. Monkeys can spontaneously master the weakest of finite-state grammars, but they cannot reach the context-free grammars characteristic of human language, and hierarchical structures are, for this reason, beyond them.23
Nothing in the neurosciences is yet as subtle and detailed as the Standard Theory, but it has been established that Broca’s area supports the processing of syntax. Human beings are programmed to compute linguistic recursion. A part of Broca’s area would appear to be dedicated to complex syntactic structures: Brodmann area 44 is activated for center embeddings, while Brodmann area 45 appears adapted to movement.24 Other studies in the cognitive neurosciences indicate that the human brain is sensitive to structure-dependent computation when processing language. This is the case for sentence processing as well as for the processing of phrasal constituents.25 Still other studies indicate that the brain processes deep structures, largely ignoring their surface form.26 “Linguistic theory is mentalistic,” Chomsky wrote somewhat defiantly, “since it is concerned with discovering a mental reality underlying actual behavior.”27 Linguistic theory is still mentalistic, but step by step, research is uncovering its physical roots in the neurophysiology of the human brain.
An Enduring Legacy
Aspects introduced a revolution within linguistics. The subject has never been the same again. It promoted linguistics to the status of a science, one that accepted the methods and the standards of the serious sciences themselves. It did more. It championed an integrated study of organic systems, an interdisciplinary field of inquiry bridging results from linguistics and the other sciences. And it did still more. It achieved what only the most profound of scientific revolutions achieve: the transformation of what initially seemed outrageous into what currently seems commonplace. Children do learn their native language without effort or instruction; a human language is a system of dazzling and poorly understood complexity; some things must be innate if anything is ever to be acquired; there is a distinction between competence and performance; the most robust system of assessment in studying grammar is a native speaker’s intuitions; and the ability of every human being to use his language for creative ends is a mystery that we have not penetrated and may never understand.28