In response to “The Recovery of Case” (Vol. 2, No. 3).

To the editors:

David Berlinski and Juan Uriagereka’s entertaining and informative celebration of Jean-Roger Vergnaud’s famous (in linguistics) letter to Noam Chomsky and Howard Lasnik about their 1976 manuscript of “Filters and Control” attempts to provide the general context in which the letter was such an important contribution to syntactic analysis and linguistic theory.1 Their essay, which is a very welcome salute to both Vergnaud’s genius and a defining event in the history of linguistics, elucidates the letter primarily in terms of the syntactic theory of the period. The following remarks attempt to place Vergnaud’s letter more centrally within the history of modern generative grammar from its inception to the present, especially how the ideas in the letter continue to inform the theoretical concerns of today—in particular the minimalist program for linguistic theory (which Berlinski and Uriagereka’s essay mentions only once)—and how these ideas are a reflection of the Galilean style in the natural sciences, which will be discussed at the end of these remarks.

The generative enterprise that Chomsky founded in 1955 is now 61 years old. Vergnaud’s 1977 letter appears in the 22nd year. Today we are almost 4 decades beyond Vergnaud’s letter, exploring a minimalist program for linguistic theory that was originally proposed in Chomsky’s 1992 manuscript “A Minimalist Program for Linguistic Theory” (published the following year, 16 years after Vergnaud’s letter).2 This research program, which is now almost a quarter century old, has significantly changed our understanding of what language is.

Whenever someone writes about language (and/or particular languages), there is an implicit assumption that the writer and the reader understand the terms language as opposed to a language—and this is true of the Berlinski and Uriagereka essay as well. However, as Chomsky notes in the beginning of the first chapter of What Kind of Creatures Are We?, language “has been studied productively for 2,500 years, but with no clear answer to the question of what language is,” the question that is the subject and title of that chapter.3

While explicating the abstract noun language remains a difficult project for everyone, an explicit definition of the countable noun language has been proposed within the generative enterprise: simply, a language is a lexicon plus a computational system in the mind of the speaker.4 This definition is unavoidable: every speaker of a language must know a lexicon and some process for combining lexical items into the linguistic expressions of the language.5 Under this definition, a language constitutes a system of knowledge which is internal in the brain/mind of the speaker, what Chomsky in Knowledge of Language designates as an I-language, where I stands for internal and individual.6 This rejects the notion of “an ideal speaker-listener” characterized in the quote from Chomsky 1965 that Berlinski and Uriagereka cite, a notion that presumably identifies no real speaker-listener who exists in the world and therefore conflicts with the concept of I-language, which identifies a phenomenon that exists in the world.7

The concept of I-language does not provide a definition of the common sense notion of language whereby we talk about the English language as a unified phenomenon that exists in the world, abstracting away from the obvious variation that exists across individuals in geographically diverse locations or in socially diverse contexts, as well as the idiosyncrasies that may occur among speakers in the same geographical location and social context. Nonetheless, this usage is unproblematic, on a par with our continued talk about the sun rising and setting rather than the earth rotating on its axis.8

The system of knowledge that defines a language is also called a grammar of the language—thus consisting of the syntactic atoms that form linguistic expressions in the language (the items of the lexicon) and the operations that compose lexical items into expressions that have a phonetic form (sound) and an interpretation (meaning). The interpretation of linguistic expressions is determined both by the interpretation of their individual lexical items and by the syntactic structure in which these lexical items are constructed. Simple examples can provide powerful demonstrations of this. Consider the phrase exceptional students and teachers, which has two equally possible interpretations. On one, only students are exceptional, whereas on the other both students and teachers are exceptional. Under both interpretations, the meaning of the individual words remains the same. The difference in interpretation depends on the different covert syntactic structures that can be assigned by the grammar to the same linear string of words.

This becomes clearer when we consider that the unambiguous phrase teachers and exceptional students is synonymous with the first interpretation. This second phrase coordinates two expressions teachers and exceptional students, where it is clear that the second is a subpart of the whole in which the adjective modifies only students. This can be represented graphically in terms of a “tree” diagram (1), where the grouping of elements represents the hierarchical structure of the expression.

The phrase exceptional students and teachers on the interpretation that is synonymous with teachers and exceptional students has the identical hierarchical structure (demonstrated in (2)), while the two phrases have different linear orders of their lexical elements.

On the interpretation of the first phrase where both students and teachers are exceptional, this unique interpretation corresponds to a different hierarchical structure represented as (3).

In (3), exceptional is interpreted as modifying the coordinate structure students and teachers, hence modifying both conjuncts. The same two-way ambiguity occurs when a possessive pronoun (e.g. our) replaces the adjective exceptional, as in our students and teachers.

However, combining the possessive pronoun with the adjective does not yield a four-way ambiguity, but rather a three-way ambiguity. Thus our exceptional students and teachers can be paraphrased unambiguously as i) our exceptional students and our exceptional teachers, ii) teachers and our exceptional students, iii) our teachers and our exceptional students, but not iv) exceptional teachers and our exceptional students, even though there is nothing ill-formed about the paraphrase itself. The fact that the ambiguity is three-way and not four-way follows from hierarchical structure. Once exceptional combines with the phrase students and teachers and thereby modifies both, there is no way that our can combine with the phrase exceptional students and teachers and not also modify both students and teachers.
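The counting argument can be checked mechanically. What follows is only a rough sketch under stated assumptions: the tuple encoding (`"MOD"`, `"AND"`) and the function names `readings`, `leaves`, and `scope` are invented here for illustration and are not part of any published analysis. The sketch assumes that modifiers nest in their surface order and that each one combines either with the first conjunct alone or with the whole coordination.

```python
def readings(modifiers, first, second):
    """Enumerate bracketings of '<modifiers> first and second'.

    Assumption: modifiers are listed outermost first, and each one combines
    either with the first conjunct alone or with the whole coordination.
    """
    trees = []
    for k in range(len(modifiers) + 1):
        # The k outermost modifiers sit above the coordination; the rest
        # combine with the first conjunct only.
        outer, inner = modifiers[:k], modifiers[k:]
        left = first
        for m in reversed(inner):      # innermost modifier attaches first
            left = ("MOD", m, left)
        tree = ("AND", left, second)
        for m in reversed(outer):
            tree = ("MOD", m, tree)
        trees.append(tree)
    return trees


def leaves(tree):
    """Return the nouns contained in a constituent."""
    if isinstance(tree, str):
        return {tree}
    kind, a, b = tree
    return leaves(b) if kind == "MOD" else leaves(a) | leaves(b)


def scope(tree, out=None):
    """Map each modifier to the nouns inside the constituent it combines with."""
    out = {} if out is None else out
    if isinstance(tree, str):
        return out
    kind, a, b = tree
    if kind == "MOD":
        out[a] = leaves(b)
        scope(b, out)
    else:
        scope(a, out)
        scope(b, out)
    return out


for tree in readings(["our", "exceptional"], "students", "teachers"):
    print(scope(tree))
```

With two modifiers the sketch yields three structures rather than four. The missing one is exactly the case in which exceptional would take both nouns in its scope while our takes only students, which no nesting of constituents can express.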

The task of linguistic theory is to determine what grammatical operations account for this crucial but hidden structure. Thus a major focus of the generative enterprise since its inception has been to provide a theory of the computational system for human language that accounts for the observable phenomena of language, including ambiguities of interpretation that can be linked directly to differences in covert syntactic structure.

At the beginning of the generative enterprise, grammatical operations of the computational system were referred to as rules, designated as “descriptive” to distinguish them from commonly known prescriptive rules of grammatical usage such as “don’t end a sentence with a preposition.” The operations consisted of two distinct kinds, phrase structure rules and transformations. The former applied first to generate labeled hierarchical structures for linguistic expressions, which were then operated on by transformations to produce the surface forms of these expressions. The phrase structure rules operate top-down, starting with an initial phrasal category symbol that is expanded into its constituent parts, and those parts into their constituent parts, and so on until this phrase structure derivation produces the lexical items contained in the linguistic expression under analysis.

The initial formulations of both phrase structure rules and transformations were, to use the phrase Berlinski and Uriagereka attribute to “the structure of the English language,” devilishly complex because they required not only stipulations about the order in which these operations apply, but also stipulations about whether operations applied optionally or obligatorily as well as further stipulations about the contexts in which these operations could or must apply. However, as became clear with the formulation of the Extended Standard Theory (EST), this devilish complexity turned out to be a property of the particular formulation of early generative grammars, not necessarily a property of the language itself.9

Under the formulation of the EST that “Filters and Control” adopts, rules like the Passive Transformation in the earlier work (e.g. Chomsky’s Syntactic Structures, Aspects of the Theory of Syntax) are reduced to just a single elementary operation that lies at the heart of the rule.10 Compare, for example, the formulation of a passive transformation in Syntactic Structures to the wonderful simplicity of Chomsky and Lasnik’s Move NP (first proposed in Chomsky 1976).11

  (4) Passive (optional):

    Structural analysis: NP – Aux – V – NP

    Structural change:

    X1 – X2 – X3 – X4 → X4 – X2 + be + en – X3 – by + X1

The formulation in (4) performs two “movement” operations, one on the underlying subject NP (the term X1) and another on the underlying object NP (the term X4)—and in addition two insertions of lexical material (the passive auxiliary be attached to the passive participle affix -en, which will eventually be affixed to the main verb, and the agentive preposition by). The purpose of the rule is to provide an explicit account for the fact that the surface syntactic subject of a passive construction is interpreted as the underlying object of the passive verb—what is now called displacement, a phenomenon in language where a syntactic unit is interpreted as occupying a different syntactic position from the one in which it is pronounced.
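To make the bookkeeping of this rule concrete, here is a minimal sketch of its structural change, in which the four terms X1 – X2 – X3 – X4 are rewritten as X4 – X2 + be + en – X3 – by + X1. The function name `passive` and the use of plain strings for the terms are assumptions made only for this illustration. The later affixation of -en to the verb is not modeled.

```python
def passive(x1, x2, x3, x4):
    """Apply the structural change of the early Passive rule to four terms.

    x1: underlying subject NP, x2: Aux, x3: V, x4: underlying object NP.
    Returns the four output terms; '+' marks material attached to a term.
    """
    return [
        x4,                    # the object NP moves to the front
        x2 + " + be + en",     # be and -en are inserted after Aux
        x3,                    # the verb stays in place
        "by + " + x1,          # the subject NP moves to the end, after by
    ]


terms = passive("the philosopher", "Past", "refute", "the argument")
print(" – ".join(terms))
```

Even in this toy form the rule bundles two movements and two insertions, which is the complexity that the later reduction to a single elementary operation removes.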

The reduction to Move NP under the EST was motivated by the formulation of a set of general constraints on the operation of transformations (see Chomsky 1973) that were proposed as substantive universals, forming part of the initial state of the language faculty that is common to all humans—what is called Universal Grammar.12 Given these constraints, grammatical transformations could be stated in the simplest form as a single elementary operation without reference to any specific syntactic context or other condition on applicability. A comparison of the EST as formulated in Chomsky and Lasnik with earlier formulations of generative grammar suggests a different conclusion from Berlinski and Uriagereka’s claim that “no one would think to say that the EST was wonderfully elegant.” The EST was clearly a more elegant theory—wonderfully so in its context—than its predecessors, though not as elegant as what has evolved under the minimalist program (as will be discussed below).

The proposals in “Filters and Control” are in a sense an addendum to the EST that attempts to account for some remaining details of the syntax of subordinate clauses in English no longer accounted for under the simplification of transformational rules adopted, including the general rule of free deletion, which Berlinski and Uriagereka discuss (see footnote 15). Nonetheless, this addendum makes the theoretically important proposal that the computational system includes not only grammatical operations and conditions on their application (conditions on derivations), but also filters—that is, conditions on the output of these operations, what are more generally called conditions on representations.13 The former conditions were formulated as general principles of the language faculty, thus part of UG; whereas some of the filters in “Filters and Control” are formulated as constraints specific to English or even varieties of English. For example, the *[for–to] filter that rules out examples like *we want for to win (“Filters and Control” (40b)) does not apply to the variety of English spoken in the Ozark mountains and Ottawa Valley where for–to constructions are acceptable.

Vergnaud’s letter is primarily concerned with the Chomsky and Lasnik filter (93) that applies to infinitival phrases of the form NP-to-VP. When this construction is preceded by for, a subordinating particle for infinitival clauses (on a par with that, which is the subordinating particle for finite subordinate clauses (e.g. John thinks that Mary is lucky)), the resulting sentences are well-formed, as Berlinski and Uriagereka show in their nine examples in (36). But when this infinitival subordinating particle is missing, presumably as the result of free deletion, the resulting sentences are all deviant (as indicated by the asterisks). For example, their (36b) [= “Filters and Control” (89.b.i)] abbreviates two distinct sentences (where the NP-to-VP structure is underlined): it is illegal for Bill to take part, which is well-formed, and *it is illegal Bill to take part, which is not. The deviant sentence is ruled out by filter (93).14

  (93) *[α NP to VP] unless α is adjacent to and in the domain of a verb or for.

The adjacency requirement is motivated by the following paradigm with the NP-to-VP constructions underlined, which includes Berlinski and Uriagereka’s (43) and (44).

    (5) a. I want Bill to win.
        b. I want for Bill to win.
        c. *I want very much Bill to win.
        d. I want very much for Bill to win.

If the adverbial phrase very much separates the verb want from the NP-to-VP structure, then the subordinating particle for must occur adjacent to this structure or the filter (93) marks the resulting structure (5c) as deviant.15
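The adjacency half of the filter can be illustrated with a small check over the four sentences of the paradigm. This is a deliberately crude sketch: the flat list-of-pairs encoding, the category labels (`"V"`, `"for"`, `"Adv"`, `"NP-to-VP"`), and the function name `violates_adjacency` are assumptions introduced here, and the requirement that α be in the domain of the verb or for is not modeled at all.

```python
LICENSERS = {"V", "for"}


def violates_adjacency(constituents):
    """Return True if some NP-to-VP unit is not immediately preceded by V or for.

    `constituents` is a flat list of (category, text) pairs in surface order.
    Only adjacency is checked; structural domain is ignored in this sketch.
    """
    for i, (category, _) in enumerate(constituents):
        if category != "NP-to-VP":
            continue
        if i == 0 or constituents[i - 1][0] not in LICENSERS:
            return True
    return False


paradigm = {
    "a": [("NP", "I"), ("V", "want"), ("NP-to-VP", "Bill to win")],
    "b": [("NP", "I"), ("V", "want"), ("for", "for"), ("NP-to-VP", "Bill to win")],
    "c": [("NP", "I"), ("V", "want"), ("Adv", "very much"), ("NP-to-VP", "Bill to win")],
    "d": [("NP", "I"), ("V", "want"), ("Adv", "very much"), ("for", "for"),
          ("NP-to-VP", "Bill to win")],
}

for label, sentence in paradigm.items():
    print(label, "deviant" if violates_adjacency(sentence) else "well-formed")
```

Only the sentence in which very much intervenes without for is flagged, matching the judgments in the paradigm.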

Vergnaud’s letter says obliquely that filter (93) does not seem like a good candidate for a general principle of grammar. One problem in this regard is that (93) mentions a specific English lexical item, the infinitival to. Vergnaud avoids this problem by proposing instead that “this filter could be replaced by a filter that governs the distribution of certain kinds of NPs,” thus eliminating the reference to the to-VP portion of (93).16 What he notices is that when NP-to-VP constructions are legitimate in English, a pronominal subject will always occur in the same morphological form as the corresponding pronoun that occurs as the object of a verb or preposition. He designates this morphological form as the “Governed Case” (e.g. him), in contrast to the Subject Case (e.g. he, otherwise known as Nominative Case) and the Genitive Case (e.g. his). He then proposes a filter on the distribution of NPs in the Governed Case, (2) in his letter.17

  (2) A structure of the form … [α … NP … ] … , where NP is in the Governed Case and α is the first branching node above NP, is ungrammatical unless (i) α is the domain of [− N] or (ii) α is adjacent to and in the domain of [− N].

Vergnaud’s filter generalizes to all NPs in the Governed Case and more importantly generalizes to all languages—a brilliant move that shifted the focus of the discussion about filters from a set of facts about constructions in a particular language (English) to plausible general properties of all languages.

Central to Vergnaud’s formulation are two general concepts of linguistic structure: Case, which in some languages correlates with distinct morphological and phonetic forms of lexical items that occur in distinct syntactic positions, and Government, which is construed as a syntactic relation between an element that governs and other syntactic units that are governed. Both concepts are plausible candidates for incorporation into general principles, and immediately following Vergnaud’s letter they became the focus of a widespread, intense, and fruitful effort to integrate them in a unified set of general principles of grammar that applied across languages to a wide variety of constructions beyond those found in English. Beginning with “On Binding” (written in 1978), which refined and generalized Vergnaud’s Case filter to include all lexical NPs, this led to a general theory of UG principles which were unified via the concept of government, developed in Chomsky’s Lectures on Government and Binding (written in 1979-80), the so-called GB theory under the Principles and Parameters framework.18 Vergnaud himself was an early contributor to this effort in a 1980 article co-authored with Alain Rouveret, where they propose a condition for anaphor binding in which a notion of “minimal binding domain” is formulated in terms of Case that is itself assigned on the basis of government relations.19 And although the concept of government is generally abandoned in the formulation of the minimalist program (Chomsky 1993), it nonetheless led to the discovery of a wide range of syntactic phenomena, some that remain to be accounted for under current theorizing.20

Berlinski and Uriagereka credit Vergnaud’s Governed Case filter with “bringing Chomsky and Lasnik’s various and vagrant filters under the umbrella of a single governing concept,” which is simply false. While Chomsky’s reformulations in “On Binding” (1980a) attempt to replace one further Chomsky and Lasnik filter with a principle based on Case (the so-called that-trace filter), they do not subsume all the others.

The stunning insight in Vergnaud’s analysis is the assumption, contrary to what is observable in phonetic form, that all lexical NPs in English have Case features even though only the personal pronouns actually manifest these features morphologically and phonetically. “On Binding” articulates this assumption, which is implicit in Vergnaud’s letter, as “Suppose we think of Case as an abstract marking associated with certain constructions, a property that rarely has phonetic effects in English but must be assigned to every lexical NP.”21

The reasoning that leads to Vergnaud’s fundamental insight is an example of the “Galilean style” of the natural sciences, which in physics involves “making abstract mathematical models of the universe to which at least physicists give a higher degree of reality than they accord the ordinary world of sensation.”22 In linguistics this translates as “a readiness to undertake perhaps far-reaching idealization and to construct abstract models that are accorded more significance than the ordinary world of sensation, and correspondingly, by a readiness to tolerate unexplained phenomena or even as yet unexplained counterevidence to theoretical constructions that have achieved a certain degree of explanatory depth in some limited domain.”23 One goal of this undertaking is “to explain complex visibles by means of the simple invisible,” which Chomsky characterizes as reduction and calls “the essential art of science”.24

In Vergnaud’s case, the postulation of abstract Case, a simple invisible for English and all other languages, provides an explanatory basis for the distribution of NP-to-VP constructions in English. “On Binding” generalizes Vergnaud’s Governed Case filter to all NPs and Lectures on Government and Binding utilizes this generalized Case filter to explain why displacement in passive constructions is obligatory—that is why it is not possible to have a passive construction in which the semantic object of the passive verb occurs in the syntactic object position (e.g. *it was refuted the argument, where it is interpreted as a nonreferential pleonastic element).25 Chomsky hypothesizes that passive verbs generally lack the ability to mark a syntactic object with Case. It follows that a syntactic object in a passive construction would be unmarked for abstract Case in violation of the Case filter. When the semantic object of a passive verb is moved to the subject position of a finite clause, it is marked for abstract Case in that position and thereby satisfies the Case filter. In this way abstract Case provides an explanatory basis for why Move NP must apply in the derivation of passive constructions. Chomsky 1981 shows how this analysis generalizes to other cases of Move NP.26

Berlinski and Uriagereka state that “Case and case have now become entrenched within modern linguistic theory, so much so that various displacement operations could not be stated without them.” However, the displacement operation Move NP, generalized to Move α in “On Binding,” has never been formulated with reference to overt morphological case or covert abstract Case.27 The whole point of postulating conditions on derivations and representations independently of the formulation of operations in the computational system is that this allows for a maximally simple formulation of these operations.

Berlinski and Uriagereka also claim that their analysis of passive constructions, which posits among other things “the empty category e” in the underlying subject position, “has remained virtually unchanged within the minimalist program”—the only reference to the minimalist program in their essay. However, the device of underlying empty categories has been dropped in minimalist syntax because it is not needed, as will be demonstrated below.

The minimalist program for linguistic theory since its inception in 1993 has been organized around two research questions: 1) to what extent is the computational system for human language “optimal” (in some sense that can be made explicit)? and 2) to what extent is human language, the initial state of the language faculty that is universal across the species and the I-languages that it allows humans to acquire, a “perfect” system that interacts with other components of the human mind/brain? Not much progress has been made with answering the second more opaque question, but a great deal has been made in answering the first.

The minimalist program has reduced what originally started out as a complicated and messy apparatus of highly articulated phrase structure rules plus highly articulated transformations to a single minimal recursive operation, Merge, which builds phrase structure bottom-up starting with a pair of lexical items.28 For example, the derivation of the argument was refuted would begin with the merger of the and argument into a syntactic unit the argument. The derivation would proceed with the merger of that syntactic unit and the passive participle refuted, forming a new syntactic unit refuted the argument in which the argument is interpreted as the object of the predicate refuted. At this point, the verb phrase refuted the argument merges with the finite passive auxiliary was, creating the syntactic phrase was refuted the argument. In each of these applications of Merge, the two syntactic units that undergo the operation are separate units before the operation applies.

In contrast, the final application of Merge in this derivation merges the syntactic unit the argument contained in the syntactic unit was refuted the argument with the larger unit, creating the syntactic object with the hierarchical structure (6).

  (6) [[the argument] [was [refuted [the argument]]]]

In (6) the single syntactic unit the argument occurs in two contexts: the object of the verb refuted where it is interpreted as an argument of the predicate but not pronounced, and the syntactic subject of the passive sentence, where it is pronounced but not interpreted as the semantic subject of the verb, the agent of the action.29 This application of Merge creates the syntactic subject position in the same way that merger of the philosopher with has refuted the argument creates the syntactic subject position in the philosopher has refuted the argument. The derivation of the passive construction the argument was refuted involves no special empty category in an underlying subject position, a syntactic element that plays no role in the interpretation, or pronunciation, or derivation of linguistic expressions.

The generalization of binary Merge to syntactic units where one is contained inside the other (referred to as “internal” Merge to distinguish it from the application where the two syntactic objects merged are separate) is the null hypothesis. As Chomsky has remarked, “it is hard to think of a simpler approach than allowing internal Merge (a grammatical transformation), an operation that is freely available” and that would require a special and unmotivated constraint to block its application.30 In this way, the creation of hierarchical phrase structure and displacement are unified under a single minimally simple operation.31
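The derivation of the argument was refuted can be sketched in a few lines. The sketch rests on an assumption not spelled out in these remarks, namely that Merge simply forms the unordered set of its two inputs; the helper names `contains` and `occurrences` are likewise invented for the illustration. Labels, linear order, and the choice of which occurrence is pronounced are all left out.

```python
def merge(a, b):
    """Combine two syntactic objects into a new unordered unit (assumed: a set)."""
    return frozenset({a, b})


def contains(whole, part):
    """True if `part` is `whole` itself or occurs somewhere inside it."""
    if whole == part:
        return True
    return isinstance(whole, frozenset) and any(contains(x, part) for x in whole)


def occurrences(whole, part):
    """Count the positions in `whole` at which `part` occurs."""
    count = 1 if whole == part else 0
    if isinstance(whole, frozenset):
        count += sum(occurrences(x, part) for x in whole)
    return count


# The first three applications combine objects that are separate beforehand.
dp = merge("the", "argument")
vp = merge("refuted", dp)
tp = merge("was", vp)
assert not contains("was", vp)

# The final application combines tp with an object it already contains.
assert contains(tp, dp)
clause = merge(dp, tp)

print(occurrences(clause, dp))
```

The same function `merge` is called in every step. The only difference in the last step is where its second input comes from, and the result contains the argument in two positions without any empty category having been posited.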

Merge also performs lexical insertion in derivations, thereby eliminating the need for a special transformational rule that inserts an item of the lexicon into the derivation of a linguistic expression.32 The formulation of Merge results in a massive simplification of the grammatical operations needed for the computational system of human language, one that suggests an explicit positive answer to the first minimalist question: the computational system of human language may be optimal in the sense that it involves a very few optimally simple grammatical operations.

The initial formulation of a minimalist program casts the abstract Case filter as an “interface condition,” which eleven years later is formulated as a more general condition: “the information in expressions generated by [a language –RF] L must be accessible to other systems, including the sensorimotor (S-M) and conceptual-intentional (C-I) systems that enter into thought and action,” which Chomsky designates as a necessary condition on language design “if language is to be usable at all.”33 Exactly how this applies to an abstract Case filter remains to be spelled out.34

Nonetheless, the concept of abstract Case remains central to current syntactic theory and analysis under the minimalist program, as illustrated in one recent and another relatively recent article, both of which utilize the concept in the analysis of languages that are quite different from English and other Indo-European languages. Based in part on an analysis of ergative-absolutive languages, Julie Legate concludes that “abstract Case and agreement relationships are established in the syntax and realized in the morphology, each language’s realization being as faithful as its morphological resources allow.”35 In an analysis of Neo-Aramaic, another non-Indo-European language, Laura Kalin proposes that “Case and agreement are thus two sides of one nominal licensing process, with Case licensing following from φ-agreement.”36

Jean-Roger Vergnaud’s discovery of abstract Case in 1977 remains a shining moment in the two-and-a-half-millennia history of the study of language, when applying the Galilean style in this domain allowed Vergnaud to grasp an essential piece of the hidden structure of human language.37

Robert Freidin

Juan Uriagereka replies:

A couple of reactions to Bob Freidin’s letter. Short one: sure, many of the precisions he raises are well-taken. Some are a bit more nitpicky than others, which I will comment on shortly, but the idea that we have cut corners in our piece is certainly correct. Too many? Too sharp? Distorting the enterprise? I will leave that for readers to decide, particularly after reading Bob’s own contribution. There is, however, one thing I do want to comment on since I believe it is misguided: taking the minimalist program as anything but a collection of ideas that, to date, has not crystalized into a theory. Chomsky himself has been rather careful in emphasizing this much: minimalism is a desideratum, which may well be wrong or even wrong-headed in the end. (I happen to think it is right on the mark, but it certainly does not refer to anything as narrow as Bob appears to have in mind, which relates to the latest version of the program that he has made himself familiar with; it is a broad set of programmatic ideas—one of which, in my view but certainly not the view of many “young guns,” is Case in roughly Vergnaud’s sense.) So I think it is putting the emphasis on the wrong place to insist on a version of the program that we happen to like, for whatever reason. Let me explain with three concrete examples, to address what I take to be the spirit of Bob’s letter.

Passive: it is certainly true that most current versions of minimalism since the proposal of Bare Phrase Structure need not postulate an empty subject e. This is for two reasons. First, because one can state the transformation of movement by literally creating a specifier by way of a “copy” of the moved item, which voids the need for a “landing site.” Second, because the role of subject that e created in models up to the 1990s is subsumed under a so-called Extended Projection Principle (EPP) feature (sometimes also called an “edge feature”). It is well to call this thing a “principle,” for it follows from absolutely nothing. Why do sentences need subjects? Because sentences need subjects (in our paper this is stated “top down”: S → NP VP, but readers should make no mistake about this: it is precisely the same statement). If one combines this “EPP feature” in the system with the ability to copy elements, then certainly e “waiting for the movement” is unnecessary notation. The question, though, should be: Is the alternative statement any deeper? I will leave that for readers to decide.

What I had to decide as a writer is simple: When David suggested going back to a clean Principles and Parameters statement of the passive rule (involving e), would that be a good idea for readers to follow the point of the argument? I decided that, indeed, it is much easier to understand than all these arcane allusions (without more context) to copies or EPP features. Was I wrong? Again, readers can tell. But I do not want to leave the impression that a notational variant is much more than that. The ultimate question is why sentences need subjects, and that, however we call it, is enough of a problem at this point for us to honestly refer to it as, well, an irreducible axiom. Either way, the role of Case in these matters has not changed—or it has only superficially. In the classical system it was “the need to get Case” that drove movement, in passives and raising (in other instances, the need to satisfy other demands, like Wh- or focus criteria). In the more modern incarnations, depending on who you ask you get different answers. Most people agree, however, that Case is probably playing a role, particularly in “freezing” conditions preventing successive cyclic movement. Does the beauty and significance of Vergnaud’s proposal get enhanced by this more detailed explanation? I am not sure.

Second, in the same vein, Bob appears to have bought the rhetoric that “all you need is Merge,” in Bob Berwick’s celebrated phrasing. Well, yes, if you squint a whole lot… I do not want to give away the gist of the monograph Howard Lasnik and I are finishing on this, but it should be pretty obvious that absolutely nothing in even the logic of merge will give you a inkling of why Trump thinks he is a genius requires no construal limitations between Trump and he, while in he thinks Trump is a genius it is patently obvious that Trump must not co-refer with he. No merge I know of is going to explain that simple syntactic condition (and this is just the well-known highlight). It is somewhat worse, though. Even for standard (phase-based) operations, the amount of squinting you need, to make sure that External Merge comes out as “identical” to Internal Merge, is so dramatic that one has to seriously ask to what extent the “unification” of those two is not merely rhetorical. Don’t get me wrong: Chomsky’s insight that movement is “internal merge” is, to my mind, one of those gedankenblitze. But to call these two things “literally the same” misses two rather serious points. To start with, there is no search mechanism for external merge. You get A, you get B, you put them together, and bingo: you get some combination. Nothing to write home about, external merge. Now, in internal merge, you get A, you look inside A’s guts, you then find B, and finally you merge B to A! Plainly: you need a search-and-match mechanism (this usually goes by the name of Agree) and then, in addition, you need to make sure that B “down there” gets to appear “at the top”. Neither of these gambits is trivial—or required for External merge. You can, of course, impose them on External Merge, for instance insisting that you (trivially) also search for something or other and some matching takes place, but all of that is either unattested or unfalsifiable. 
(You could also say that A searches B “within the entire lexicon”, but if that is done through the specific mechanism of Agree, I have no idea how one would even state the details in standard terms.) As for the merge part “up there,” the bottom line is this: the moved item, B in this abstract example, starts “down the derivation” (merged to some C) and ends (or at any rate continues, since the process could recur) “up the derivation.” This is a distributed occurrence of B, in two different sites. This is the second point that the alleged unification seems to be missing.
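The asymmetry described above can be made concrete in a few lines. What follows is only a toy sketch in Python, under loud assumptions: syntactic objects are modeled as nested unordered sets, the names `external_merge`, `internal_merge`, and `search` are invented for the illustration, and the `search` function is a crude stand-in for a search-and-match step, not a formalization of Agree or of copies.

```python
def external_merge(a, b):
    """Put two independent objects together: {a, b}. No search is involved."""
    return frozenset({a, b})


def search(tree, matches):
    """Look inside `tree` for a term satisfying `matches`.

    A hypothetical stand-in for the search-and-match step; returns the
    matching term, or None if nothing inside `tree` matches.
    """
    if not isinstance(tree, frozenset):
        return None  # a lexical item has no internal parts to inspect
    for part in tree:
        if matches(part):
            return part
        found = search(part, matches)
        if found is not None:
            return found
    return None


def internal_merge(a, matches):
    """Find B inside A, then merge B with A itself: {B, A}."""
    b = search(a, matches)
    if b is None:
        raise LookupError("no matching term inside A")
    return frozenset({b, a})


# B starts "down the derivation," merged to some C ...
lower = external_merge("B", "C")
a = external_merge("T", lower)
# ... and then also appears "up the derivation," merged to A.
moved = internal_merge(a, lambda term: term == "B")
```

In `moved`, the same item B sits both inside `lower` and at the top, which is the distributed occurrence at issue; nothing in the sketch says which occurrence is pronounced or interpreted.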

It is worth commenting on that in detail, because it has led to much confusion. The phenomenon of movement, as originally discovered and formulated by Chomsky (another gedankenblitz, perhaps the biggest of them all) has always been wonderfully puzzling because an item, the same item, appears in several configurations, in a sense, “at the same time.” Linguists invented the mechanism of “copy-plus-deletion” to emphasize how this works, in what nuanced ways—which Freidin himself was one of the great proponents of, in much celebrated detail. If one googles “reconstruction” in syntax, one will see what a rich phenomenon this is indeed. Now, the problem: we have no clue how to formalize any of this. There is, to my knowledge, precisely one formulation of these ideas in full detail: Collins and Stabler 2016, a remarkable effort with consequences still being evaluated technically in the best departments in the world. Here is where they stop short, by their own admission: they will not formulate copy occurrences, precisely the animals we are discussing now. It is not that one cannot stipulate the work of such creatures: it is that it does not make sense in any classical version of the formalism!

Let us please understand: we are talking about ontologies involving (at least) pairs of objects B and B’ such that: (i) B serves a function at configuration X (say, satisfying a thematic role) and B’ serves a different function at a different configuration Y (say, being the sentential subject); (ii) in semantic terms, it can be shown that B-B’ “does something” in a distributed fashion, for example, expressing thematic dependencies in X and expressing theme-rheme dependencies in Y (in other instances the dependency is of the “operator-variable” sort, and then the operator restriction can appear in any of the configurational sites the composite object “goes through”, but only in one); (iii) in phonetic terms, it is obvious in most instances that B-B’ is pronounced “up” (in the Y vicinity) or “down” (in the X vicinity)—but never in both sites (except in some instances involving “heads,” in which it can be shown that the moved element is pronounced distributively…). It is no wonder that it makes no sense to state this formally (readers are welcome to try their hand at this!). In my work with Roger Martin, we have argued that this behavior is actually non-classical, in ways that seem obvious: in particle spin, say, one knows how to express “up” and “down” states in terms of the Pauli linear operators—which, we suspect, is what is going on. (Paul Smolensky has articulated a formalism that is consistent with this view, although Martin and I have shown that you don’t need Optimality Theory, or Harmonic Grammar, to express such non-classical statements; Chomsky’s grammar works too, when “feature matrices” are taken to be, well, linear operators.) Bottom line: that (often called a “chain”) is what you get from Internal Merge, not External Merge. Are the processes literally the same? Does it help the reader to hear about all of this? I remain unconvinced—though it may be just me.

Third and final reaction to Bob’s letter: the “ideal speaker,” which he objects to by alluding to the notion “I-language.” It is instructive to ask what the “I” in “I-language” refers to. Chomsky 2015:3 repeats a formula he has insisted on for decades: “… ‘I’ stand[s] for internal, individual and intensional.” Later, on p. 4, he says, when speaking of an I-language as a generative procedure or a mental organ, that he takes the mind “to be the brain viewed at a certain level of abstraction.” If that is not an idealization (in the customary sense in science), I do not know what is. Nor do I see what harm it does to admit that we are dealing with an idealization, for the same reason every scientific postulate is. That does not mean that languages (organism, organ, cell, molecule, atom, electron…) are not Galilean idealizations of reality, so as to come up with a theory of how it all works. If Freidin is worried about this being confused with a notion of language where the “I” doesn’t stand for anything but “idealization,” he should not be. First, because recognizing that expansion from the idealization (to the customary internal as opposed to external, individual as opposed to societal, and intensional as opposed to extensional) is as expected within the kind of cognitive psychology I represent as it is not straightforward. I certainly presuppose I-language in all of my work, but I would be hard pressed to say what that means in terms of a biological theory. As an idealization, though, it makes perfect sense to me, and its fruits are there for anyone who cares to look. But to take things more seriously than that, to actually expect the mind, let alone the brain, to literally behave like a computer, is at best a hypothesis about how these idealizations could come to be—and at worst misinformed.

Last thought: What impressed me the most about our piece is that the most daring prediction it made came out sadly true: “Please welcome President Trump.” I am of course terrified about that prospect, but proud to have seen it coming.

David Berlinski replies:

A language, Robert Freidin writes,

constitutes a system of knowledge which is internal in the brain/mind of the speaker, what Chomsky in Knowledge of Language designates as an I-language, where I stands for internal and individual. This rejects the notion of “an ideal speaker-listener” characterized in the quote from Chomsky 1965 that Berlinski & Uriagereka cite, a notion that presumably identifies no real speaker-listener who exists in the world and therefore conflicts with the concept of I-language, which identifies a phenomenon that exists in the world.

The last sentence of this remark could not be true as written. If an I-language must be tied to an individual, and thus to “a phenomenon that exists in the world,” the last speaker of Sumerian, having long since vanished, must have taken the Sumerian language down to the Dark Place with him. This is obviously not so. He is gone; his language remains. We can reconstruct Sumerian without him. Aside from grumbling about our pronunciation of his language, what could he tell us?

Generative linguists, Freidin asserts, are studying a faculty of the brain or the mind.

Are they?

The closest that linguists come to the brain, I must observe, is when they suffer from migraines or otherwise bop their heads. Neuropsychologists have undertaken any number of experiments of the sort in which, when a dozen or so undergraduates are stuck with pins and thereupon say ouch, a certain region of their brains lights up. These experiments do very little to show how a system of knowledge could be embodied or embedded in the brain, or any other organ of the body. A computational system is no more in the brain than a memory is in a photograph.

If the system of knowledge is not internal to the brain, it is hard again to see why it should be internal to the mind, since the mind does not have any kind of obvious topology and so no definition of an interior point.

No linguist is particularly interested in tying the I-languages to any particular speaker of a given language—Fred McKlotsky, say. Idealization and abstraction are always required. This is consistent with the description that Chomsky offered in 1965 of an ideal speaker, someone free of the hesitations and infelicities of ordinary speech. Neither that ideal speaker nor the lexicon and computational system comprising an I-language are in the world. The world is what it is, as V.S. Naipaul observed, and what is in it or out of it depends entirely on what theories are at issue or up for grabs. There is no privileged hierarchy in ontology. Quarks are no more fundamental than contracts. Neither is made of the other.

The distinction between what is really real (quarks, taxes, Twitter) and what is sentimentally real (books, leptons, p-adic numbers) is based on the illusion that the sentimental concepts are somehow mind-dependent. In a universe without human minds—Harvard, say—there would be quarks or quacks, but not books. Although this seems right enough to me on sentimental grounds, the distinction that it embodies is an illusion. Everything is mind dependent, because everything must be interpreted. Interpretation is always regressive. This is the deepest lesson imparted by twentieth-century logic, and true of even the natural numbers. A computer can carry out the addition of two natural numbers, but wishing to know what addition means, I must grasp the recursive definition. There is first 0 + a = a. That goes without saying. And all at once, there is (1 + n) + a = 1 + (n + a).38 In grasping the recursive definition, I have moved to higher ground. There is not the slightest difficulty, Alonzo Church once memorably remarked, in formalizing the higher ground; but that is a matter of how it is described and not whether it is there.
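The two clauses of the recursive definition can be written out as a short program. This is a minimal sketch in Python, assuming that non-negative integers stand in for the natural numbers and that `1 + ...` on the last line plays the role of the successor step; the function name `add` is illustrative only.

```python
def add(m: int, a: int) -> int:
    """Add two natural numbers using only the two clauses of the definition."""
    if m < 0 or a < 0:
        raise ValueError("natural numbers only")
    if m == 0:
        # Base clause: 0 + a = a
        return a
    # Recursive clause: (1 + n) + a = 1 + (n + a), where m = 1 + n
    n = m - 1
    return 1 + add(n, a)
```

A machine running `add(3, 4)` unwinds the second clause three times and then applies the first; that it does so is a separate matter from what the definition means.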

The human stain is always there. We can no more imagine a world without human beings than any individual can imagine a world after his own death. The appeal to what Kant called the Ding-an-sich is no more creditable in the philosophy of science than in metaphysics.

There is no such Ding.

The distinction between I and E languages to which Freidin pledges his allegiance is a distinction without a difference. To insist on the primacy of I-languages over E-languages, perhaps on the grounds that an E-language is nothing more than a dialect with an army and a navy, is rather like insisting that it is the integers that are really real, while the ring of integers is purely an epiphenomenon. To dismiss the E-languages in this way is to strike an enormous amount of painfully acquired human knowledge from the record. If Latin grammarians were wrong to think Latin a dialect of Greek, they were wrong about something.

We wrote that “no one would think to say that the Extended Standard Theory was wonderfully elegant.” Pas du tout, Freidin remarks: “The EST was clearly a more elegant theory—wonderfully so in its context—than its predecessors, though not as elegant as what has evolved under the minimalist program.”

If I write that Albany is just miserable in winter, it is no defense of Albany to remark that Rochester is worse.

The minimalist program, which Freidin commends, is rather like a trophy wife. It is good to look at, purrs like a kitten when stroked, and is very expensive to maintain. Linguistic wives have a short shelf-life. Watch out Kitten. I am mentioning this for your own good.

Under the minimalist program, the human faculty of language has two parts: an invariant computational system, and a variable lexicon, with both substantive and functional entries. There are as many lexicons as there are languages. Since the computational system is everywhere the same, learning a new language seems to be a matter of acquiring a great many new words.

Who would have ever imagined it?

In the minimalist program, language proceeds by binary branching and from the bottom up. The operation of merge associates lexical items directly by an unmediated set-theoretic operation. One minute there is x and the next y, and then there is {x, y}. The set {x, y} can acquire a label in x or y, whereupon there is {{x}, {x, y}}. Two lexical items have found themselves companionable objects of the same set.
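The set-theoretic operation just described is simple enough to write down. The following is a rough sketch in Python, assuming lexical items can be represented as plain strings and sets as `frozenset` values; the function names `merge` and `label` are hypothetical, and the labelled form follows the notation of the paragraph above rather than any particular formalization in the literature.

```python
def merge(x, y):
    """Unordered set formation: the two items become members of one set."""
    return frozenset({x, y})


def label(x, y, head):
    """Build the labelled object in the form the text gives, with the
    label chosen from x or y."""
    if head not in (x, y):
        raise ValueError("the label must be one of the merged items")
    return frozenset({frozenset({head}), merge(x, y)})


phrase = merge("Rasputin", "snores")
labelled = label("Rasputin", "snores", head="snores")
```

Because the sets are unordered, `merge("Rasputin", "snores")` and `merge("snores", "Rasputin")` are the same object; nothing in the operation itself says which item comes first or how either was found in the lexicon.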

There is something earthy about mergers in syntax. Wishing to say that Rasputin snores, I need not trouble myself with any fancy phrase structures. I can go right to the lexicon and get what I need. The items go into what is often called the workspace, and thereafter computations proceed. This is not how the EST saw things, but in the Aspects, Chomsky remarked, reasonably enough, that he, for one,

saw no plausibility at all to the assumption that the speaker must uniformly select sentence type, then determine subcategories, etc. finally, at the last stage, deciding what he is going to talk about; or that the hearer should invariably make all higher-level decisions before doing any lower-level analysis.39

Whatever the attractions of merge as an operation, they are offset to a degree by its liabilities. A normal native English speaker commands a lexicon of some sixty thousand items; recent college graduates, perhaps half that number. In thinking to say that Rasputin snores, and wondering briefly to myself who it is that is snoring, I must confine myself—no?—to those parts of the lexicon that are restricted to snoring Rasputins. There it is, under the R’s.

At last, I can begin.

But only just. No partition of the lexicon will allow me to find what I am looking for unless in some sense, I know what I am looking for. Not any noun or name will do. I have no interest at all in remarking of Hartwig that he was a great snorer, even though his snores kept the entire Russian embassy in Belgrade awake night after night, a point well-known to specialists. This suggests that in order to access a lexicon, I must have a lexicon in mind.40

“The problem of choice of action is real,” Chomsky observes, “and largely mysterious, but does not arise within the narrow study of mechanisms.”41 This is a little like saying that the problem of fitting eleven men in a ten-man lifeboat does not arise within the narrow confines of a ten-man lifeboat.

True enough. It doesn’t. But in the case of minimalism, the odd man out is right there, eager to clamber aboard.

Very similar questions might be raised about the imperative in the minimalist program to compare derivations globally.

How is that done? It is a perfectly obvious problem, and Chomsky deals with it by imagining one computation looking over all of the others from a corner of its eye.42

Robert Freidin is professor emeritus in the Council of the Humanities in the Philosophy Department at Princeton University.

Juan Uriagereka is a professor in the Department of Linguistics at the University of Maryland.

David Berlinski is an American writer.

  1. David Berlinski and Juan Uriagereka, “The Recovery of Case,” Inference: International Review of Science 2, no. 3 (2016). The published version of “Filters and Control” appears in Linguistic Inquiry 8 (1977): 425–504. Jean-Roger Vergnaud, “Letter to Noam Chomsky and Howard Lasnik on ‘Filters and Control’ (April 17, 1977),” in Foundational Issues in Linguistic Theory: Essays in honor of Jean-Roger Vergnaud, eds. Robert Freidin, Carlos Otero and Maria Luisa Zubizarreta (Cambridge, MA: M.I.T. Press, 2008), 3–15, and also in Syntax: Critical concepts in linguistics, eds. Robert Freidin and Howard Lasnik, vol. 5 (London: Routledge, 2006), 21–34.
  2. Noam Chomsky, “A Minimalist Program for Linguistic Theory,” in The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, eds. Kenneth Hale and Samuel Jay Keyser (Cambridge, MA: MIT Press, 1993), 1–52. 
  3. Noam Chomsky, What Kind of Creatures Are We? (New York: Columbia University Press, 2016), 2. 
  4. A word of caution is in order here. The term lexicon masks some intricate and sometimes murky issues: see Carlos Otero, “Neurology and Experience,” in Language, Syntax, and the Natural Sciences, eds. Ángel Gallego and Roger Martin (Cambridge: Cambridge University Press, to appear) for some illuminating commentary. 
  5. This definition supersedes any definition of a language based on a set of externalized sentences, unbounded or not—a definition that is problematic in many ways. 
  6. Noam Chomsky, Knowledge of Language: Its Nature, Origin, and Use (New York: Praeger, 1986), chapter 2. 
  7. Noam Chomsky, Aspects of the Theory of Syntax, (Cambridge, MA: M.I.T. Press, 1965), 3. 
  8. For further discussion of the concept of I-language versus the ‘common sense notion’ of a language, see Noam Chomsky, Knowledge of Language: Its Nature, Origin, and Use (New York: Praeger, 1986), chapter 2. 
  9. Berlinski and Uriagereka claim that “a descriptive grammar adequate to the demands of a particular language comprises a hideously complicated system of rules.” They cite as an example, inexplicably, Randolph Quirk et al., A Comprehensive Grammar of the English Language (London: Longman, 1985), which far from being a generative grammar of English, provides no explicit rules at all (i.e., no phrase structure rules and no transformations). Furthermore, it remains to be demonstrated that such a grammar would require a hideously complicated system of rules. If the EST and its simplification under the minimalist program (see below) is on the right track, then the hideous complications of a language may well be limited to the idiosyncratic properties of its lexicon. 
  10. See Noam Chomsky, Syntactic Structures (The Hague: Mouton, 1957); Noam Chomsky, Aspects of the Theory of Syntax, (Cambridge, MA: M.I.T. Press, 1965). 
  11. See Noam Chomsky and Howard Lasnik, “Filters and Control,” Linguistic Inquiry 8 (1977): 432; Noam Chomsky, “Conditions on Rules of Grammar,” Linguistic Analysis 2 (1976): 173. 
  12. Noam Chomsky, “Conditions on Transformations,” in A Festschrift for Morris Halle, eds. Stephen Anderson and Paul Kiparsky (New York: Holt, Rinehart and Winston, 1973), 232–86. 
  13. For a second time, the first being Perlmutter’s 1968 M.I.T. Ph.D. dissertation Deep and Surface Constraints in Syntax, published as Deep and Surface Structure Constraints in Syntax (New York: Holt, Rinehart and Winston, 1971), which Chomsky and Lasnik credit with “a much more far-reaching investigation and analysis of filters” (“Filters and Control,” Linguistic Inquiry 8 (1977): 425). 
  14. Noam Chomsky and Howard Lasnik, “Filters and Control,” Linguistic Inquiry 8 (1977): 459. 
  15. Berlinski and Uriagereka talk about the filter “blocking COMP deletion” as some sort of process that occurs in the derivation of these sentences. Specifically they propose that COMP deletion produces (5c) and then the Chomsky and Lasnik filter “restores it to the common decencies of a grammatical sentence by blocking COMP deletion.” This is somewhat incoherent and furthermore misleading. While we could say that the filter (93) has the effect of blocking COMP deletion in the case of examples like (5c), it cannot prevent free COMP deletion from applying, or “restore” a for that has been deleted in a derivation, which would require a new grammatical operation.
  16. Jean-Roger Vergnaud, “Letter to Noam Chomsky and Howard Lasnik on ‘Filters and Control’ (April 17, 1977),” in Foundational Issues in Linguistic Theory: Essays in honor of Jean-Roger Vergnaud, eds. Robert Freidin, Carlos Otero and Maria Luisa Zubizarreta (Cambridge, MA: M.I.T. Press, 2008), 3.
  17. Jean-Roger Vergnaud, “Letter to Noam Chomsky and Howard Lasnik on ‘Filters and Control’ (April 17, 1977),” in Foundational Issues in Linguistic Theory: Essays in honor of Jean-Roger Vergnaud, eds. Robert Freidin, Carlos Otero and Maria Luisa Zubizarreta (Cambridge, MA: M.I.T. Press, 2008), 4. Note that [–N] is a feature notation that covers both verbs and prepositions. Vergnaud proposes two alternative interpretations of his filter (2), one where only NPs with phonetic content are marked for Case, and the other where Case-marking is extended to traces that result from wh-movement. For an argument that supports the second, see Howard Lasnik and Robert Freidin, “Core Grammar, Case Theory, and Markedness,” in Theory of Markedness in Generative Grammar, eds. Adriana Belletti, Luciana Brandi, and Luigi Rizzi (Pisa: Scuola Normale Superiore, 1981): 407–21.
  18. See Noam Chomsky, “On Binding,” Linguistic Inquiry 11 (1980): 1–46. “On Binding” formulates the filter as (70), where it applies to “lexical NPs”:

    (70) *N, where N has no Case.
    On page 49 of Chomsky’s Lectures on Government and Binding, the filter is reformulated as (6) and named the Case Filter:

    (6) *NP if NP has phonetic content and has no Case
    claiming that “only the empty categories trace and PRO may escape the Case Filter, appearing with no Case”—but see footnote 17.

    See also Noam Chomsky, Lectures on Government and Binding (Dordrecht: Foris, 1981).

    For discussion of this framework, see Noam Chomsky and Howard Lasnik, “The Theory of Principles and Parameters,” in Syntax: An International Handbook of Contemporary Research, eds. Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld, and Theo Vennemann (Berlin: Walter de Gruyter, 1993): 506–69; Robert Freidin, “Generative Grammar: Principles and Parameters Framework,” in The Encyclopedia of Language and Linguistics, ed. R. E. Asher, vol. 3 (New York: Pergamon Press, 1994), 1,370–85; Robert Freidin, “Generative Grammar: Principles and Parameters,” in The Encyclopedia of Language and Linguistics, 2nd edition, ed. Keith Brown (Boston: Elsevier Science Ltd., 2006): 119–137; Robert Freidin, “Conceptual Shifts in the Science of Grammar: 1951–1992,” in Noam Chomsky: Critical Assessments, ed. Carlos Otero, vol. 1 (New York: Routledge, 1994) 653–90; Robert Freidin, “A Brief History of Generative Grammar,” in Handbook of the Philosophy of Language, eds. Gillian Russell and Delia Graff Fara (New York: Routledge, 2012): 894–915; Robert Freidin, “Noam Chomsky’s Contribution to Linguistics,” in Oxford Handbook of the History of Linguistics, ed. Keith Allan (New York: Oxford University Press, 2013), 439–67.
  19. Alain Rouveret and Jean-Roger Vergnaud, “Specifying Reference to the Subject: French Causatives and Conditions on Representations,” Linguistic Inquiry 11 (1980): 97–202. 
  20. See Robert Freidin and Jean-Roger Vergnaud, “Exquisite Connections: Some Remarks on the Evolution of Linguistic Theory,” Lingua 111 (2001): 639–66 for discussion of the evolution of linguistic theory from GB to the Minimalist Program and how the generative enterprise attempts “to assimilate the study of language to the main body of the natural sciences” (Noam Chomsky, Generative Grammar: Its Basis, Development and Prospects (a special issue of Studies in English Linguistics and Literature, Kyoto University of Foreign Studies, 1987), 1).
  21. Noam Chomsky, “On Binding,” Linguistic Inquiry 11 (1980): 24. 
  22. Noam Chomsky, Rules and Representations (New York: Columbia University Press, 1980), 2—which is quoting Steven Weinberg, “The Forces of Nature,” Bulletin of the American Academy of Arts and Sciences 29, no. 4 (January 1976): 28.
  23. Noam Chomsky, Rules and Representations (New York: Columbia University Press, 1980), 9–10. 
  24. From the preface to Les Atomes by Jean Baptiste Perrin, 1926 Nobel laureate in Physics (Paris: Félix Alcan, 1913), where the phrase in the original is highlighted here in boldface:
    Deviner ainsi l’existence ou les propriétés d’objets qui sont encore au delà de notre connaissance, expliquer du visible compliqué par de l’invisible simple, voilà la forme d’intelligence intuitive à laquelle, grâce à des hommes tels que Dalton ou Boltzmann, nous devons l’Atomistique, dont ce livre donne un exposé (p. v).
    Chomsky’s characterization can be found in Noam Chomsky, “Problems of Projection,” Lingua 130 (2013): 35. 
  25. Noam Chomsky, “On Binding,” Linguistic Inquiry 11, no. 3 (1980): 25; Noam Chomsky, Lectures on Government and Binding (Dordrecht: Foris, 1981), 124. 
  26. Specifically, within NPs as in the city’s destruction (= the destruction of the city) and between clauses as in John seems [t to be a nice fellow]. 
  27. Noam Chomsky, “On Binding,” Linguistic Inquiry 11, no. 3 (1980). 
  28. Merge is proposed as the basic structure building operation in Noam Chomsky, “Bare Phrase Structure” in Evolution and Revolution in Linguistic Theory, eds. Héctor Campos and Paula Kempchinsky (Washington, DC: Georgetown University, 1995), 51–109, a decade and a half after phrase structure rules had been abandoned. Chomsky in a class lecture in September 1979 had observed that such rules stipulated properties that followed independently from general principles (including the Case Filter), see also Timothy A. Stowell, Origins of Phrase Structure, M.I.T. Ph.D. dissertation, 1981. 
  29. For discussion of why the argument in (6) is not pronounced in both contexts, see Robert Freidin “Chomsky’s Linguistics: the Goals of the Generative Enterprise,” Language 92, no. 3 (2016): §4.2 and the references cited. 
  30. Noam Chomsky, “Beyond Explanatory Adequacy,” in Structures and Beyond, ed. Adriana Belletti (Oxford: Oxford University Press), 110; Noam Chomsky, “Problems of Projection,” Lingua 130 (2013): 40.
  31. Whether Merge is binary for coordinate structures (e.g. the examples analyzed as (1-3) above) may be questionable. If it is, then synonymous pairs of phrases would not share the same hierarchical structure and a more complicated analysis would be required to account for their synonymy. 
  32. In Aspects of the Theory of Syntax (p. 122), a simple substitution operation, a transformation, replaces a class of complicated lexical insertion rules in phrase structure grammar. With Merge, and in particular internal Merge, substitution as an elementary transformational operation is eliminated from the theory—another welcome simplification.
  33. Noam Chomsky, “A Minimalist Program for Linguistic Theory,” in The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, eds. Kenneth Hale and Samuel Jay Keyser (Cambridge, MA: MIT Press, 1993), 1–52; Noam Chomsky, “Beyond Explanatory Adequacy,” in Structures and Beyond, ed. Adriana Belletti (Oxford: Oxford University Press), 106.
  34. See Robert Freidin, “Chomsky’s Linguistics: the Goals of the Generative Enterprise,” Language 92, no. 3 (2016): 671–723, especially footnote 11, for discussion. 
  35. Julie Legate, “Abstract and Morphological Case,” Linguistic Inquiry 39 (2008): 95. 
  36. Laura Kalin, “Licensing and Differential Object Marking: The View from Neo-Aramaic,” (ms. 2016), 24 (to appear in Syntax), where φ stands for the features involving number, gender, and person. 
  37. I am indebted to Samuel Jay Keyser for his comments on an earlier draft of these comments. 
  38. I discuss all this in One, Two, Three (New York: Pantheon, 2011). 
  39. Noam Chomsky, Aspects of the Theory of Syntax (Cambridge, MA: MIT Press, 1965), 197. 
  40. See Neil Smith and Annabel Cormack, “Features from Aspects via the Minimalist Program to Combinatory Categorial Grammar,” in Ángel Gallego and Dennis Ott, eds., Fifty Years Later: Reflections on Chomsky’s Aspects (Cambridge, MA: MIT Working Papers in Linguistics, 2015), 233–248.
  41. Noam Chomsky, The Minimalist Program (Cambridge, MA: MIT Press, 1995), 227. 
  42. Noam Chomsky, The Minimalist Program (Cambridge, MA: MIT Press, 1995), 265.