In response to “The Galilean Challenge” (Vol. 3, No. 1).

To the editors:

The Basic Property takes language to be a computational system, which we therefore expect to respect general conditions on computational efficiency. A computational system consists of a set of atomic elements, and rules to construct more complex structures from them … Universal properties of the language faculty came to light as soon as serious efforts were undertaken to construct generative grammars. These include properties that had never before been noticed, and that remain quite puzzling. One such property is structure dependence. The rules that yield the language of thought appeal only to structural properties, ignoring properties of the externalized signal, even such simple properties as linear order.
Noam Chomsky, “The Galilean Challenge.”

In communicating with each other through language, we constantly reason, and draw valid inferences. If I tell you that I am looking for my glasses which I think I left either in my office or in the class where I just taught, and you tell me, “They weren’t in class; I was just there,” I will go looking for them in my office. I am following here a logical schema. I know that p or q. You tell me not p. Therefore, I conclude q. We draw inferences of this sort all the time, typically without knowing how. Here is a slightly more sophisticated example. It is a little test, which comprises two cases.

Case 1.

Suppose you overhear the following conversation between two students:

  1. Student A: How was the exam?
    Student B: You know, even John didn’t make it.

Now ask yourself: Is John “smart” or “dumb”?

You will rapidly conclude “John has got to be smart.” Without this conclusion, the previous dialogue does not make much sense. But why? How do you reach this conclusion so quickly?

Now change things as follows.

Case 2.

  1. Student A: How was the exam?
    Student B: You know, even John made it.

Ask yourself the same question again: Is John “smart” or “dumb”? Now you will conclude that John has got to be dumb. The only difference between Case 1 and Case 2 is the presence versus absence of negation (didn’t make it vs. made it). The adverb even and its interaction with negation lead you to reach opposite conclusions in Case 1 vs. Case 2.

Let us try to reconstruct your subconscious reasoning. First, what does even do in a sentence? If one says, for example, “I understood even Chomsky’s article,” the fact of understanding Chomsky’s article is being presented as less likely, more noteworthy and surprising than understanding somebody else’s article. Even p, in other words, indicates that p is unlikely in the mind of the speaker. Sayings like “even a blind squirrel sometimes finds an acorn” are ways of conveying that even very unlikely events sometimes do happen. So in Case 1, student B is presenting John not passing the exam as unlikely. If John were not smart, this would not be news (for it is quite likely that someone who is not so smart does not pass an exam). In Case 2, the absence of negation makes us flip the conclusion. John’s passing the test is presented as surprising/unlikely. Normally, smart people do quite well at exams. If John were smart, the fact that he made it would not be surprising. Therefore, John must not be so smart.

It is clear that if we spell out the reasoning we go through, we wind up with something quite complicated, which involves probabilities, negation, plus generalizations like “smart people usually do well at exams,” and so on.
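The reasoning just spelled out can be sketched in a few lines of code. This is a toy model under stated assumptions, not a claim about how speakers actually compute: the pass probabilities are invented for illustration, and even is reduced to a single requirement, namely that the reported event be the less likely one given what we assume about John.

```python
# Hypothetical numbers: how likely each kind of student is to pass the exam.
P_PASS = {"smart": 0.9, "not so smart": 0.3}


def event_probability(trait: str, negated: bool) -> float:
    """Probability of 'John made it', or of 'John didn't make it' if negated."""
    p = P_PASS[trait]
    return 1.0 - p if negated else p


def infer_trait(negated: bool) -> str:
    """Pick the trait under which the reported event is least likely,
    since 'even' presents that event as unlikely."""
    return min(P_PASS, key=lambda trait: event_probability(trait, negated))


print(infer_trait(negated=True))   # Case 1, "even John didn't make it": smart
print(infer_trait(negated=False))  # Case 2, "even John made it": not so smart
```

Flipping the single negated flag flips the conclusion, which is the polarity reversal observed in the two cases.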

The point of this example is, of course, to get at how language works and Chomsky’s stance on it. First, it seems that communication relies on inference or on some spontaneous capacity to draw valid consequences—i.e., what is known as logic. When we say something or we are told something, we compute a host of implications (entailments, presuppositions, implicatures...). Second, these computations can be fairly complex and are largely automatic and subconscious. Drawing consequences is not tied to the cultural level of speakers; it is just what understanding or interpreting speech amounts to. Figuring out how this is possible is a very hard question, fundamental to understanding human nature, and a facet of what Chomsky calls the “Galilean Challenge.”

Chomsky’s proposal is that the language faculty rests on quite simple operations like, in particular, a set-forming operation “merge X with Y”: MERGE(X,Y) = {X,Y}, which can apply recursively on a primitive vocabulary. Chomsky, moreover, goes on to say that the structures created by the language faculty are made available to the sensory motor system that externalizes them in audible sequences of sounds and to the conceptual-intentional system that uses them to communicate and draw inferences. But in spite of the fact that language is so effective in communicating and reasoning, Chomsky thinks that language is not motivated by communication. “In every known case in which computational and communicative efficiency conflict, communication efficiency is ignored. The facts run counter the common belief, often virtual dogma, that communication is the basic function of language.”1 The example we started from is one of many ways of showing how pervasive and complex our spontaneous use of logic is. But if so, how can it all stem from a simple structure-building operation like merge? How can the language faculty create such a powerful capacity for referring and reasoning almost out of thin air? Is Chomsky’s position tenable? The following considerations try to show that, against appearances, insofar as inference is concerned, Chomsky may be right.
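To make the set-forming operation concrete, here is a minimal sketch of MERGE(X,Y) = {X,Y} applied recursively. The vocabulary items and the depth helper are my own illustrative assumptions; the only thing taken from the text is that merge forms an unordered set and can run on its own outputs.

```python
def merge(x, y):
    """MERGE(X, Y) = {X, Y}: form the unordered set of the two inputs."""
    return frozenset({x, y})


def depth(obj) -> int:
    """Depth of embedding: 0 for a vocabulary item, 1 + the deepest member for a set."""
    if isinstance(obj, frozenset):
        return 1 + max(depth(member) for member in obj)
    return 0


# Hypothetical toy vocabulary, merged step by step.
vp = merge("read", merge("the", "book"))  # {read, {the, book}}
clause = merge("John", vp)                # {John, {read, {the, book}}}

# Sets carry no linear order: only hierarchical structure is encoded.
print(merge("the", "book") == merge("book", "the"))  # True
print(depth(clause))                                  # 3
```

Because the output of merge is itself a legitimate input, nothing beyond this one operation is needed to build structures of arbitrary depth.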

Logic stems from elementary expressions like not, if, and, or, which the British mathematician George Boole, the father of modern (propositional) logic, claimed are at the basis of the fundamental “laws of thought.”2 These are operations on the truth-value of sentences. If a sentence p is true, then not p is false (and vice-versa); p or q is true if at least one of p, q is true, and so on. These operations are simple enough. Perhaps they exist independently of language. Perhaps there are precursors of logical operations one can individuate, for example, in the cognitive systems of non-human animals. But the simple point I want to underscore is that without a powerful enough recursive system like the one based on merge, such operations could not go far.
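The truth-functional character of these operations can be checked mechanically. The sketch below is only an illustration (the function name is_valid is my own): it enumerates every assignment of truth values to p and q and tests whether the conclusion holds whenever all premises do. It confirms the schema used in the glasses example, and shows that a superficially similar schema fails.

```python
from itertools import product


def is_valid(premises, conclusion) -> bool:
    """An argument is valid if the conclusion is true in every valuation
    of p and q that makes all the premises true."""
    return all(
        conclusion(p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises)
    )


# Disjunctive syllogism: p or q; not p; therefore q.
print(is_valid([lambda p, q: p or q, lambda p, q: not p],
               lambda p, q: q))       # True

# Affirming a disjunct: p or q; p; therefore not q.
print(is_valid([lambda p, q: p or q, lambda p, q: p],
               lambda p, q: not q))   # False
```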

Let me expand a bit, by adding some color. The controversy over to what extent animals reason like us is longstanding.3 The Stoic philosopher Chrysippus (c. 279–206 BCE) is reported to have been one of the first authoritative proponents of the view that animals do reason. His writings were lost, except for a few fragments, but the physician and philosopher Sextus Empiricus (second century CE) reports that:

[Chrysippus] declares that the dog makes use of the fifth complex indemonstrable syllogism when, on arriving at a spot where three ways meet, after smelling at the two roads by which the quarry did not pass, he rushes off at once by the third without stopping to smell. For, says the old writer, the dog implicitly reasons thus: “The animal went either by this road, or by that, or by the other: but it did not go by this or that, therefore he went the other way.”4

So Chrysippus imputes to dogs the same capacity to work through a “disjunctive syllogism” (p or q; but not q; therefore p) like the one I used above to illustrate the spontaneous logicality of language. And, in fact, we know our pets can easily be taught things that are negation-like, in the sense of sharing certain semantic properties of ordinary negation. Your dog is heading for the couch. You say “don’t.” It stops. Many aspects of animal behavior seem to require something like disjunction (= choice), conjunction (= addition/incrementality), conditionals (“if you perceive this smell, go there”), and, well, negation, as we just saw. Possible precursors to Boole’s laws of thought?

However that may turn out, the logic that humans develop is more powerful by several orders of magnitude. What your dog will never learn (so, do not try to teach it) is double or triple negation, viz. the fact that not not p typically conveys the same information as p. You will counter that multiple negatives of this sort are inventions of logicians, unduly popularized by some old-fashioned schoolteacher. But imagine how multiple negations look in a naturalistic setting:

A: I am so upset.
B: Why?
A: Nobody will come to my party!
B: I seriously doubt that.
A: Well, I don’t.

You surely have not had any difficulty in grasping the sense of this dialogue (of modest brilliance). Yet the last utterance of speaker A contains a triple negation:

A: Nobody will come. [1st negation]
B: I doubt that (where that = nobody will come). [2nd negation]
A: I don’t <doubt that>. [3rd negation]

This last instance involves verb phrase ellipsis, a widespread and well-studied linguistic construction (I use angle brackets to mark the elided verb phrase).5 Surely our understanding here of what literally amounts to not not not p (= not p) is immediate. Nor are we aware of the fact that we are actually computing a triple negation. These uses of multiple negations are not of the kind that old-fashioned schoolteachers teach. Children, in converging towards their adult grammars, become proficient in dialogues like the one above, without formal training.

Back to merge and its relation to logic. We have just made the point that logical functions may be independent of language, but something changes qualitatively when these operations are fed to a powerful enough recursive system. The reason for this is that when merge kicks in, one immediately becomes able to compute not just {not p}, but also {not {not p}}, {not {p or q}}, ... and so on, ad libitum. In fact, from this perspective, we should become able to make these computations “overnight,” as it were, for it is in the very nature of merge (or any operation with similar power) to run on its own outputs, creating structures of indefinite complexity, with determined structural relations to each other. All that is required is an ability to label a structure p as true or false. If, for example, we regard p as false (i.e., we label it as F), then {not F} = {T}, because that is what not does. And {not {not p}} = {not T} = {F}, which takes us back to the original assumption about p. And so on. We also become able to compute interactions between logical words, like I don’t drink and drive (I never do those things together, though I may do them separately) vs. I don’t smoke or drink (I never smoke nor do I ever drink). Simple recursive operations like merge indeed appear to be necessary to the development of logical abilities. An analogy that comes to mind is that with numbers. A rudimentary capacity for counting and measuring does exist in pre-verbal children or non-human animals. But a full-fledged capacity to count seems to emerge only concomitantly with language.6
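The labeling procedure for nested negations, and the interaction of negation with and and or, can be sketched as a single recursive evaluator. This is an illustration under simplifying assumptions: structures are written as ordered tuples with the operator first (merge itself yields unordered sets), and the helper names evaluate and negate are mine.

```python
def evaluate(structure, facts: dict) -> bool:
    """Label a structure True or False, working from the inside out."""
    if isinstance(structure, str):        # an atomic sentence such as "p"
        return facts[structure]
    operator, *parts = structure
    values = [evaluate(part, facts) for part in parts]
    if operator == "not":
        return not values[0]
    if operator == "and":
        return all(values)
    if operator == "or":
        return any(values)
    raise ValueError(f"unknown operator: {operator}")


def negate(structure, times: int):
    """Wrap a structure in the given number of negations."""
    for _ in range(times):
        structure = ("not", structure)
    return structure


facts = {"p": False}
print(evaluate(negate("p", 2), facts))  # False: not not p patterns with p
print(evaluate(negate("p", 3), facts))  # True: not not not p patterns with not p

# Someone who drinks, but never drives after drinking and never smokes.
habits = {"drink": True, "drive": False, "smoke": False}
print(evaluate(("not", ("and", "drink", "drive")), habits))  # True
print(evaluate(("not", ("or", "smoke", "drink")), habits))   # False
```

The same person can truthfully say “I don’t drink and drive” but not “I don’t smoke or drink,” and the difference falls out of where negation sits in the structure.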

A caveat. One might think that things as elementary as, say, conjunction, should be easy to spot across languages; in looking at a language we do not know we should be able rapidly to determine what word is used for and. Yet identifying even just elementary operations like conjunction, disjunction, and negation across languages is no easy task. For example, there are languages that do not have conjunction as we know it; no simple word or morpheme seems to mean what and means in English. Warlpiri, a Pama-Nyungan language spoken in Northern Australia, and Cheyenne, an Algonquian language spoken in the U.S. central plains, are languages of this sort.7 But how could the speakers of these languages get by without a word for and? What seems to happen is that the word for or gets recycled to express and. What we see for example in Warlpiri is that a certain word (manu) in the scope of negation (or in the antecedent of a conditional) is interpreted as or is in English:

Kula-rna      yunparnu   manu  wurntija    jalangu.  Lawa
NEG-1SG-SUBJ  sing-PAST  or    dance-PAST  today     nothing

“I didn’t sing or dance today. I did nothing.”

But in a plain declarative environment manu switches to an and-meaning:

Ngapa  ka   wantimi       manu  warlpa  ka   wangkami
Water  AUX  fall-NONPAST  or    wind    AUX  speak-NONPAST

“Rain is falling and wind is blowing.”

This may seem very strange. Yet something quite similar sometimes seems to happen with English or. For example, in John is stronger than either Bill or Marc, I am telling you that John is stronger than both Bill and Marc. But I am using or to do that. This is not random. Something systematic in the interaction between comparatives and or produces the effect. And if I tell you for this class you may read paper A or paper B, I am telling you that you are, in fact, allowed to read paper A and you are allowed to read paper B, another case where or is interpreted as and. One must conclude that there is some kind of operation that strengthens the meaning of or to and.8 Possibly this operation is used only in specific contexts in English, but more broadly in Warlpiri or Cheyenne, so as to take over the role of and altogether.

As is often the case with natural languages, phenomena that appear very exotic at a first encounter wind up in your own backyard, once you become aware of their existence. French has a kind of coordinating construction soit... soit..., which is used to convey disjunction. For example, the sentence “Les entrevues ont été réalisées soit en personne, soit par téléphone” has to be glossed as “the interviews were conducted either in person or by phone.” Soit comes from the subjunctive form of the verb être, to be, meaning something like “may be.” Using the subjunctive of the verb to be is a typical way in which languages create disjunctions. You can see that also in English. Maybe John will bring wine, maybe he will bring beer conveys roughly the same as “John will bring wine or beer.” So far, so good.

Italian has a near perfect counterpart of the French soit... soit construction, namely sia... sia... However, the Italian cognate of the French construction actually means “and/both.” Gianni ha parlato sia con Maria, sia con Francesca means “Gianni spoke with (both) Maria and Francesca.” This looks like another case in which a form that originally must have had a disjunctive meaning, like its French counterpart, has morphed into a conjunctive meaning. Obviously, we are not dealing with an accidental drift here, as it happens repeatedly, involving essentially the same logical function across historically-unrelated, geographically-discontinuous languages. That is why identifying elementary logical particles across languages is far from trivial. But it constitutes one of the most exciting research areas in modern linguistics, as it unveils unsuspected ways in which language and logic interact.

I want to conclude by discussing structure dependence in connection with our logical abilities. Chomsky points out how structure dependence is one of the most widespread and puzzling properties of natural languages. Let me give an example of how structure dependence shows up in the interaction of logical words, exactly as one would expect of any interaction brought about through language. We started out by noticing how the interaction of even with negation gives rise to polarity reversals. A sentence like I read many difficult books and I was able to understand even Chomsky’s is natural enough. But a sentence like I read many difficult books and was able to understand even Pinocchio is kind of strange and/or in need of some pretty special context. Use of even signals that the speaker considers understanding the children’s novel Pinocchio unlikely, and why would that be? Similarly for John understood even the hardest of my papers (natural) vs. John understood even the easiest of my papers (strange). The latter becomes natural under negation: John didn’t understand even the easiest of my papers. This is a polarity reversal induced by negation similar to the one in our initial example. The easiest of my papers should be easy enough for anyone to understand (or at least, that is how I am presenting it). It is therefore surprising/unlikely that someone would not understand even my easiest production. Now what is interesting is that the relation between negation and even must be of a special structurally-determined character for polarity reversal effects of this sort to be triggered. Compare the following sentences:

1. No one (without making some effort) got even my easiest point. Natural
2. Someone who made no effort got even my easiest point. Strange
3. Someone who made no effort got even my hardest point. Natural

The first sentence contains a negation to the left of even, and the overall result is natural. The second sentence also contains a negation to the left of even (no effort). But in this case the result is strange: we want to switch to “even my hardest point” as in the third sentence, which is natural again. The reason for this is that even is structurally, though not linearly, more directly connected to negation (i.e., in the scope of negation) in the first sentence than in the second, in spite of the fact that in the second sentence negation appears to be closer to even in linear terms. This is a manifestation of the ubiquitous structure dependence of language. Our intuitions about even and negation are surprisingly complex to begin with. They are also governed structurally, rather than in terms of linear order. As is everything else in language.9
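The contrast between structural and linear closeness in sentences 1 and 2 can be made explicit with a small sketch. Everything here is a simplifying assumption of mine: the bracketings are rough guesses rather than a worked-out syntactic analysis, the negative phrases no one and no effort are treated as single leaves, and “being in the scope of negation” is approximated as being contained in a sister of the negative phrase.

```python
def leaves(tree):
    """Words of the tree in left-to-right (linear) order."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree for word in leaves(child)]


def scopes_over(tree, a: str, b: str) -> bool:
    """True if leaf a has a sister constituent that contains leaf b."""
    if isinstance(tree, str):
        return False
    for child in tree:
        if child == a:
            sisters = [other for other in tree if other is not child]
            if any(b in leaves(sister) for sister in sisters):
                return True
    return any(scopes_over(child, a, b) for child in tree)


def linear_distance(tree, a: str, b: str) -> int:
    """Number of word positions separating a and b in the spoken string."""
    words = leaves(tree)
    return abs(words.index(a) - words.index(b))


point = ["even", ["my", ["easiest", "point"]]]

# 1. No one (without making some effort) got even my easiest point.
s1 = ["no-one", [["without", ["making", ["some", "effort"]]], ["got", point]]]

# 2. Someone who made no effort got even my easiest point.
s2 = [["someone", ["who", ["made", "no-effort"]]], ["got", point]]

print(scopes_over(s1, "no-one", "even"), linear_distance(s1, "no-one", "even"))
# True 6
print(scopes_over(s2, "no-effort", "even"), linear_distance(s2, "no-effort", "even"))
# False 2
```

In the first sentence negation is six words away from even yet takes scope over it; in the second it is only two words away but is buried inside the relative clause, so no scope relation holds.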

In conclusion, our capacity to draw inferences is truly remarkable and manifests itself in quite striking ways through language, bearing clear marks of the signature property of linguistic computation, namely structure dependence. Simple logical operations like negation, disjunction, etc. may exist in a conceptual-intentional system independently of grammar, i.e., independently of the computational machine at the basis of language. One might find the rudiments of cognitive abilities that could be defined as logical outside of language, but they will have sparse effects, anchored to specific domains, and display interactions of limited complexity.

Complexity of a different order of magnitude appears with language. Grammar, with its capacity for recursion, seems necessary to unleash the full power of logic. The degree of variation across languages, even in just the simplest logical vocabulary, is quite striking. But at least some of this variation is beginning to yield its secrets. Much more needs to be understood about the properties of our computational system, such as its capacity to apply to very diverse domains. But Chomsky’s stance remains one of the very few that may help unveil the inner workings of the language faculty.

Gennaro Chierchia

Gennaro Chierchia is an Italian linguist and Haas Foundation professor of linguistics at Harvard University.

  1. Noam Chomsky, “The Galilean Challenge,” Inference: International Review of Science 3, no. 1 (2017). 
  2. George Boole, An Investigation of the Laws of Thought: On which are Founded the Mathematical Theories of Logic and Probabilities (Mineola, NY: Dover Publications, 1854). 
  3. For a general framing of the relevant issues and essential references, see Kristin Andrews, “Animal Cognition,” in The Stanford Encyclopedia of Philosophy (2016). 
  4. Sextus Empiricus, Outlines of Pyrrhonism I, 69–70, trans. R. G. Bury (Cambridge, MA: Harvard University Press, 1933), 43.
  5. See Jason Merchant, The Syntax of Silence: Sluicing, Islands, and the Theory of Ellipsis (Oxford: Oxford University Press, 2001), and references therein. 
  6. Stanislas Dehaene, The Number Sense: How the Mind Creates Mathematics (Oxford: Oxford University Press, 2011); see also Lisa Feigenson et al., “Core Systems of Number,” Trends in Cognitive Sciences 8, no. 7 (2004): 307–14; Lisa Feigenson et al., “The Representations Underlying Infants’ Choice of More: Object Files versus Analog Magnitudes,” Psychological Science 13, no. 2 (2002): 150–56; and Helen De Cruz and Pierre Pica, Number as Test Case for the Role of Language in Cognition (Oxford: Routledge, 2008). 
  7. Sarah Murray, “Cheyenne Connectives,” 45th Algonquian Conference (2013), and Margit Bowler, “Conjunction and Disjunction in a Language without and,” Proceedings of SALT 24 (2014). The examples that follow in the text are taken from Bowler’s work.
  8. The operation in question is identified and discussed in Danny Fox, “Free Choice and the Theory of Scalar Implicatures,” in Presupposition and Implicature in Compositional Semantics, eds. Uli Sauerland and Penka Stateva (London: Palgrave Macmillan UK, 2007), 71–120; see also Gennaro Chierchia, Logic in Grammar (Oxford: Oxford University Press, 2013). 
  9. This argument is modeled on one made in the literature about Negative Polarity Licensing, which has been studied with brain imaging techniques. See, for example, Heiner Drenhaus et al., “Processing Negative Polarity Items: When Negation Comes Through the Backdoor,” in Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, eds. Stephan Kepser and Marga Reis (Berlin: Mouton de Gruyter, 2005), 145–65.