To the editors:
It is a privilege to have been asked to contribute a short response to “The Recovery of Case,” which discusses the contribution that the late Jean-Roger Vergnaud made to case theory by means of a letter to Chomsky. Early in my career, in the late 1970s (1979, if I recall), slightly after Vergnaud sent his letter in 1977, I myself sent a letter to Chomsky. I shall be brief and informal.
I came into linguistics with two degrees in philosophy, in which discipline I had studied a fair bit of logic. Because of this background, the theory of the language acquisition device (LAD) as conceived by Chomsky at the time seemed flawed to me. Chomsky argued, more or less, that the LAD was a sort of super grammar, one that took a language input (from parent to child) and formulated the rules (grammar) for that language on the basis of (super) grammatical rules. While this seemed a reasonable position, I knew that it could not work; Chomsky had stumbled into the halting problem.
Chomsky’s LAD was a device that matched grammatical rules to an input, in effect positing an “algorithm,” a coherent way of establishing a set of rules for a language “input.” One might frame it as a question, (1):
- Q_{LAD}: Does G, a grammar, applied to an utterance, U, have a derivation, Δ?
This may be put into a close association (an isomorphism, a one-for-one match) with the “halting” question, (2):^{1}
- Q_{W}: Does M applied to a word W eventually stop on ◇?
Here M is a “Turing machine,” a formal mechanism that can calculate (derive) anything that can be derived by an algorithm, and W is some string, some formal object that must be derived by some procedure (read “algorithm”). The diamond, ◇, is the “halting symbol,” signifying that M has in fact reached an end, that is, that it has in fact derived W.^{2} As Crossley et al. show, this is an unsolvable problem,^{3} an example of a self-referential paradox, much as in set theory with the set that contains all and only those sets that do not contain themselves, Russell’s paradox.^{4} Such self-referential problems have a “weird” feeling to them, almost a silly quality, but their upshot is simple and inescapable: restricting a system so that it does not fall into a self-referential feedback loop is extremely hard to do, requiring major and often counterintuitive modifications.
Almost any book on computation and automata theory treats this problem, perhaps one of the clearer sources being Davis et al.^{5} The best is still Crossley et al., because in a few pages they extend the problem to a universal Turing machine, and then show its applicability to word transformations and then to predicate calculus.^{6} The conclusion is: “There is no algorithm that, given a program of S and an input to that program, can determine whether or not that given program will eventually halt on the given input.”^{7}
Analogously, there is no LAD (super-grammar, a linguistic algorithm) that can tell beforehand that a given grammar will produce (derive, halt on) a given utterance. This was the thrust of my letter to Noam Chomsky.
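The diagonal argument behind this conclusion can be made concrete in a few lines of code. The following Python sketch is mine, not Chomsky’s or Crossley’s, and every name in it is illustrative: given any candidate halting-decider, it constructs a program that the decider must misjudge.

```python
def contradiction_for(claimed_halts):
    """Turing's diagonal argument in miniature: feed in ANY candidate
    halting-decider and get back a (prediction, behaviour) pair that
    disagree. All names here are illustrative only."""
    def spite(prog):
        # Do the opposite of whatever claimed_halts predicts for
        # prog run on its own text.
        if claimed_halts(prog, prog):
            return "loops"  # finite stand-in for an infinite loop
        return "halts"

    prediction = claimed_halts(spite, spite)  # the decider's verdict on spite(spite)
    behaviour = spite(spite)                  # what spite actually does
    return prediction, behaviour
```

Whichever verdict the candidate decider gives, `spite` does the opposite, so the prediction and the behaviour never match. The same diagonal blocks any super-grammar that claims to decide derivability in advance.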
To elaborate, one might envision the theory as that in (3) and (4) by recasting (1) with G as a function whose domain (input) is the set of utterances.
- “Early LAD” Theory
- Q_{lad}: G(U_{i}) > (ρ, Δ, S_{i}),
- where ρ is a semantic reading, Δ a derivation (a series of rule-generated trees), and S_{i} a grammatically acceptable sentence
(The semantic reading and the derivation both have internal structure, where ρ and Δ interact, but I ignore these complications here.) If one again takes the LAD as a super grammar, one has (4):
- Super Grammar, SG, LAD
- Q_{SG}: SG({U}) > (G(U_{i}) > (ρ, Δ, S_{i}))
SG, the super grammar, operated upon a set of utterances (degenerate, etc.), {U}, to generate, in effect, a grammar, G, which itself then generated those utterances, converted now into ideal sentences, {S}. This configuration resembled that seen in proof theory, or the simple layout of an axiomatic system. Not only does the halting problem (2) tell us that (4) is unattainable; the configuration in (4) also gives off the whiff of an infinite regress, (5):
- Infinitely many SGs
- Q_{SG′}: SG′({U}) > (SG({U}) > (G(U_{i}) > (ρ, Δ, S_{i}))), etc.
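To make the regress vivid, here is a toy rendering of (3)–(5) in Python. The “grammar” it builds is laughably crude (a bag of words), and every name is my own invention; the point is only the shape of the dependency, not the linguistics.

```python
def make_grammar(utterances):
    """A toy LAD in the sense of (3)-(4): from a sample of utterances,
    return a grammar G mapping an utterance to a (reading, derivation,
    acceptability) triple. Purely illustrative."""
    lexicon = {w for u in utterances for w in u.split()}

    def grammar(utterance):
        words = utterance.split()
        derivation = list(words)                        # stand-in for Delta
        acceptable = all(w in lexicon for w in words)   # stand-in for S
        reading = " ".join(words) if acceptable else None  # stand-in for rho
        return reading, derivation, acceptable

    return grammar

# The regress in (5): what procedure produced make_grammar itself?
# A make_super_grammar would be needed, and then a procedure for THAT,
# and so on without end, unless the space of grammars is constrained.
```

The closing comment is the whole point: nothing inside the system supplies `make_grammar`, just as nothing inside (4) supplies SG.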
My point in writing to Chomsky was to point out the difficulties faced by the LAD as he envisioned it at the time, difficulties that could be overcome only if the choice of possible grammars was highly constrained. The set of constraints emerged shortly as the theory of principles and parameters. At a talk in 1981 (if I recall) at the University of Toronto, Chomsky alluded to the difficulties I had outlined, admitting that he had not appreciated the “logical difficulty of the problem,” though he never mentioned my name.
To see the relevance to mathematics, one merely substitutes, in (4) or (5), {θ} for (ρ, Δ, S_{i}), Ax(iom) for G, SAx for SG, and so on, as in (6)–(8). The effort in (7) is open to leftward infinite regress just as is its grammatical analog, (5).
- Axiomatic system
- Ax > { θ },
- Ax a set of unordered axioms, and { θ } a partially ordered set of theorems
In (7) SAx is the (unordered) set of super-axioms, which then generates an axiom set, Ax (technically both SAx and Ax should have set brackets around them), which in turn generates the (partially ordered) set of theorems, { θ }. One should also include a set of rules of inference, but I ignore this detail.
- Super Axiomatic system
- SAx ({ θ }) > (Ax > { θ })
While axioms are unordered and ideally independent of one another, they are formally statements that differ from theorems only in that they lack predecessors in a chain of proof. The lack of predecessors is technically not true of axiom schemata, that is, of recursive axioms, such as in arithmetic, Peano’s axioms being a simple example, which have repetitions of themselves as predecessors. In effect, axioms serve to put a cap on one “end” of a body of theorems, let us say the left-hand end, while the right-hand end may be open (infinite). The super axioms violate this basic function of a set of axioms, as depicted in (7), since they themselves may be open to a derivation from a set of super super axioms, (8).
- Super Super Axioms
- SSAx({ θ }) > (SAx({ θ }) > (Ax > { θ }))
The effect in (8) may be thought of as an axiom schema, an iterative process that extends the theory infinitely out in the left-hand direction. It is quite all right to extend a theory out to the right, in effect to have an infinite set of theorems generated by a recursive axiom schema (or other means), since most mathematical theories are open ended (as is language, and for precisely the same reason: an iterative set of rules).^{8} It is easy to see, however, that the process in (8) may again be iterated without end to produce an infinite regress out toward the left.
The infinite regress (iteration) in (8) shows that within mathematics the choice of axioms for a body of theorems (hypotheses, really, until the proofs and axioms are posited) is a choice made from intuition, an act of art if you will. The point is little mentioned in the literature, I suspect, because it is something of a logical embarrassment; math is not entirely rational! One might compare the Axiom of Choice, as it is termed in Zermelo–Fraenkel set theory, where efforts to capture this axiom in intuitively acceptable and logically coherent form have created a sub-industry within set theory.^{9}
Remarkably, at roughly the same time as Vergnaud and I were writing our letters, Harvey Friedman proposed a technique that would “justify” a choice of axioms as most “efficient” for a body of theorems, a technique called “reverse mathematics.”^{10} In effect, a set of axioms is shown to generate a body of theorems; the set is then shrunk by removing one axiom, rendering its generative capacity null and void. The conclusion is that the full set was the necessary and smallest (most efficient) set of axioms for the body of theorems. This left the original choice of axioms, however, still an act of art, though it did vindicate the artist (mathematician).
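Friedman’s minimality test can be mimicked on a toy scale. The sketch below, in Python and entirely my own construction, treats axioms as atomic statements and derivation as forward chaining; reverse mathematics proper works over subsystems of second-order arithmetic, not anything this simple.

```python
def closure(axioms, rules):
    """Forward-chain: everything derivable from the axioms via the rules.
    Rules are (premises, conclusion) pairs over atomic statements."""
    derived = set(axioms)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

def is_minimal_for(axioms, rules, theorems):
    """Toy 'reverse mathematics' test: the axiom set generates the
    theorems, but dropping any single axiom loses them."""
    if not theorems <= closure(axioms, rules):
        return False
    return all(not theorems <= closure(axioms - {a}, rules)
               for a in axioms)
```

Removing one axiom and watching the theorems vanish is the whole maneuver; what the toy also shows is that the test vindicates a chosen axiom set but does not itself choose one.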
The resulting theory of principles and parameters has stood the test of decades. Logically it is necessary and so will always be part of our conceptual toolkit as linguists. I note, however, a number of problems with the current paradigm.
First, the simple on-off vision of parameters seems inadequate. I address specifically the parameter of polysynthesis, that in which a sentence is phonologically and even morphologically identical to a word. I am fond of this parameter since, having grown up in Newark, New Jersey ([nrk] for those of us so chosen by fate), I spoke a polysynthetic dialect of English. Note the examples in (9).
- Newark English circa 1955
- Let us go eat [ lskwít ]
- It is too early to eat [ (t)stwŕlitwit ]
While these examples of polysynthesis meet the simple on-off canonical form of this parameter, the form in (10), from the Caddoan language Wichita, does not.^{11}
- Wichita VP-polysynthesis
- wá:cɁarɁa kiya:kíriwa:cɁárasarikìtàɁahí:rikss niya:hkʷírih
- wa:cɁarɁa 'squirrel'
- kiya 'quotative' + a...ki 'aorist' + a 'preverb' + riwa:c 'big (quantity)' + Ɂaras
- 'meat' + ra 'collective' + ri 'portative' + kita 'top' + Ɂa 'come' + hi:riks
- 'repetitive' + s 'imperfective', na 'participle' + ya:k 'wood' + r 'collective' + wi
- 'be upright' + hrih 'locative'
- 'The squirrel, by making many trips, carried the large quantity of meat up into the top of the tree, they say.'
Note that the subject, “squirrel,” and a locative adjunct, “top of the tree,” stand as free words, while the verb and the direct object argument form a single unit, along with various and sundry morphemes. In short, and quite unsurprisingly, the parameter should have the range of choices suitable for the complexity of the domain that it governs. In polysynthesis a binary choice fails to reflect the complexity of the sentential domain, which clearly can offer a range of choices. Growing up in Nrk we did not worry about such subtleties.
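One way to picture a non-binary parameter is as a graded scale rather than a switch. The Python fragment below is a deliberately crude illustration of that idea; the labels, cut-offs, and scoring are mine alone and make no typological claim.

```python
from enum import IntEnum

class Incorporation(IntEnum):
    """A toy graded parameter replacing a binary on/off switch.
    The labels are illustrative, not a serious typology."""
    NONE = 0          # fully analytic: every morpheme a free word
    PHONOLOGICAL = 1  # Newark-style contraction of a clause
    VP = 2            # Wichita-style: verb + object + affixes as one word
    FULL = 3          # the whole sentence as a single word

def degree(free_words, bound_morphemes):
    """Crude score: the share of morphemes packed into bound forms.
    The cut-offs are arbitrary placeholders."""
    total = free_words + bound_morphemes
    ratio = bound_morphemes / total if total else 0.0
    if ratio == 0:
        return Incorporation.NONE
    if ratio < 0.5:
        return Incorporation.PHONOLOGICAL
    if ratio < 0.9:
        return Incorporation.VP
    return Incorporation.FULL
```

On this toy scoring, the Wichita sentence in (10), with two free words against a dozen-odd bound morphemes, lands at VP-level incorporation rather than at either pole of a binary switch.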
But there is yet a deeper problem, one that arises within the ideology of minimalism. It is a tenet of the minimalist effort that human language is in some way an ideal system, one that is efficient, or perhaps even perfectly suited to our cognitive and social needs. Certainly, however, if choices of parameter settings can be made, binary or otherwise, then some apex of efficiency or perfection is not logically possible; the various choices would represent various forms of perfection. Perfection, like pregnancy, is an all-or-nothing matter. This might be a simple observation, but it is as logically damning as it is simple. A more modest but coherent claim would be that our language ability reflects a range of efficient strategies, some perhaps interdependent, that meet our cognitive and social needs.
Finally, there was a purpose to my casting the grammar as a function; see (3) above. In (3) the grammar acts on an input, an utterance, to assign to it sense and grammatical coherence so that it is recognized as a sentence, a formal and cognitive object, if you will, that matches the physical utterance, the last being an object out there in the world, at least in the cultural world. The process in (3) depicts what is called parsing. It does not, however, depict production. The difference between the two processes is depicted in (11). In production the same grammar, now G_{out}, produces an utterance, U, after operating on an intention or a cognitive state of some sort that is to be expressed, Ψ. The semantic reading drives a derivation, leading to an utterance, hence the elaboration (ρ > Δ) not noted in (3).
- Production and Parsing
- a. Production
- G_{out}(Ψ) > ((ρ > Δ) > U),
- where Ψ is a cognitive state or intended message
- b. Parsing
- G_{in}(U) > ((Δ > ρ) > S)
Current theory assumes the equality in (12), that production and parsing are in some sense equal, but facts suggest otherwise.
- Platonic grammar
- G_{out} = G_{in}
The S in (3) and (11b) is a coherent and idealized entity, whereas the utterance, U, that emerges from (11a) is incomplete, degenerate, etc. We hear much more cleanly than we speak, and this suggests a difference in the very architecture of the language faculty and the grammar(s) it produces. One might take the Platonic route, as I have called (12), and relegate these divergences to the psycho-linguist’s laboratory, but some vital and fundamental aspects of grammar itself may be missing if one does so. After all, we have at least two major language areas in the cortex, and this alone suggests differences in “machinery,” differences that we should be pondering at a formal level. The two “directions” of grammar may use the same elements, but in a differing order. In parsing, for example, a derivation seems to be constructed from which the hearer infers a semantic reading, (Δ > ρ), whereas in production the semantic reading would seem to demand a derivational (syntactic) structure, (ρ > Δ). The little arrows, >, hide a welter of logic and pattern and should be examined in detail, promising further differences in the theory. Even formally, therefore, these two forms of competence appear distinct.
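The asymmetry can be put in functional terms. Below is a toy Python rendering of (11), of my own devising: the two directions share the same elements, a reading and a derivation, but compose them in opposite order, and the parsed S comes out “cleaner” than the produced U. All names are illustrative.

```python
def derive(reading):
    """rho > Delta: build a syntactic derivation from a reading."""
    return reading.split()            # a list of words stands in for a tree

def interpret(derivation):
    """Delta > rho: infer a reading from a derivation."""
    return " ".join(derivation)

def g_out(psi):
    """Production, (11a): reading first, then derivation, then utterance.
    The output is deliberately degraded, as speech is."""
    reading = psi                     # stand-in: the message IS the reading
    derivation = derive(reading)
    return " ".join(derivation).lower()   # degenerate output channel

def g_in(utterance):
    """Parsing, (11b): derivation first, then reading, then ideal sentence.
    The output is cleaner than the input, as hearing is."""
    derivation = utterance.split()
    reading = interpret(derivation)
    return reading.capitalize()       # idealized S
```

Running a message through `g_out` and back through `g_in` recovers the sentence, but the two functions are not mirror images: each applies the shared pieces in its own order, which is the formal distinctness argued for above.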
If (12)’s equality is to be true, it must be true at a highly abstract level. Hence I term it a Platonic equivalence, though perhaps I should call it a Platonic paradox. The differences I have outlined are, after all, of a formal character; the language dealt with in any concrete instance will be the same language. So (12) must be true. It is just hard to see exactly how.
John Colarusso