Chestnut-crowned babblers (Pomatostomus ruficeps) produce a bitonal flight call as they approach their nest and a tritonal prompt call when feeding chicks in the nest. Sabrina Engesser et al. have demonstrated that the tones within these calls are perceptibly distinct within calls, perceptibly equivalent across calls, and meaningless in isolation, conveying no functionally relevant information.1 Both calls are combinations of two naturally occurring notes, A and B, differentiated by pitch contour: flight call AB and prompt call BAB. In a series of experiments, Engesser et al. switched the elements between the AB and BAB combinations. Babblers were still able to discriminate between the two types of calls despite these changes.2 A flight call composed from the elements of a prompt call is still interpreted as a flight call and a prompt call made using the elements of a flight call is still heard as a prompt call. When played in isolation, tones A and B did not elicit any specific response.
At first glance, this capacity may seem comparable to the way humans form meaningful words from meaningless phonemes. The apparent similarities are suggestive. Studying this phenomenon in birds may, in fact, help us understand how such a capacity first evolved in humans. After all, it is this combinatorial capacity, together with the capacity to combine words into phrases, that constitutes the defining trait of human language, sometimes referred to as duality of patterning.3 Upon closer inspection, a series of differences emerges. The organizational principles governing bird calls appear unlike the phonemic organization of words in natural languages, a process that is primarily based on computational efficiency.4 In addition, birds lack an operation analogous to the recursive procedure that builds phrases from words. There is no duality of patterning. This should come as no surprise. In human beings, the externalization of language, whether by speaking or signing, is ancillary. What is fundamental is the capacity to merge linguistic elements drawn from a lexicon, in the way that rabbits and run are merged into a single set as {rabbits, run}.5 In evolutionary terms, this basic property seems to have emerged recently and abruptly.6 Babblers, by contrast, are restricted to just one of the four possible bitonal combinations (AA, AB, BA, BB) and one of the eight possible tritonal combinations (AAA, AAB, ABA, BAA, BBA, BAB, ABB, BBB). Their vocalizations are fixed and directly linked to specific stimuli in the bird’s immediate natural habitat. A spontaneously composed BA flight call, signaling, say, departure from the nest, lies beyond the bird’s brain.
No goodbyes.
By way of comparison, consider the combinatorial capacity of a human language. This discussion will be restricted to three phonemic consonants (p, t, and k) and four vocalic phonemes (i, e, a, and o) in Dutch consonant-vowel-consonant words. From a total of thirty-six possible combinations, only tep, tit, tat, tet, tek, kep, and ket go unrealized. They remain available for future use. Birds must make do with what they have. Babbler calls are unlike human languages in several other respects. In calling out, birds use a finite lexicon and linear order. The basic combinatoric operation is concatenation over sequences of tonal elements. There are no freely generated bird calls. For all we know, flight and prompt calls could be stored in avian memory as units with no need for phonemic composition. What is more, human beings recognize phonemes mainly on the basis of formant transitions, the rapid shifts in resonance that occur as the vocal tract moves from a closed to an open configuration.7 The difference between /bu:m/ (“boom”) and /du:m/ (“doom”) depends on the formant transitions to the vowel following the consonant rather than the acoustic differences between /b/ and /d/. Not so for birdcalls.8 In English, the bilabial plosive “p,” the alveolar plosive “t,” and the rounded back vowel “o” form distinctive phonemes that are meaningless in isolation, but produce meaningful words in different arrangements: “pot” (/pɔt/), “top” (/tɔp/), and “opt” (/ɔpt/). Bird brains hear the A in AB and BAB as alike, but the occurrences of /t/ in “pot” and “top” are not perceptibly equivalent for human brains. The phoneme /t/ is aspirated in “top,” whereas in “pot” it lacks aspiration and may be pronounced without audible release. Substituting /ɔ/ in “pot” for /ɔ/ in “top” may result in “top” being misidentified as “pop.” This is due to the acoustics of coarticulation, a characteristic that is typical of human speech.
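The count of thirty-six in the Dutch example above can be checked by simple enumeration. The following is a minimal sketch, not part of the original argument: it treats the forms as plain spelled strings, and the names `cvc`, `unrealized`, and `realized` are illustrative. The list of unrealized forms is copied from the text and is not verified here against a Dutch lexicon.

```python
from itertools import product

# Inventory taken from the text: three consonants and four vowels.
consonants = ["p", "t", "k"]
vowels = ["i", "e", "a", "o"]

# Every consonant-vowel-consonant string over this inventory.
cvc = ["".join(segments) for segments in product(consonants, vowels, consonants)]

# Forms the text reports as unrealized in Dutch (assumption: taken as given).
unrealized = {"tep", "tit", "tat", "tet", "tek", "kep", "ket"}

# The remaining forms are the ones the text treats as realized.
realized = [word for word in cvc if word not in unrealized]

print(len(cvc), len(unrealized), len(realized))
```

On the text's own figures, seven of the thirty-six forms are unused, which leaves twenty-nine in use.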
The word-like elements of a natural language are built from a sequence of syllables composed of phonemic segments organized into onsets and rimes. The phonemic transcription of the word “phoneme” as /fəʊni:m/ reveals an internal structure: the onset /f/ and nucleus /əʊ/ in the first syllable, and the onset /n/ and rime /i:m/, consisting of the nucleus /i:/ and the coda /m/, in the second.9 Unlike syntax, phonology is not recursive. There are no syllables nested inside other syllables.10 Phonology is iterative and allows repeats.11 Even so, a strong generative capacity must be invoked to explain syllable structure.12 Why does Dutch use stol, stool, and stolp, but not stoolp? Evidently, there is no categorical ban on “lp.” The asymmetry must follow from a hierarchical syllable structure that permits only two rime shapes: VV and VC. It follows that stoolp is ill-formed under either parse: “ool” is ruled out as a rime, and if the rime is “oo,” the remaining “lp” is excluded as a syllable onset, as in *lpot. The “ol” and “p” of stolp, by contrast, satisfy the respective conditions on rime and onset structure, as in “stol” and “pot.”13
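The reasoning about stol, stool, stolp, and stoolp can be made concrete with a toy checker. This is only a sketch under loud assumptions, not a model of Dutch phonology: it works on spelling rather than phonemes, it treats a doubled vowel letter as a long vowel (VV), and the onset inventory `LICIT_ONSETS` is a small hypothetical list chosen to cover the examples in the text. The function name `well_formed` is likewise illustrative.

```python
VOWELS = set("aeiou")

# Hypothetical, deliberately tiny onset inventory (assumption, not a Dutch grammar).
LICIT_ONSETS = {"", "p", "t", "k", "l", "s", "st"}


def well_formed(word: str) -> bool:
    """Check a monosyllable-like form as: onset + two-slot rime + optional final onset."""
    # Peel off the leading consonants as the onset.
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    onset, rest = word[:i], word[i:]
    if onset not in LICIT_ONSETS:
        return False

    # The rime has exactly two slots: a vowel plus a vowel (VV) or a consonant (VC).
    rime, tail = rest[:2], rest[2:]
    if len(rime) != 2 or rime[0] not in VOWELS:
        return False

    # Whatever follows the rime must itself be a licit onset made of consonants.
    if any(ch in VOWELS for ch in tail):
        return False
    return tail in LICIT_ONSETS


for form in ["stol", "stool", "stolp", "stoolp", "lpot", "pot"]:
    print(form, well_formed(form))
```

Under these assumptions the checker accepts stol, stool, stolp, and pot, and rejects stoolp and lpot, for the reason the text gives: "lp" is fine when split across a rime and a following onset, but it cannot stand as an onset on its own.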
In contrast to birdcall communication, the normal use of human language is creative and unbounded. There is no nonarbitrary limit to the length of expressions or the depth of embedding—We all think that John believed that his sister may have thought that… It is neither determined by stimuli—You said what?—nor random—Hell’s drop cigaretting wigwam …—but coherent and appropriate—If elected, I will not serve.14 Babbler vocalizations are a limited, fixed, stimulus-bound repertoire of calls that are involuntary and controlled by instinct. The link to specific stimuli means that they are more likely to be governed primarily by conditions of communicative efficiency.
They get the job done.
In this regard, two aspects of babbler calls play a notable role in enhancing communication. First, the most frequently used birdcalls tend to be short, possibly reflecting a principle of least effort for efficient communication.15 Less is more. Second, distinctive calls are maximally distinct. Bigger is better. A call that signals a bird’s whereabouts during flight will be used more often than a feeding prompt. Flight calls should therefore be shorter than prompt calls.16 And so they are. This is reflected in their bitonal and tritonal composition, respectively. Maximal distinctiveness entails maximal tonal differences within and across calls: bitonal calls must be AB or BA, but not AA or BB; tritonal calls must be ABA or BAB, but not AAA, AAB, ABB, BAA, BBA, or BBB. Only two pairings are optimally scattered: AB (flight) with BAB (prompt), and BA (flight) with ABA (prompt). The call BAB, it should be noted, contains AB but begins with a contrasting onset.
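The selection of call pairings described above can be reproduced by brute force. The sketch below rests on one possible formalization, which is an assumption rather than anything stated in the cited studies: "maximal tonal differences within calls" is read as "adjacent tones always differ," and "across calls" is read as "the two calls begin with different tones." The helper names `calls` and `alternates` are illustrative.

```python
from itertools import product

TONES = "AB"


def calls(length: int) -> list[str]:
    """All tone sequences of the given length over the two tones A and B."""
    return ["".join(seq) for seq in product(TONES, repeat=length)]


def alternates(call: str) -> bool:
    """Assumed reading of within-call distinctiveness: no tone repeats its neighbor."""
    return all(x != y for x, y in zip(call, call[1:]))


bitonal = calls(2)    # four candidates
tritonal = calls(3)   # eight candidates

distinct_bitonal = [c for c in bitonal if alternates(c)]
distinct_tritonal = [c for c in tritonal if alternates(c)]

# Assumed reading of across-call distinctiveness: contrasting first tones.
pairings = [
    (flight, prompt)
    for flight in distinct_bitonal
    for prompt in distinct_tritonal
    if flight[0] != prompt[0]
]

print(distinct_bitonal, distinct_tritonal, pairings)
```

On this reading, the filter leaves AB and BA among the bitonal calls, ABA and BAB among the tritonal calls, and exactly the two pairings named in the text: AB with BAB, and BA with ABA.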
In birdcalls, ease of communication seems to prevail over computational efficiency. The reverse is true for natural language.17 This remains the case even though word length and frequency seem to be in accordance with Zipf’s law, under which the more frequent a word is, the shorter it tends to be.18 Although data in vocabulary studies can be approximated by a Zipfian distribution,19 the law itself does not specifically apply to language. Communicative efficiency cannot be inferred from Zipf’s law, and conformity to it is essentially meaningless.20 Only if rank-frequency distributions deviated from Zipf’s law would the result be informative: such a deviation from predicted behavior would convey something meaningful about word choice.21 A recent revision of Zipf’s law by Edward Gibson et al. holds that “[t]he most communicatively efficient code for [word] meanings is one that shortens the most predictable words—not the most frequent words.”22 Not so. Charles Yang et al. have shown that the claim of communicative efficiency is unsupported.23 The statistical distributions of words in the study by Gibson et al. are replicated by Yang’s stochastic model, which mechanically pairs sounds to their meanings “without any functional considerations.”24 Zipf’s law has been incorrectly interpreted as indicating lexical efficiency. In any case, it has no relevance to the far more important notion of structural efficiency. The basic properties of human language overwhelmingly demonstrate the prevalence of computational over communicative efficiency.25 Why has John left the room? Who knows? But why has “why” been dragged to the front of the sentence from its expected grammatical position at the rear? That is a matter of structure dependence.
Bird communication does not share the capacity of human language to freely generate new meanings from meaningless elements. The differences between the two systems are qualitative and abrupt. Since birds lack a recursive operation for the creative use of call vocalizations, evolutionary and comparative biological studies of avian and human communication will always remain a problematic enterprise.26 Such studies are further hindered by the fact that humans, the only species to possess discretely unbounded language, are also the only extant species of the genus Homo. Still, some significant similarities between human speech and birdsong have recently come to light. The sensorimotor systems for producing language or birdsong require similar linear arrangements of differently organized structures.27 These appear to be derived from transcription factors for convergent neurogenetic organization in analogous brain regions that are involved in auditory–vocal imitation learning, perception, and production.28 It is plausible that this convergence,29 which is absent both in our closest primate relatives and in non-vocal-learning birds, may have contributed to externalized language in human evolution.30