The Cultural Evolution of Language

March 10th, 2010  |  Categories: Linguistics

One of the major shifts in thinking about language came in 1990, when Steven Pinker and Paul Bloom published their groundbreaking paper, Natural language and natural selection. In it, they argued that natural selection was the central process shaping the biological structures underpinning language. Since then, the field of language evolution has blossomed into a truly multidisciplinary subject. Yet I believe we are now undergoing another paradigm shift: incorporating cultural evolution.

For some features, particularly the physical capacity to produce and perceive a wide range of vocalizations, there is ample evidence for specialisation: a descended larynx, voluntary control over thoracic breathing, and hearing attuned to the frequencies of speech. Given that these features are firmly in the domain of biology, it makes intuitive sense to apply the theory of natural selection: humans are specially adapted to the production and reception of speech. Yet Pinker and Bloom’s argument is found somewhat wanting when extended to the claim that natural selection shaped specialised mental organs, or modules, for acquiring language. First and foremost, the putative language acquisition device (commonly referred to as the LAD) is not an established fact: rather, it is derived from Noam Chomsky’s arguments from the poverty of the stimulus, together with the assumption that all languages are essentially the same in structure, differing only in their sound systems and vocabularies.

As such, under the stewardship of Pinker, Chomsky and others, the origin, evolution and acquisition of language has primarily been treated as a biological question. Whilst it is certain that biology plays a role in the evolution of language, its exact contribution remains contentious in light of new research emerging from theories of cultural evolution. A notable instance came at the 2009 CogSci conference, where some of the leading researchers into the cultural evolution of language met at a symposium, namely: Nick Chater (cultural induction), Thomas L. Griffiths (Bayesian analyses), Simon Kirby (iterated learning) and Morten H. Christiansen (genetic constraints). Each of these individuals has been a key influence on my own thinking about language evolution (Simon Kirby was formerly my course supervisor at Edinburgh), so I think it is worthwhile dedicating a few paragraphs to their ideas.

Cultural Induction and Language Acquisition (Chater)

Consider the following question posed by Chater & Christiansen (in press): “Suppose that some natural process yields the sequence 1, 2, 3… How does it continue?” If we put aside our own, highly coordinated learning mechanisms, then the sequence may oscillate (1, 2, 3, 2, 1, 2, 3, 2, 1…), become stuck (1, 2, 3, 3, 3, 3…), or exhibit a Fibonacci structure (1, 2, 3, 5, 8…). In fact, given the scarcity of data, there is an infinite array of possible answers. This is an example of N-induction, which may be described as “induction about the natural world, [where] data is generated by some external source, and the learner attempts to predict how it continues”.

However, if you happen to be a reader of this site, the overwhelming likelihood is that the continuation of 1, 2, 3 would be: 4, 5, 6… This is an example of C-induction, where the objective is to coordinate your predictions with other learners to produce the same results. But what’s C-induction’s relevance to the acquisition of language? As Chater explains:

Thus, in language acquisition, children receive partial linguistic input, and must generalize to many new linguistic structures – but the standard of correctness is to generalize in the same way as other learners. To the extent that learners have the same biases and prior experience, this dramatically simplifies the learning problem, because their generalizations will typically agree. More generally, language evolution itself can be viewed as the accretion of successive generalizations upon which learners converge.
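The toy sequence makes the contrast between the two kinds of induction easy to sketch in code. The following is my own illustration (the rule set and the shared preference ordering are invented for the example, not taken from the paper): several rules reproduce 1, 2, 3, so the data alone underdetermine the continuation, but learners who share the same bias coordinate on the same answer.

```python
data = [1, 2, 3]

# Three rules, each consistent with the observed data so far.
rules = {
    "count up":  lambda s: s[-1] + 1,          # 1, 2, 3, 4, 5, 6...
    "get stuck": lambda s: min(s[-1] + 1, 3),  # 1, 2, 3, 3, 3, 3...
    "fibonacci": lambda s: s[-1] + s[-2],      # 1, 2, 3, 5, 8, 13...
}

def continuation(rule, seq, steps=3):
    """Extend seq by repeatedly applying rule; return the new items."""
    seq = list(seq)
    for _ in range(steps):
        seq.append(rule(seq))
    return seq[-steps:]

# N-induction: the data underdetermine the generating process --
# every rule above predicts 3 after seeing only 1, 2.
assert all(continuation(r, [1, 2])[0] == 3 for r in rules.values())

# C-induction: learners sharing a bias (here, a preference for the
# simplest rule, counting up) converge on the same continuation.
shared_preference = "count up"
learner_a = continuation(rules[shared_preference], data)
learner_b = continuation(rules[shared_preference], data)
assert learner_a == learner_b == [4, 5, 6]
```

The point of the sketch is that correctness in the C-induction case is defined by agreement between learners, not by matching some external process.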

Uncovering Inductive Biases through Cultural Evolution (Griffiths)

Looking at the relationship between the inductive biases of individual learners and the outcome of cultural evolution is something primarily being explored through computational modelling. Specifically, Griffiths argues that by modelling “learning as Bayesian inference [it] provides the opportunity to explore this relationship, making the inductive biases of learners transparent through a prior distribution”. Here, the role of learners is to select a hypothesis h on the basis of its posterior probability when exposed to data d:

P(h|d) = P(d|h)P(h) / Σh′ P(d|h′)P(h′)

P(d|h) is the likelihood of the data d being produced under a given hypothesis h, and P(h) is the prior probability of each hypothesis. When applied to models of language and iterated learning (see below), the hypotheses are the set of possible grammars, whilst the data consist of the sets of utterances used to induce a language. Importantly, the prior probability distribution over grammars is the learning bias, which may be domain-specific or domain-general. A critical component of Bayesian learning, and still a point of contention, is the role of prior biases and how much influence they exert over a language’s evolutionary trajectory.
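As a rough sketch of the idea (the “grammars”, utterances and probabilities below are invented toy values, not Griffiths’ actual models), a Bayesian learner scores each candidate grammar by its posterior and selects the best:

```python
# Toy example: two candidate "grammars", each defining a likelihood
# over utterances, plus a prior encoding the learner's inductive bias.
# The learner picks the hypothesis maximising P(h|d), proportional to
# P(d|h) * P(h).

likelihood = {
    "regular":   {"walk-ed": 0.8, "go-ed": 0.15, "went": 0.05},
    "irregular": {"walk-ed": 0.5, "go-ed": 0.05, "went": 0.45},
}

# The prior distribution over grammars *is* the learning bias.
prior = {"regular": 0.6, "irregular": 0.4}

def posterior(data):
    """Return P(h|d) for each grammar h given a list of utterances d,
    assuming utterances are generated independently."""
    unnorm = {}
    for h in prior:
        p = prior[h]
        for utterance in data:
            p *= likelihood[h][utterance]
        unnorm[h] = p
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

post = posterior(["walk-ed", "went", "went"])
best = max(post, key=post.get)  # evidence for irregulars overwhelms the bias
```

Note how the prior favours the “regular” grammar, yet enough irregular utterances in the data flip the learner’s choice: exactly the tension between prior biases and data that the iterated learning debate turns on.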

Language Evolution through Iterated Learning (Kirby)

Much of the literature regarding Iterated Learning focuses on a computational modelling approach, where “the central idea behind the ILM [Iterated Learning Model] is to model directly the way language exists and persists via two forms of representation” (Kirby & Hurford, 2002, pg. 123). These two forms consist of an I-Language (the internal representation of language as a pattern of neural connections) and an E-Language (the external representation of language as sets of utterances). This cycle of continued production and induction is used to understand how the evolution of structure emerges from non-linguistic communication systems and how language changes from one form into another.

To briefly summarise, these models contain a single agent who is taught an initial random language (consisting of mappings between meanings and signals). The output of the agent is then used to teach the next generation, and so on. After numerous generational turnovers of teachers and observers, some of these models provide an intriguing insight into the emergence of linguistic phenomena such as compositionality and regularity.
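The cycle can be caricatured in a few lines. In the sketch below (my own toy example: the meaning space, syllables and hard-coded compositional bias all stand in for the richer induction algorithms used in actual ILMs), each generation learns from a bottlenecked sample of the previous generation’s utterances, and the language settles into a compositional system:

```python
import random

random.seed(0)

MEANINGS = [(shape, colour) for shape in ("circle", "square")
                            for colour in ("red", "blue")]
SYLLABLES = ["ka", "po", "nu", "ti", "wa", "lo"]

def random_language():
    """Initial holistic language: an arbitrary two-syllable signal per
    meaning, with no internal structure."""
    return {m: random.choice(SYLLABLES) + random.choice(SYLLABLES)
            for m in MEANINGS}

def learn(observed):
    """Induce a full language from a partial sample, with a built-in
    compositional bias: treat the first syllable as marking the shape
    and the second as marking the colour, generalising to unseen
    meanings."""
    part = {}
    for (shape, colour), signal in observed.items():
        part.setdefault(shape, signal[:2])
        part.setdefault(colour, signal[2:])
    # Invent parts for feature values absent from the sample.
    for value in ("circle", "square", "red", "blue"):
        part.setdefault(value, random.choice(SYLLABLES))
    return {(s, c): part[s] + part[c] for (s, c) in MEANINGS}

def transmit(language, bottleneck=2, generations=10):
    """Pass the language down a chain of learners, each seeing only a
    bottlenecked sample of the previous generation's output."""
    for _ in range(generations):
        sample = dict(random.sample(sorted(language.items()), bottleneck))
        language = learn(sample)
    return language

final = transmit(random_language())

# The transmitted language is compositional: meanings sharing a shape
# share a first syllable; meanings sharing a colour share a second.
assert final[("circle", "red")][:2] == final[("circle", "blue")][:2]
assert final[("circle", "red")][2:] == final[("square", "red")][2:]
```

Here the compositional outcome is baked into the learner’s bias; in the real models the interesting result is that such structure emerges because compositional languages are the ones that survive repeated passage through the bottleneck.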

A common theme running through a wide array of these Iterated Learning studies is that language is a compromise between two factors: the biases of learners, and constraints on language transmission. What is perhaps fundamental to this view is encapsulated in the second factor: that transmission is a mediating force in the shaping of language. For instance, Kirby & Hurford (2002) show how the infinite expressivity found in languages is a result of the finite set of data presented during acquisition. With this transmission bottleneck restricting the amount of data presented, learners must generalise in order to learn from the data, but not to the extent where the language collapses into one signal for all possible meanings. Tempering maximal expressivity with generalisation provides an adequate explanation for recursive compositionality, without appealing to an intricately specified LAD. As Zuidema (2003) succinctly put it: “the poverty of the stimulus solves the poverty of the stimulus”.

These modelling observations are backed up by experiments utilising real human learners. As Kirby notes:

By placing the artificial language learning paradigm within a cultural transmission framework, we can observe the evolution of languages in the laboratory (Kirby, Cornish & Smith, 2008). Results from these experiments show that linguistic structure does indeed emerge from initially random systems, and furthermore that this process is non-intentional. In other words, this cultural process provides “design without a designer” just as biological evolution does.

Genetic Constraints on Cultural Evolution of Language (Christiansen)

Coming full circle, we’re back to what Steven Pinker and Paul Bloom originally discussed: the genetic bases underlying language. Whereas previously language was seen as a highly specified organ, akin to the visual system, new research highlights the importance of domain-general mechanisms in shaping language. The example Christiansen uses is sequential learning: both sequential learning and language involve “the extraction and further processing of discrete elements occurring in complex temporal sequences”. Building on previous simulation work showing that constraints on sequential learning can shape the trajectory of linguistic structure, a recent molecular genetics study by Tomblin et al. (2007) found:

[...] that common allelic variations in the FOXP2 gene are associated with differences in sequential learning (as measured by a serial-response time task) and language… [suggesting] that FOXP2 influences systems that are important to the development of both sequential learning and language, supporting the hypothesis that language may have been shaped through cultural evolution constrained by underlying mechanisms for sequential learning.

It’s important to note that I’m not necessarily in complete agreement with some of the conclusions coming from this research (a point I’ll pick up in another post). I just wanted to give you an indication of the depth of work taking place in examining the cultural evolution of language.

Main citation: Cultural Evolution of Language: Implications for Cognitive Science. CogSci 2009 Conference. PDF Link: http://csjarchive.cogsci.rpi.edu/Proceedings/2009/papers/491/paper491.pdf

Other citations:

N. Chater & M. Christiansen. Language Acquisition meets Language Evolution. Cognitive Science, 2009; DOI: 10.1111/j.1551-6709.2009.01049.x

S. Kirby & J. Hurford. The Emergence of Linguistic Structure: An overview of the Iterated Learning Model. In Angelo Cangelosi and Domenico Parisi, editors, Simulating the Evolution of Language, 2002; 121-148.

Zuidema. How the poverty of the stimulus solves the poverty of the stimulus. In Suzanna Becker and Sebastian Thrun and Klaus Obermayer, editors, Advances in Neural Information Processing Systems 15 (Proceedings of NIPS’02), 2003.

J.B. Tomblin, et al. Association of FOXP2 genetic markers with procedural learning and language. Poster presented at the 57th Annual Meeting of the American Society of Human Genetics, San Diego, CA. 2007.


5 Responses to “The Cultural Evolution of Language”

  1. Sid
    March 12th, 2010 at 23:05

    Have human vocal features changed across historical time? (Or, if not historical, at least among languages which we have evidence for.) For example, I know that Proto-Indo-European had laryngeal consonants. Of the Indo-European languages for which we have records, only Hittite seems to have had them. This may have nothing to do with human throats changing, but it’s something I wonder about.

  2. March 13th, 2010 at 00:56

    Yeah, it’s a nice thought. But as far as I know there’s nothing to demonstrate that the Hittites, and other ancient peoples, had different vocal tract morphology to that of current human populations.

    Maybe the reason why all the laryngeal consonants coalesced with the vowels, and subsequently disappeared from present day Indo-European languages, is because the languages themselves underwent selection to become easier to produce and comprehend? Present day humans, for instance, have the ability to produce a wide repertoire of speech sounds, but the actual range of sounds utilised within that repertoire is dependent on which language you speak (assuming you remain monolingual). And even though Saussure’s laryngeal theory is now generally accepted, we’re still not sure as to the exact place of articulation for the laryngeal consonants.

… Having said that, I remember reading a fascinating Language Log post about the adaptive evolution of human hearing — Ongoing human evolution for spoken language?. Apparently, there may be genes for hearing that have taken root as recently as 2,000 years ago.

  3. March 13th, 2010 at 01:59

    Thanks.

    On the domain-specific v. domain-general aspects of language, what about the Williams Syndrome people, who aren’t very bright, but can talk your ear off?

  4. March 13th, 2010 at 21:37

    I don’t think those with Williams Syndrome really reveal much about the debate surrounding domain-specificity for language. In fact, we know language processing is distributed across both hemispheres. For instance, prosody is processed in the right hemisphere whilst syntax is processed on the left side.

    The point of contention surrounding domain-specificity is actually in regards to the processing of certain aspects that make up language and its sub-domains. So, a domain-specific argument would say: portion x of the brain is dedicated to processing feature a. Whereas a domain-general argument says: portion x of the brain processes features a,b,c.

An actual example of this argument in practice is Broca’s area. One set of arguments (which I wrote about here) posits that Broca’s area is crucial in the processing of hierarchical sequences. A domain-specific case says: only hierarchically organised phrases in language are processed by Broca’s area. However, those coming from a domain-general perspective point towards other, non-linguistic behaviours — such as music, action sequences, tool-use and tool production — that all show instances of hierarchical organisation. The question then being asked is: does Broca’s area subserve the processing of hierarchical sequences across many behaviours?

    Teasing out the answer to this question is extremely difficult. And there are those who would even argue that Broca’s area is not involved in hierarchical structure building at all. But that’s a completely different argument.

  5. March 13th, 2010 at 21:49

A slight addendum: I tend to think of the brain as being composed of domain-general (or independent) regions that are networked in a domain-specific manner. For example, the language network and the tool-use network will share many overlapping regions (e.g. LH Broca’s area), but the way in which each of these behaviours is processed involves different activation patterns. I’m not a neuroscientist though, so I might be completely off on this point.
