Mechanisms of Early Language Acquisition: Computational modeling and human data
Date: February 24, 2011
Location: Utrecht University (Sweelinckzaal, Drift 21, 3512 BR Utrecht)
Experimental studies with infants and children show what types of linguistic knowledge are acquired, and at what age. Computational models provide possible explanations of how learners might acquire such knowledge, given the input and mechanisms available to the learner. This workshop aims to bring together researchers who study early language acquisition from various computational and (psycho-)linguistic perspectives. A particular focus of the workshop is on the mechanisms involved in language development during infancy, and the challenges in explaining infant language development through the use of computational models.
The workshop will take place on Thursday, February 24 from 14:30 – 18:45 at Utrecht University (Sweelinckzaal, address: Drift 21, Utrecht).
14:30-15:15 Paul Boersma (University of Amsterdam) – How virtual children change their parents’ sound system
15:15-16:00 Walter Daelemans (University of Antwerp) – Implicit schemata and categories in memory-based language acquisition and processing
16:20-17:05 Louis ten Bosch (Radboud University Nijmegen) – The computational modeling of language acquisition: Focus on word discovery
17:05-17:50 James Morgan (Brown University) – Interactive learning: A role for the developing lexicon in phonetic category acquisition
18:00-18:45 Frans Adriaans (Utrecht University) – The induction of phonotactics for speech segmentation: Converging evidence from computational and human learners
How virtual children change their parents’ sound system
Paul Boersma, University of Amsterdam
When the acquisition of a sound system is modeled computationally, using learning algorithms that optimize perception or production over multiple levels of representation, the virtual children display several effects that are also observed in real human sound change across generations: push and drag chains, mergers, and circular shifts.
Implicit schemata and categories in memory-based language acquisition and processing
Walter Daelemans, University of Antwerp (joint work with Antal van den Bosch)
Memory-based language processing (MBLP) is an approach to language processing based on exemplar storage during learning and analogical reasoning during processing. It has been used to explain results in human acquisition and processing, and has been applied to tasks in language technology, where it is often competitive with more mainstream machine learning and statistical methods (see Daelemans & Van den Bosch 2005, 2010, for overviews). From a cognitive perspective, the approach is attractive because it makes no assumptions about the way abstractions (such as rules or schemata) are built, and makes no a priori distinction between regular and exceptional exemplars. This allows it to explain the fluidity of linguistic categories, and irregularization as well as regularization in processing. Schema-based behavior can be explained in MBLP as the effect of analogy over exemplars in memory. Similarly, category formation can be explained as the result of bottom-up implicit clustering of exemplar feature values. Using morphology as an example domain, we will show how these processes of schema and category formation arise in a memory-based framework.
Walter Daelemans & Antal van den Bosch. Memory-based language processing. Cambridge University Press. 2005.
Walter Daelemans & Antal van den Bosch. Memory-Based Learning. In Alexander Clark et al. (eds.), The Handbook of Computational Linguistics and Natural Language Processing (Blackwell Handbooks in Linguistics), 154–179. 2010.
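The exemplar-plus-analogy mechanism described in the abstract above can be illustrated with a minimal sketch: store training items verbatim and classify new items by majority vote among their most similar stored neighbours. The feature scheme, the toy Dutch-like plural data, and the `classify` helper below are invented for illustration; this is not the MBLP/TiMBL implementation.

```python
# Minimal sketch of memory-based classification: exemplar storage during
# learning, analogical reasoning (nearest neighbours) during processing.
# Toy data and the last-three-letters feature scheme are assumptions.

from collections import Counter

def features(word):
    # Represent a word by its last three letters (left-padded).
    padded = ("___" + word)[-3:]
    return tuple(padded)

# Exemplar memory: (features, class) pairs stored verbatim, no abstraction.
MEMORY = [
    (features(w), suffix)
    for w, suffix in [
        ("boek", "-en"), ("stoel", "-en"), ("tafel", "-s"),
        ("appel", "-s"), ("kamer", "-s"), ("hond", "-en"),
    ]
]

def classify(word, k=3):
    # Analogical reasoning: score every exemplar by feature overlap,
    # then take a majority vote among the k most similar ones.
    f = features(word)
    scored = sorted(
        MEMORY,
        key=lambda ex: sum(a == b for a, b in zip(f, ex[0])),
        reverse=True,
    )
    votes = Counter(cls for _, cls in scored[:k])
    return votes.most_common(1)[0][0]

print(classify("vogel"))  # -> "-s", by analogy to "tafel" and "appel"
```

Because no rule is ever extracted, regular and exceptional exemplars sit in the same memory, which is what lets such models produce both regularization and irregularization.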
The computational modeling of language acquisition – focus on word discovery
Louis ten Bosch, Radboud University Nijmegen
The detection of recurrent word-like units in the speech signal is one of the basic steps in L1 acquisition by a young infant. Aslin and others have shown that statistical properties of the input may play an important role in detecting potential boundaries between words. Research by Smith, Yu, and others suggests that infants are able to exploit cross-situational statistical patterns in cross-modal associations.
Over the last couple of years, the ACORNS computational model (www.acorns-project.org) has been shown to replicate a number of findings from the acquisition literature. For example, it has shown that recognition of novel speakers improves after exposure to multiple speakers. The model is able to hypothesize and strengthen internal representations of word-like entities on the basis of associations between real speech data and real images.
In this presentation, I will present this computational model, discuss the cognitive plausibility of its architecture, and describe a number of experiments concerning the robustness of its internal representations.
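The statistical boundary-detection idea the abstract attributes to Aslin and colleagues can be sketched on toy syllable streams: posit a word boundary wherever the transitional probability between adjacent syllables dips. The syllable inventory, the threshold, and the `segment` helper below are my own illustration; this is not the ACORNS model, which operates on real speech and images.

```python
# Sketch of boundary detection via transitional probabilities (TPs),
# in the spirit of Saffran/Aslin-style statistical learning.
# Toy "words" pabiku, tibudo, golatu; the 0.9 threshold is an assumption.

from collections import Counter

def segment(syllables, threshold):
    """Posit a word boundary wherever P(next | current) falls below threshold."""
    pairs = Counter(zip(syllables, syllables[1:]))
    firsts = Counter(syllables[:-1])
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        tp = pairs[(a, b)] / firsts[a]  # transitional probability a -> b
        if tp < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Three words concatenated without pauses, in varying order:
stream = ["pa","bi","ku","ti","bu","do","go","la","tu",
          "pa","bi","ku","go","la","tu","ti","bu","do",
          "pa","bi","ku","ti","bu","do"]
print(segment(stream, 0.9))
```

Within-word TPs here are exactly 1.0 while every across-word TP is lower, so thresholding recovers the three words; with real speech the statistics are far noisier, which is part of what models like ACORNS must handle.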
Interactive learning: A role for the developing lexicon in phonetic category acquisition
James Morgan, Brown University
Infants learn to segment words from fluent speech during the same period as they learn native language phonetic categories, yet accounts of phonetic category acquisition typically ignore information about the words in which sounds appear. A Bayesian model (Feldman, Griffiths, Goldwater, & Morgan, submitted) illustrates how feedback from segmented words might constrain phonetic category learning by providing information about which sounds occur together in words. Simulations show that this type of information can successfully disambiguate overlapping English vowel categories, leading to more robust category learning than distributional information alone. Experimental evidence from adults and infants shows that listeners assign continuum endpoints to different categories more often when they hear the sounds occurring in distinct lexical contexts than when they hear them occurring interchangeably in the same set of lexical contexts. Additional modeling investigates how information from lexical recognition can be fed back to make phonetic category learning even more robust and accurate.
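One intuition behind the experimental result above, that sounds heard in distinct lexical contexts are more readily assigned to separate categories, can be caricatured with a simple overlap measure over word frames. The frames and the Jaccard measure are my own illustration, not the Bayesian model of Feldman and colleagues.

```python
# Toy illustration: two acoustically overlapping sounds are a better
# candidate for a category split when the sets of word frames they occur
# in barely overlap. Frames like "t_p" are invented examples.

def context_overlap(frames_a, frames_b):
    a, b = set(frames_a), set(frames_b)
    return len(a & b) / len(a | b)  # Jaccard similarity of lexical contexts

# Heard in distinct words -> no overlap -> evidence for two categories:
print(context_overlap(["t_p", "s_t"], ["d_g", "l_k"]))  # -> 0.0
# Heard interchangeably in the same words -> full overlap -> one category:
print(context_overlap(["t_p", "s_t"], ["t_p", "s_t"]))  # -> 1.0
```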
The induction of phonotactics for speech segmentation: Converging evidence from computational and human learners
Frans Adriaans, Utrecht University
During the first year of life, infants start to learn various properties of their native language. Among these properties are phonotactic constraints, which state the permissible sound sequences within the words of the language. Such constraints guide infants’ search for words in continuous speech, thereby facilitating the development of the mental lexicon. An intriguing problem is how infants are able to acquire knowledge of phonotactics. In my dissertation, I propose a computational model of phonotactic learning, which is based on psycholinguistic findings. The model connects two learning mechanisms: statistical learning and feature-based generalization. Using these mechanisms, phonotactic constraints are induced from transcribed utterances of continuous speech, and are subsequently used for the detection of word boundaries. The model is tested in various empirical studies involving computer simulations on transcribed speech data, computer simulations of human segmentation behavior, and artificial language learning experiments with human learners. In this talk, I will present some of the findings that support the idea that phonotactic constraints are learned from continuous speech, and facilitate the detection of words in the speech stream.
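The two-step pipeline the abstract describes, inducing sequence statistics from unsegmented utterances and then using them to posit word boundaries, can be sketched with a biphone-frequency trough heuristic on toy phone strings. The corpus, the trough rule, and the omission of feature-based generalization are all simplifications of my own, not the dissertation's model.

```python
# Rough sketch: (1) collect biphone counts from boundary-free "utterances"
# built from the toy words "badi" and "gola"; (2) posit a word boundary
# inside any biphone that is less frequent than both of its neighbours.

from collections import Counter

def train(utterances):
    counts = Counter()
    for u in utterances:
        counts.update(zip(u, u[1:]))  # raw biphone counts, no boundaries given
    return counts

def segment(u, counts):
    # Insert "|" between two segments whose biphone is a local frequency
    # trough relative to the biphones on either side.
    out = [u[0]]
    for i in range(1, len(u)):
        left = counts[(u[i-2], u[i-1])] if i >= 2 else float("inf")
        mid = counts[(u[i-1], u[i])]
        right = counts[(u[i], u[i+1])] if i + 1 < len(u) else float("inf")
        if mid < left and mid < right:
            out.append("|")
        out.append(u[i])
    return "".join(out)

counts = train(["badigola", "golabadi", "badigolabadi"])
print(segment("golabadigola", counts))  # -> "gola|badi|gola"
```

Word-internal biphones recur across utterances while word-junction biphones vary, so troughs fall at word edges; feature-based generalization would additionally let the learner score biphones it has never observed.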