Word segmentation, word recognition, and word learning: A computational model of first language acquisition

Posted on: 2010-07-12
Degree: Ph.D
Type: Dissertation
University: Northwestern University
Candidate: Daland, Robert
GTID: 1448390002988816
Subject: Language
Abstract/Summary:
Many word boundaries are not marked acoustically in fluent speech (Lehiste, 1960), a fact that is immediately apparent from listening to speech in an unfamiliar language, and which poses a special problem for infants. The acquisition literature shows that infants begin to segment speech (identify word boundaries) between 6 and 10.5 months (Saffran, Aslin, & Newport, 1996; Jusczyk, Hohne, & Baumann, 1999; Jusczyk, Houston, & Newsome, 1999; Mattys & Jusczyk, 2001; Bortfeld, Morgan, Golinkoff, & Rathbun, 2005), although they possess minuscule receptive vocabularies at this age (Dale & Fenson, 1996). Thus, word segmentation largely appears before and supports word learning (Aslin, Woodward, LaMendola, & Bever, 1996; van de Weijer, 1998; Brent & Siskind, 2001; Davis, 2004), rather than the other way around. These results raise several further questions. How do infants begin to find word boundaries in speech when they don't know most of the words they hear? How are word segmentation, word recognition, and word learning linked in development? I propose DiBS -- Diphone-Based Segmentation -- as a computational model of word segmentation. The core idea of DiBS is to recover word boundaries in speech based on the immediate phonotactic context, by estimating the probability of a word boundary within every possible sequence of two speech sounds (diphone, e.g. [ba]). As a proof of concept, a supervised DiBS model is tested on English and Russian data, yielding a consistent pattern of high accuracy with some undersegmentation. Next, a learning theory is developed, by which DiBS can be estimated from information that is observable to infants, including the distribution of speech sounds at phrase edges and any words they have managed to learn; these models achieve superior segmentation relative to other prelexical statistical proposals such as segmentation based on Saffran et al.'s (1996) transitional probability.
Finally, this learning model is integrated with a model of lexical access and word-learning to form a full bootstrapping model, which achieves a relatively high degree of success in word segmentation, but only partial success in word learning. The successes and failures of this model are discussed, as they highlight the need for additional research on wordform learning.
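
The core idea of the supervised model described above (a boundary probability for each diphone) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names `train_dibs` and `segment`, the one-character-per-phone encoding, the 0.5 decision threshold, and the treatment of unseen diphones as non-boundaries are all assumptions made for illustration.

```python
from collections import Counter

def train_dibs(phrases):
    """Estimate p(boundary | diphone) from word-segmented phrases.

    `phrases` is a list of phrases; each phrase is a list of words, and
    each word is a string with one character per phone (an assumed,
    simplified encoding).
    """
    boundary = Counter()  # diphone -> times a word boundary fell inside it
    total = Counter()     # diphone -> times it occurred at all
    for words in phrases:
        phones = []
        breaks = set()    # indices i where a boundary precedes phones[i]
        for word in words:
            if phones:
                breaks.add(len(phones))
            phones.extend(word)
        for i in range(1, len(phones)):
            diphone = (phones[i - 1], phones[i])
            total[diphone] += 1
            if i in breaks:
                boundary[diphone] += 1
    return {d: boundary[d] / total[d] for d in total}

def segment(phones, p_boundary, threshold=0.5):
    """Insert a boundary wherever p(boundary | diphone) exceeds `threshold`.

    Unseen diphones default to 0.0 (no boundary); both this default and
    the threshold are illustrative assumptions.
    """
    if not phones:
        return []
    words = [phones[0]]
    for prev, cur in zip(phones, phones[1:]):
        if p_boundary.get((prev, cur), 0.0) > threshold:
            words.append(cur)   # start a new word
        else:
            words[-1] += cur    # extend the current word
    return words

# Toy corpus, with letters standing in for phones.
corpus = [["the", "dog"], ["a", "dog"], ["the", "cat"]]
model = train_dibs(corpus)
print(segment("thedog", model))
print(segment("acat", model))
```

In this toy setup, a diphone never seen in training, such as the one spanning "a" and "cat", produces no boundary, which gives a small-scale picture of how a purely diphone-based segmenter can err toward undersegmentation.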
Keywords/Search Tags: Word, Model, Speech