Word segmentation, word recognition, and word learning: A computational model of first language acquisition

Posted on:2010-07-12

Degree:Ph.D

Type:Dissertation

University:Northwestern University

Candidate:Daland, Robert

Full Text:PDF

GTID:1448390002988816

Subject:Language

Abstract/Summary:

Many word boundaries are not marked acoustically in fluent speech (Lehiste, 1960), a fact that is immediately apparent when listening to speech in an unfamiliar language, and one that poses a special problem for infants. The acquisition literature shows that infants begin to segment speech (identify word boundaries) between 6 and 10.5 months (Saffran, Aslin, & Newport, 1996; Jusczyk, Hohne, & Baumann, 1999; Jusczyk, Houston, & Newsome, 1999; Mattys & Jusczyk, 2001; Bortfeld, Morgan, Golinkoff, & Rathbun, 2005), although they possess minuscule receptive vocabularies at this age (Dale & Fenson, 1996). Thus, word segmentation largely precedes and supports word learning (Aslin, Woodward, LaMendola, & Bever, 1996; van de Weijer, 1998; Brent & Siskind, 2001; Davis, 2004), rather than the other way around. These results raise several further questions. How do infants begin to find word boundaries in speech when they don't know most of the words they hear? How are word segmentation, word recognition, and word learning linked in development? I propose DiBS (*Di*phone-*B*ased *S*egmentation) as a computational model of word segmentation. The core idea of DiBS is to recover word boundaries in speech from the immediate phonotactic context, by estimating the probability of a word boundary within every possible sequence of two speech sounds (a diphone, e.g. [ba]). As a proof of concept, a supervised DiBS model is tested on English and Russian data, yielding a consistent pattern of high accuracy with some undersegmentation. Next, a learning theory is developed, by which DiBS can be estimated from information that is observable to infants, including the distribution of speech sounds at phrase edges and any words they have managed to learn; these models achieve superior segmentation relative to other prelexical statistical proposals, such as segmentation based on Saffran et al.'s (1996) transitional probability. Finally, this learning model is integrated with a model of lexical access and word learning to form a full bootstrapping model, which achieves a relatively high degree of success in word segmentation, but only partial success in word learning. The successes and failures of this model are discussed, as they highlight the need for additional research on wordform learning.
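
The supervised, diphone-based idea described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes that each character stands in for one speech sound, that training phrases arrive already split into words, that a boundary is posited when the estimated probability exceeds 0.5, and that unseen diphones default to "no boundary". The names `train_dibs` and `segment`, and the toy corpus, are hypothetical.

```python
from collections import Counter


def train_dibs(phrases):
    """Estimate p(word boundary | diphone) from pre-segmented phrases.

    `phrases` is a list of phrases, each a list of word strings.
    Each character is treated as one phone (a simplifying assumption).
    """
    boundary = Counter()  # times a diphone straddles a word boundary
    total = Counter()     # times a diphone occurs at all
    for words in phrases:
        phones = "".join(words)
        # Indices into `phones` at which a new word starts.
        cuts, pos = set(), 0
        for word in words[:-1]:
            pos += len(word)
            cuts.add(pos)
        for i in range(1, len(phones)):
            diphone = phones[i - 1:i + 1]
            total[diphone] += 1
            if i in cuts:
                boundary[diphone] += 1
    return {d: boundary[d] / total[d] for d in total}


def segment(phones, probs, threshold=0.5):
    """Insert a boundary inside every diphone whose probability exceeds
    `threshold`; unseen diphones are treated as boundary-free (assumption)."""
    if not phones:
        return []
    words, start = [], 0
    for i in range(1, len(phones)):
        if probs.get(phones[i - 1:i + 1], 0.0) > threshold:
            words.append(phones[start:i])
            start = i
    words.append(phones[start:])
    return words


# Toy corpus (hypothetical data, orthography standing in for phones).
corpus = [["the", "dog"], ["the", "cat"], ["a", "dog"], ["a", "cat"]]
probs = train_dibs(corpus)
print(segment("thedog", probs))  # ['the', 'dog']
```

Because each decision depends only on the two sounds flanking a position, the segmenter needs no lexicon at test time. In this sketch a diphone never seen in training yields no boundary, so errors on novel sequences are missed boundaries rather than spurious ones.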

Keywords/Search Tags:

Word, Model, Speech

Related items

1	Improving Word Vector Model With Part-of-Speech And Dependency Grammar Information
2	Research On Enhanced Word Embedding Learning Model With Fusion Of Part-of-Speech And Position Information
3	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
4	Applied Research On Specific Word Chinese Speech Recognition System
5	How does acoustic variability in speech affect infant word recognition and word learning
6	Study On Speech Wake-up Word Detection Methods Based On Deep Learning
7	Research On Chinese Speech Retrieval Technology Based On Word Fragment And Lattice
8	The Effect Of Part Of Speech On Chinese Word Segmentation
9	The Uygur Natural Spoken In Multiple Pronunciation Word Modeling And Performance Analysis
10	Design And Implementation Of Word And Speech Libraries And NHMM Algorithm In Chinese Speech-to-Text Conversion