Computational Models of Children's Language Acquisition

Posted on: 2013-04-14, Degree: Doctor, Type: Dissertation
Country: China, Candidate: B C Zhang, Full Text: PDF
GTID: 1225330401963128, Subject: Signal and Information Processing
Abstract/Summary:
Computational models of language acquisition aim to acquire linguistic knowledge computationally, and they are an essential part of high-quality natural language processing applications. Human beings learn their basic language knowledge during childhood, which is often called the critical period of language acquisition. It is therefore important to develop computational models of children's language acquisition in order to build better computational models of linguistic knowledge acquisition. Developing such models, especially models that incorporate various cognitive mechanisms, is also an effective way to investigate and evaluate cognitive hypotheses about children's language acquisition. Substantial research has been carried out to build computational models of children's language acquisition in diverse areas such as Computational Linguistics, Cognitive Psychology and Developmental Linguistics.
However, existing computational models of children's language acquisition still have several drawbacks. First, there is no widely accepted measure for evaluating lexical category acquisition models. Second, the algorithms previously applied to the lexical category acquisition task require a predefined number of categories. Third, existing syntax acquisition models seldom take long-distance dependencies between words into account. Finally, these models have not incorporated enough results from cognitive science.
This thesis focuses on child-language corpus construction and on computational models of children's lexical category acquisition and syntax acquisition, in order to resolve the above drawbacks of existing models.
The main contents and contributions of this thesis are as follows.
First, the thesis builds a Chinese character-based corpus of children's speech and Child-Directed Speech (CDS), and carries out a statistical and contrastive analysis of children's speech, CDS and an adult corpus at the character, word and sentence levels. What children produce (children's speech) and what they are able to understand (CDS) reflect their language ability, and both differ markedly from adult language. Constructing corpora of children's speech and CDS is therefore the foundation for studying children's language acquisition: computational models of children's language acquisition rest on child-language corpora, and must be trained and evaluated on dedicated corpora. As the first step of the study, the thesis obtains these corpora by transliterating, annotating and proofreading the Chinese corpora in CHILDES, the largest collection of child-language corpora in the world.
Second, for lexical category acquisition, the thesis contributes both an evaluation metric and computational models. It proposes a new metric, Cohesivity, for evaluating performance on the lexical category acquisition task; the metric jointly satisfies three criteria: informativeness, diversity and purity. Experiments demonstrate that Cohesivity is feasible and effective. The thesis employs Dirichlet Process Mixture Models (DPMMs) and the Affinity Propagation (AP) algorithm for lexical category acquisition; unlike previous work, neither requires the number of categories to be defined in advance.
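To illustrate how a model can avoid a predefined number of categories, the sketch below simulates the Chinese restaurant process, the standard prior over partitions that underlies a Dirichlet process mixture. It is not the thesis's implementation: the function name `crp_assignments`, the concentration parameter `alpha` and the seed are assumptions made for illustration, and a full DPMM would also weigh how well each word fits every category.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample category labels for n_items from a Chinese restaurant process.

    Hypothetical helper for illustration only. `alpha` (> 0) is the
    concentration parameter: larger values make new categories more likely.
    """
    rng = random.Random(seed)
    assignments = []  # category index chosen for each item, in order
    counts = []       # number of items currently in each category
    for _ in range(n_items):
        # An existing category k is chosen with weight counts[k];
        # a brand-new category is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new category
        else:
            counts[k] += 1     # join an existing category
        assignments.append(k)
    return assignments, counts


assignments, counts = crp_assignments(n_items=50, alpha=1.0)
print("number of categories discovered:", len(counts))
```

The number of categories is an outcome of sampling rather than an input, which is the property the paragraph above attributes to DPMMs.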
Moreover, based on the cognitive mechanism that cognitive channels other than language can provide prior information for language acquisition, the thesis uses manually tagged seed words to simulate prior information from those channels, and constructs a semi-supervised AP algorithm. Experiments show that the prior information yields good performance.
Third, for syntax acquisition, the thesis proposes a multi-stage syntax acquisition model based on the cognitive mechanism that children learn syntax from simple to complex and from concrete to abstract. Experiments show that the model is effective. The model consists of three stages. In the first stage, it acquires continuous, concrete syntactic structures, considering only adjacent structures made up of terminal symbols. In the second stage, it acquires long-distance dependent syntactic structures, still considering only terminal symbols but now seeking discontinuous structures. In the third stage, it acquires hierarchical syntactic structures, learning constituents that mix terminal and non-terminal symbols, which completes syntax acquisition.
Fourth, based on the cognitive fact that lexical categories and syntactic structures grow incrementally in stages, the thesis trains the proposed lexical category acquisition model in a staged fashion and combines it with the multi-stage syntax acquisition model, yielding a lexical category-based syntax acquisition framework. The framework is then applied to language generation; the generated language is compared statistically with child language and child-directed language, and is also assessed by manual evaluation.
The experiments demonstrate that the framework is cognitively plausible, that incorporating lexical category information effectively improves syntax acquisition performance, and that the generated language is fluent and intelligible.
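The difference between the first two stages of the syntax acquisition model described in the abstract can be illustrated with a toy sketch: stage one looks only at adjacent pairs of terminal symbols, and stage two looks at discontinuous pairs separated by intervening words. This is a minimal sketch under stated assumptions, not the thesis's model: the function names, the `max_gap` parameter and the English toy utterances are hypothetical, and raw co-occurrence counting stands in for whatever scoring the actual model uses.

```python
from collections import Counter


def adjacent_pairs(utterances):
    """Stage-one style statistic: count pairs of adjacent terminal symbols."""
    counts = Counter()
    for tokens in utterances:
        counts.update(zip(tokens, tokens[1:]))
    return counts


def gapped_pairs(utterances, max_gap=2):
    """Stage-two style statistic: count discontinuous pairs of terminals.

    `max_gap` (hypothetical parameter) is the largest number of
    intervening tokens allowed between the two members of a pair.
    """
    counts = Counter()
    for tokens in utterances:
        for i, left in enumerate(tokens):
            for gap in range(1, max_gap + 1):
                j = i + gap + 1  # skip `gap` intervening tokens
                if j < len(tokens):
                    counts[(left, tokens[j])] += 1
    return counts


# Toy child-directed utterances, invented for this example.
utterances = [
    ["want", "the", "ball"],
    ["want", "a", "ball"],
    ["see", "the", "dog"],
]

print(adjacent_pairs(utterances).most_common(3))
# ("want", "ball") occurs twice with exactly one intervening word.
print(gapped_pairs(utterances, max_gap=1).most_common(3))
```

In this toy data no adjacent pair repeats, while the discontinuous pair ("want", "ball") does, which is the kind of regularity a stage restricted to adjacent structures cannot capture.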
Keywords/Search Tags: children language acquisition, computational model, cognitive mechanism, Chinese character based children speech corpus, staged syntax acquisition model