Naxi-Chinese Bilingual Corpus Construction and Intelligent Input Method Research

Posted on: 2014-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Kong
Full Text: PDF
GTID: 2268330401473418
Subject: Computer technology
Abstract/Summary:
Naxi pictographs are rapidly changing and dying out under the influence of foreign cultures and modern civilization. About half a million people currently use the Naxi language, so protecting minority languages and supporting their informatization with computers is an important research direction in minority-language processing. Naxi language information processing is the foundation of Naxi informatization work and is of great significance for promoting and developing the Naxi language. However, there is no complete Naxi-Chinese bilingual electronic dictionary, Naxi-Chinese bilingual corpora are scarce, and research on Naxi-Chinese intelligent input methods is still in its infancy. To address these problems in Naxi language informatization, this thesis studies the construction of a Naxi-Chinese bilingual electronic dictionary, the construction of a Naxi-Chinese bilingual corpus, and a Naxi-Chinese intelligent input method. The main contributions are as follows:

(1) First, based on the characteristics and grammatical features of Naxi, we collect and collate Naxi pictographs and refine the detailed information recorded for them. Second, we describe the process of producing the Naxi pictographs and add more than 2000 new words, each annotated with information such as the Naxi pictograph, the Chinese characters, and the Naxi part of speech. Finally, we develop a Naxi bilingual vocabulary management system and construct the Naxi-Chinese bilingual electronic dictionary, which greatly expands the Naxi-Chinese vocabulary.

(2) Construct a Naxi-Chinese bilingual corpus. First, computerized information processing techniques are used to collect, standardize, and store a large amount of Naxi corpus material, and more than 6000 Naxi vocabulary entries are completed. Second, Naxi-Chinese corpus alignment is carried out with the Naxi-Chinese bilingual corpus management system. Finally, the bilingual corpus is classified and annotated in a standardized way, taking into account the syntactic and syntactic-semantic features of the Naxi language, and about 30000 aligned Naxi-Chinese corpus entries are annotated.

(3) Propose a detailed solution for Naxi-Chinese intelligent input. We introduce the storage structure of the Naxi-Chinese lexicon and the Naxi-Chinese stop-word list, and use an N-Gram language model to analyze the corpus on the basis of the Naxi-Chinese bilingual corpus. Finally, we implement the Naxi-Chinese intelligent input system on the basis of Windows IME input method technology and the N-Gram language model.
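As an illustration of how an N-Gram language model can rank input-method candidates, the following is a minimal sketch and not the thesis's implementation. It assumes a bigram model (N = 2) with add-one smoothing. The function names (`train_bigram`, `bigram_prob`, `rank_candidates`) and the toy corpus of placeholder tokens are hypothetical and stand in for segmented sentences from a real corpus.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens)  # "<s>" marks the sentence start
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def rank_candidates(prev, candidates, unigrams, bigrams):
    """Order candidate words by how likely they are to follow `prev`."""
    return sorted(
        candidates,
        key=lambda w: bigram_prob(prev, w, unigrams, bigrams),
        reverse=True,
    )

# Toy corpus of placeholder tokens (assumption: real input would be
# segmented sentences drawn from the bilingual corpus).
corpus = [["a", "b", "c"], ["a", "b", "d"], ["a", "c", "d"]]
unigrams, bigrams = train_bigram(corpus)
ranked = rank_candidates("a", ["c", "b", "d"], unigrams, bigrams)
print(ranked)
```

In this sketch the candidate that most often followed the previous word in the training data is offered first, and smoothing keeps unseen pairs from receiving zero probability.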
Keywords/Search Tags:Naxi pictographs, Naxi-Chinese electronic dictionary and bilingual corpus, N-Gram model, Naxi-Chinese intelligent input