Research And Application Of Chinese Word Segmentation Based On English-Chinese Parallel Corpus

Posted on: 2013-04-22  Degree: Master  Type: Thesis
Country: China  Candidate: C Dai  Full Text: PDF
GTID: 2248330371997491  Subject: Computer system architecture
Abstract/Summary:
English-Chinese parallel corpora contain translation data between English and Chinese, providing a strong data resource for natural language processing and English-Chinese translation. As parallel corpora have come into wide use, researchers have increasingly recognized the importance of corpus construction, and how to build such corpora has become an important research direction in corpus studies. A raw corpus cannot be used directly as a parallel corpus; it must first be preprocessed. One preprocessing task is Chinese word segmentation, which is the basis of many natural language processing tasks. Scholars in China and abroad have long been committed to research on word segmentation technology. However, most of their methods target monolingual corpora, and few target parallel corpora. In this thesis, after analyzing the characteristics of the corpus, we design a Chinese word segmentation system for English-Chinese parallel corpora.

Our work consists of three main parts:
(1) A brief overview of the development of natural language processing and the significance of Chinese word segmentation research.
(2) A brief introduction to Chinese word segmentation methods. Three statistical models and their applications to sequence labeling are analyzed: the Hidden Markov Model, the Maximum Entropy Markov Model, and the Conditional Random Field model.
(3) The design of a Chinese word segmentation system for English-Chinese parallel corpora, based on an analysis of the characteristics of the corpus. Experiments demonstrate the effectiveness of the system.
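The abstract treats statistical word segmentation as a sequence labeling problem. A common way to set this up, shown here only as an illustrative sketch and not as the thesis's own implementation, is to give every character one of four position tags (B = begin, M = middle, E = end, S = single-character word); a model such as an HMM, MEMM, or CRF would then predict these tags. The function names `words_to_tags` and `tags_to_words` and the B/M/E/S tag set are assumptions made for this example.

```python
def words_to_tags(words):
    """Map a segmented sentence to one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if not word:
            continue  # ignore empty strings
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at E or S
            words.append(current)
            current = ""
    if current:  # flush a trailing unfinished word
        words.append(current)
    return words


words = ["计算机", "很", "有用"]
tags = words_to_tags(words)
print(tags)  # ['B', 'M', 'E', 'S', 'B', 'E']
print(tags_to_words("".join(words), tags))  # ['计算机', '很', '有用']
```

With this encoding, segmentation quality depends entirely on the tagger that predicts the tags; the conversion itself is lossless in both directions for well-formed tag sequences.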
Keywords/Search Tags: Natural Language Processing, Parallel Corpus, Bilingual Dictionary, Word Alignment, Conditional Random Field Model