Research On Named Entity Equivalents Automatic Acquisition Method Based On English-Chinese Parallel Corpus

Posted on: 2015-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2298330434950196
Subject: Computer Science and Technology
Abstract/Summary:
Named Entities (NEs) are defined as proper names and quantities of interest, mainly including person, organization, and location names. Bilingual NEs are NE pairs from two different languages that stand in a translation relation. In the current era of globalization, communication across languages and countries is increasingly important and necessary. Against this background, Natural Language Processing (NLP) technologies such as Machine Translation have developed rapidly. As a research hotspot, NE recognition and translation are widely used in NLP tasks such as Machine Translation, Information Retrieval, Question Answering, Text Classification, and Automatic Abstraction.
This paper studies the automatic acquisition of NE equivalents from an English-Chinese parallel corpus. Building on previous work, it presents an approach to extracting English-Chinese NE equivalents based on multiple features, such as transliteration model and translation model features. We first recognize Chinese and English NEs separately in the parallel corpus, forming multiple candidate NE equivalents, and then calculate the feature values for every candidate pair. Finally, an NE equivalent alignment model, implemented as a Maximum Entropy (ME) model, selects the final English-Chinese NE equivalents.
The experimental results suggest that the proposed method can effectively improve the precision and recall of NE equivalent extraction from a parallel corpus. The contributions of this paper are as follows:
(1) an automatic acquisition method for NE equivalents from a parallel corpus is presented;
(2) features between NE equivalents are used effectively, including the transliteration model, translation model, co-occurrence frequency, and word length features;
(3) based on the above features, the ME model is used to align NE equivalents.
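
The pipeline described in this abstract (form candidate pairs, compute features, score each pair with an ME model) can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes a binary ME model, which is equivalent to logistic regression, and it covers only two of the four features (translation and word length). The lexicon `TOY_LEXICON`, the values in `WEIGHTS` and `BIAS`, the feature definitions, and all function names are hypothetical placeholders; a real system would train the translation model and the ME weights on data.

```python
import math
from itertools import product

# Hypothetical toy lexicon standing in for a trained translation model:
# (lowercased English word, Chinese string) -> lexical probability.
TOY_LEXICON = {
    ("bank", "银行"): 0.9,
    ("china", "中国"): 0.9,
    ("beijing", "北京"): 0.9,
}

# Hypothetical weights; a real ME model would learn these from labelled pairs.
WEIGHTS = {"translation": 5.0, "length": 1.0}
BIAS = -3.0


def candidate_pairs(en_entities, zh_entities):
    """Pair every English NE with every Chinese NE from one aligned sentence pair."""
    return list(product(en_entities, zh_entities))


def translation_feature(en, zh):
    """Average, over English words, of the best lexicon probability found in zh."""
    words = en.lower().split()
    if not words:
        return 0.0
    total = 0.0
    for word in words:
        total += max(
            (p for (e, z), p in TOY_LEXICON.items() if e == word and z in zh),
            default=0.0,
        )
    return total / len(words)


def length_feature(en, zh):
    """Assumed definition: ratio of the shorter to the longer length
    (English words versus Chinese characters)."""
    n_en, n_zh = len(en.split()), len(zh)
    if max(n_en, n_zh) == 0:
        return 0.0
    return min(n_en, n_zh) / max(n_en, n_zh)


def me_probability(en, zh):
    """Binary ME (logistic) probability that (en, zh) is an equivalent pair."""
    feats = {
        "translation": translation_feature(en, zh),
        "length": length_feature(en, zh),
    }
    score = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


def align(en_entities, zh_entities, threshold=0.5):
    """For each English NE, keep the best-scoring Chinese NE above the threshold."""
    best = {}
    for en, zh in candidate_pairs(en_entities, zh_entities):
        p = me_probability(en, zh)
        if p >= threshold and p > best.get(en, (None, 0.0))[1]:
            best[en] = (zh, p)
    return {en: zh for en, (zh, _) in best.items()}


if __name__ == "__main__":
    print(align(["Bank of China", "Beijing"], ["中国银行", "北京"]))
```

In this sketch each English NE is aligned independently, so two English NEs could in principle select the same Chinese NE; a one-to-one constraint would need an extra step that the abstract does not specify.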
Keywords/Search Tags: Named Entity Equivalents, Transliteration Model, Translation Model, Parallel Corpus