Research On Extraction Of Named Entity Translation Equivalents From Comparable Corpus

Posted on: 2010-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: S Lin
Full Text: PDF
GTID: 2178360302960349
Subject: Computer application technology
Abstract/Summary:
Large-scale sets of named entity translation equivalents can substantially improve the performance of systems such as machine translation and cross-language information retrieval, and many methods for mining such equivalents have been proposed. Previous methods mainly used parallel corpora as the resource, but these methods are limited in scale and diversity and cannot handle the out-of-vocabulary (OOV) problem well. Compared with parallel corpora, comparable corpora are more abundant, more up to date, and more accessible. Mining bilingual named entities from comparable corpora has therefore attracted increasing research attention.

This thesis presents a multi-feature method for extracting bilingual named entities from a comparable corpus. We first recognize the Chinese and English named entities separately in the Chinese-English comparable corpus, then obtain candidate pairs of Chinese and English named entities by calculating a multi-feature score. Finally, we use a binary classification model that combines the features to determine whether each candidate pair is a correct named entity translation equivalent. When calculating the feature score, we use a discriminative training method to fuse the features. The result is a set of named entity translation equivalents with higher accuracy.

This thesis also designs and implements a named entity translation equivalent mining system. The input of the system is a Chinese-English comparable corpus, and the output is the extracted set of Chinese-English named entity translation equivalents. The system is composed of four modules: (1) named entity extraction; (2) multi-feature extraction of named entity translation equivalent pairs; (3) multi-feature fusion; (4) alignment of named entity translation equivalent pairs.

The contributions of this study can be summarized as follows: (1) an integrated mining scheme is presented to discover, extract, and verify high-quality named entity translation equivalents from a Chinese-English comparable corpus; (2) previous methods are combined, and the experimental results show that our scheme achieves higher extraction performance than previous approaches.
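
As a rough illustration of the scoring and classification steps described in the abstract, the following Python sketch computes a few features for a candidate pair, fuses them with discriminatively trained weights, and makes a binary decision. It is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not list the features, so the three used here (string similarity on a romanized Chinese name, length ratio, and document co-occurrence) are hypothetical, the Chinese entities are assumed to be already recognized and romanized, and a simple perceptron stands in for the Support Vector Machine named in the keywords. All function names and the toy data are invented for illustration.

```python
from difflib import SequenceMatcher


def string_similarity(zh_roman, en):
    """Hypothetical feature: character similarity of romanized Chinese and English names."""
    return SequenceMatcher(None, zh_roman.lower(), en.lower()).ratio()


def length_ratio(a, b):
    """Hypothetical feature: shorter length divided by longer length."""
    if not a or not b:
        return 0.0
    return min(len(a), len(b)) / max(len(a), len(b))


def dice(docs_a, docs_b):
    """Hypothetical feature: Dice overlap of the comparable-document ids
    in which each entity occurs."""
    if not docs_a or not docs_b:
        return 0.0
    return 2 * len(docs_a & docs_b) / (len(docs_a) + len(docs_b))


def features(zh_roman, en, zh_docs, en_docs):
    """Feature vector for one candidate (Chinese, English) entity pair."""
    return [string_similarity(zh_roman, en),
            length_ratio(zh_roman, en),
            dice(zh_docs, en_docs)]


def score(weights, bias, x):
    """Fused multi-feature score: a weighted sum of the features plus a bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_perceptron(examples, epochs=50, lr=0.1):
    """Discriminative training of the fusion weights (perceptron stand-in for an SVM).
    Each example is (feature_vector, label) with label +1 or -1."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            if y * score(weights, bias, x) <= 0:  # misclassified: update
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def is_translation_pair(weights, bias, x):
    """Binary decision: accept the candidate pair if the fused score is positive."""
    return score(weights, bias, x) > 0


# Toy labeled candidates (invented data): romanized Chinese name, English name,
# and the ids of the comparable documents in which each entity was found.
TRAINING = [
    (features("beijing", "Beijing", {1, 2, 3}, {1, 2, 3}), +1),
    (features("aobama", "Obama", {4, 5}, {4, 5}), +1),
    (features("beijing", "Obama", {1, 2, 3}, {4, 5}), -1),
    (features("aobama", "Beijing", {4, 5}, {1, 2, 3}), -1),
]

weights, bias = train_perceptron(TRAINING)
accepted = [is_translation_pair(weights, bias, x) for x, _ in TRAINING]
```

In a real system the labeled candidates would come from an annotated sample of the comparable corpus, and the accepted pairs would form the output set of translation equivalents.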
Keywords/Search Tags:Comparable corpus, Machine translation, Named entity, Multi-feature fusion, Support Vector Machine