
Research On Extraction Of Bilingual Multi-word Term Translation Pairs From Comparable Corpora

Posted on: 2014-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: H F Xu
Full Text: PDF
GTID: 2248330398450741
Subject: Computer software and theory
Abstract/Summary:
The extraction of bilingual term translation pairs is an important topic in natural language processing. As a basic resource, bilingual term translation pairs are widely used in machine translation, information extraction, bilingual terminology compilation, and cross-language information retrieval. Early extraction work was based mainly on parallel corpora. Compared with parallel corpora, which are collected mainly from human translations or regulatory documents, comparable corpora are cheap to build and available from a wide range of sources. As research based on comparable corpora has grown, mining word-level information from them has become a research hot spot.

This paper extracts bilingual term translation pairs from comparable corpora. First, Chinese and English multi-word terms (MWTs) are extracted separately from the comparable corpora. Next, candidate term translation pairs are obtained by computing a multi-feature score. Finally, a threshold is applied so that only pairs likely to be correct are kept. When computing the feature score, a discriminative learning algorithm is used to adjust the weight of each feature.

This paper also designs and implements an MWT translation pair extraction system. The input of the system is Chinese-English corpora, and the output is the set of extracted Chinese-English MWT pairs. The system consists of three main modules: (1) Chinese and English MWT extraction; (2) MWT translation pair extraction with multiple features; (3) multi-feature fusion and threshold-based filtering of term translation pairs.

The contributions of this work can be summarized as follows: (1) For multi-word term extraction, this paper improves the existing algorithm and develops a number of rules based on linguistic knowledge to improve extraction performance. (2) A multi-feature linear fusion model is introduced for matching multi-word term translation pairs, and the weight of each feature is adjusted with the MSR algorithm. Experiments verify the effectiveness of the proposed method.
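The linear fusion and threshold filtering steps described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the feature names, the weights, the threshold value, and the helper functions `fused_score` and `filter_pairs` are all hypothetical. In particular, the weights are fixed by hand here, whereas the thesis tunes them with a discriminative learning algorithm that is not reproduced.

```python
from typing import Dict, List, Tuple

# Hypothetical feature names and hand-picked weights, for illustration only.
# The thesis learns the weights; that training step is not shown here.
WEIGHTS: Dict[str, float] = {
    "dictionary_overlap": 0.5,
    "context_similarity": 0.3,
    "length_ratio": 0.2,
}

Candidate = Tuple[str, str, Dict[str, float]]


def fused_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear fusion: weighted sum of feature scores (missing features count as 0)."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def filter_pairs(
    candidates: List[Candidate], weights: Dict[str, float], threshold: float
) -> List[Tuple[str, str, float]]:
    """Score every candidate pair and keep those at or above the threshold."""
    scored = [(zh, en, fused_score(feats, weights)) for zh, en, feats in candidates]
    kept = [item for item in scored if item[2] >= threshold]
    # Highest-scoring pairs first.
    return sorted(kept, key=lambda item: item[2], reverse=True)


# Made-up candidate pairs with made-up feature values.
candidates: List[Candidate] = [
    ("机器翻译", "machine translation",
     {"dictionary_overlap": 1.0, "context_similarity": 0.8, "length_ratio": 1.0}),
    ("机器翻译", "information retrieval",
     {"dictionary_overlap": 0.0, "context_similarity": 0.4, "length_ratio": 1.0}),
]

result = filter_pairs(candidates, WEIGHTS, threshold=0.6)
for zh, en, score in result:
    print(f"{zh} <-> {en}: {score:.2f}")
```

With these illustrative values only the first candidate passes the threshold. Raising the threshold trades recall for precision, which is the role the abstract assigns to the filtering step.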
Keywords/Search Tags: Comparable corpora, Multi-word Term, Machine Translation, Multi-feature Fusion