
Methods for English-Chinese Bidirectional Translation of Out-of-Vocabulary Terms

Posted on: 2013-10-29  Degree: Master  Type: Thesis
Country: China  Candidate: Y X Su  Full Text: PDF
GTID: 2248330395950175  Subject: Computer application technology
Abstract/Summary:
Translating out-of-vocabulary (OOV) terms has long been both a focus and a difficulty in machine translation and cross-language information retrieval. With the rapid development of information technology and the Internet, new words and terms appear online in an endless stream, and existing dictionaries cannot keep up with them. Traditional OOV term translation methods rely mainly on transliteration and on corpora, and their accuracy suffers from one major problem: corpus scarcity, which grows more severe as new OOV terms keep emerging. How to translate OOV terms correctly has therefore become an important research problem in natural language processing. As online information resources have become increasingly diverse, researchers have proposed translation methods based on web resources. The key questions for this approach are how to obtain a translation corpus quickly, how to locate translation candidates accurately, and how to assess those candidates. Existing methods, however, face two major problems: the feature representation of translation candidates is not comprehensive, and the assessment methods are too simple.
This thesis analyzes traditional OOV term translation methods and methods based on web resources, and then proposes a combined method based on web mining, multi-feature representation and supervised learning for English-Chinese bidirectional OOV term translation. Under the proposed method, the translation system is divided into three parts: extraction of translation candidates, multi-feature representation of the candidates, and evaluation of the candidates.
1. Extraction of translation candidates. Because Chinese and English differ, extraction is split into Chinese candidate extraction and English candidate extraction. Chinese candidate extraction is based on a PAT-Tree. English candidate extraction requires no word segmentation, so the method uses a simple extraction step and then combines information entropy with heuristic rules to filter out noise.
2. Representation of translation candidates. After a comprehensive analysis of the internal properties and the context of the candidates, the method combines global features, local features and Boolean features to represent them. This multi-feature representation is more comprehensive and provides a good foundation for assessing the candidates.
3. Evaluation of translation candidates. SVM is widely used for classification problems. Our analysis suggests, however, that assessing translation candidates is better treated as a ranking problem than as a classification problem. This thesis therefore uses both SVM and Ranking SVM to evaluate the candidates.
In the experiments, the accuracy of Ranking SVM was slightly higher than that of SVM. Experiments on person names, location names and organization names, in both English-Chinese and Chinese-English OOV term translation, achieved good translation accuracy. In addition, to demonstrate the generality of the proposed method, it was applied to a variety of other named entities and terms, where it also achieved good results.
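The abstract does not say how information entropy is applied when filtering English translation candidates. One common choice for this kind of filtering is boundary entropy: a candidate that is always followed (or preceded) by the same token is probably a fragment of a longer term. The Python sketch below illustrates that idea only, as an assumption rather than the thesis's actual procedure. The function names, the `<s>`/`</s>` boundary markers, the threshold of 1.0 bit and the sample snippets are all hypothetical.

```python
import math
from collections import Counter


def boundary_entropy(neighbors):
    """Shannon entropy (in bits) of the tokens adjacent to a candidate."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def neighbor_tokens(candidate, snippets):
    """Collect the token immediately left and right of each occurrence."""
    cand = candidate.split()
    n = len(cand)
    left, right = [], []
    for snippet in snippets:
        tokens = snippet.split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == cand:
                # Hypothetical markers stand in for snippet boundaries.
                left.append(tokens[i - 1] if i > 0 else "<s>")
                right.append(tokens[i + n] if i + n < len(tokens) else "</s>")
    return left, right


def keep_candidate(candidate, snippets, threshold=1.0):
    """Keep a candidate only if both of its boundaries vary enough."""
    left, right = neighbor_tokens(candidate, snippets)
    return min(boundary_entropy(left), boundary_entropy(right)) >= threshold


# Hypothetical, already tokenized and lowercased web snippets.
snippets = [
    "the new york times reported",
    "in new york city today",
    "visit new york for holidays",
    "from new york to boston",
]

complete_term_kept = keep_candidate("new york", snippets)
fragment_kept = keep_candidate("new", snippets)
```

In this toy data, "new" is always followed by "york", so its right-boundary entropy is zero and it is rejected as a fragment, while "new york" has varied neighbors on both sides and is kept. In practice the threshold would need tuning on real snippets, and the heuristic rules mentioned in the abstract would act as an additional filter.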
Keywords/Search Tags: OOV Term Translation, PAT-Tree, Information Entropy, SVM, Ranking