
Recognition And Translation Of Russian Base Noun Phrase

Posted on: 2016-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2298330467473092
Subject: Detection Technology and Automation
Abstract/Summary:
The base noun phrase (BaseNP) is a relatively simple syntactic structure that carries relatively complete semantic information. BaseNPs are widely used and play an important role in sentences. Automatic recognition and translation of BaseNPs helps in understanding different languages. Recognition and translation of Russian BaseNPs is therefore significant and valuable for cross-language retrieval, machine translation and other applications.

This thesis studies and summarizes the linguistic and grammatical features of Russian, and implements recognition of Russian base noun phrases by combining rules with statistical methods. A method for automatically building the training corpus for a CRF is also proposed, to save the cost of annotating the corpus manually. In addition, a translation method that makes explicit the linguistic features implicit in Russian inflected word forms improves the translation quality of Russian BaseNPs. The main work includes the following aspects:

Firstly, recognition of Russian base noun phrases is implemented by combining rules and statistics. To address the scarcity of annotated Russian corpora and the cost of annotation, a method is proposed to build the corpus automatically, based on a Russian-Chinese dictionary and a pattern library of part-of-speech collocations. The method labels text against the part-of-speech collocation pattern library following the principle of maximum matching, builds the CRF training corpus automatically from the result, and trains the model used to recognize BaseNPs.

Secondly, a method of Russian BaseNP translation based on implicit knowledge is proposed. The “implicit knowledge” consists of Russian linguistic features that are reflected in word form changes, such as part of speech, case, gender and number. By explicitly representing these features in the corpus, data sparseness can be alleviated to a large extent; moreover, word alignment results improve to a certain degree, and the translation quality of Russian BaseNPs is ultimately enhanced.

The Russian BaseNP recognition method of this thesis achieves an F value of 84.14% while saving the cost of corpus annotation. Translation using language features achieves a BLEU score of 0.4257, about 10 percentage points higher than the traditional phrase-based machine translation method.
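The automatic labeling step described in the abstract (matching part-of-speech sequences against a pattern library under the maximum matching principle) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the tag names (`A`, `N`, `NUM`, `V`, `PR`), the contents of `PATTERNS`, the B-NP/I-NP/O label scheme and the function name `label_base_nps` are hypothetical and are not taken from the thesis, which does not list its patterns in this summary.

```python
from typing import List

# Hypothetical pattern library: part-of-speech sequences treated as BaseNPs.
# The tags and patterns are illustrative assumptions, not the thesis's own.
PATTERNS = {
    ("N",),
    ("A", "N"),
    ("A", "A", "N"),
    ("NUM", "N"),
    ("N", "N"),
}
MAX_LEN = max(len(p) for p in PATTERNS)


def label_base_nps(pos_tags: List[str]) -> List[str]:
    """Label a POS sequence with B-NP/I-NP/O using forward maximum matching.

    At each position the longest pattern that matches is taken, so the
    output can serve as automatically generated training labels.
    """
    labels = ["O"] * len(pos_tags)
    i = 0
    while i < len(pos_tags):
        matched = 0
        # Try the longest candidate span first, then shrink.
        for length in range(min(MAX_LEN, len(pos_tags) - i), 0, -1):
            if tuple(pos_tags[i:i + length]) in PATTERNS:
                matched = length
                break
        if matched:
            labels[i] = "B-NP"
            for j in range(i + 1, i + matched):
                labels[j] = "I-NP"
            i += matched
        else:
            i += 1
    return labels


# POS tags for a sentence shaped like "adjective noun verb preposition adjective noun".
example = ["A", "N", "V", "PR", "A", "N"]
print(list(zip(example, label_base_nps(example))))
```

In a pipeline of the kind the abstract outlines, the part-of-speech tags would come from the dictionary lookup, and the resulting label sequences would be written out as training data for the CRF.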
Keywords/Search Tags: Russian, BaseNP, recognition, translation, rule, statistic