Research Of Phrase-based Translation Model Using Syntactic And Morphologic Information

Posted on: 2010-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: K Luo
Full Text: PDF
GTID: 2178360302459928
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Statistical Machine Translation (SMT) is one of the most active areas of Natural Language Processing (NLP) research. This thesis presents research on the translation model of phrase-based SMT using syntactic and morphologic information. Syntactic analysis is a key problem in NLP and also the groundwork for the translation model. We first construct a dependency parser and acquire syntactic information from it, then describe techniques for acquiring syntactic and morphologic information, and finally construct a Chinese-to-Mongolian translation model. Based on this approach, a Chinese-to-Mongolian translation system is implemented. The results of this thesis can be summarized as follows:

1. Design of the syntactic parser
Syntactic analysis is one of the basic tasks of NLP; its key problems are how to select features and how to reduce search time. In this thesis, a spanning tree algorithm combined with probabilistic valency pattern theory is proposed to construct a dependency parser, and the Margin Infused Relaxed Algorithm (MIRA) is used as the training algorithm of this parser. The experimental results show an improvement in accuracy.

2. Acquisition techniques for syntactic and morphologic information
Phrase-based SMT is still the mainstream approach, but this model cannot handle linguistic information (syntax, semantics, morphology, etc.) well. To address this problem, on the source side we attach the syntactic information obtained from the dependency parser to each word or phrase. On the target side, morphologic information (stem, affix, etc.) is acquired. In this way, information at three levels (word, phrase, and sentence) is integrated to improve the quality of the translation.

3. Construction method of the translation model
Adding syntactic and morphologic information rapidly increases the number of factors, which leads to data sparseness and over-fitting in the original model. We introduce the Logarithmic Opinion Pool (LOP) to construct a translation model named the LOP-Factored model. Parameters are adjusted to balance all factors (words, part of speech, syntax, morphology, etc.). With the proposed method, the BLEU score increases significantly, and the above problems of the phrase-based SMT system are solved to a certain extent.
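
The abstract does not say how the pooling is applied inside the LOP-Factored model, so the following is only a minimal sketch of the general logarithmic opinion pool idea: several component distributions are combined as a weighted product and then renormalized. The function name, the two factor-level distributions, and the candidate labels are all hypothetical and are not taken from the thesis.

```python
import math


def log_opinion_pool(distributions, weights):
    """Combine discrete distributions as p(y) proportional to prod_i p_i(y) ** w_i.

    Each distribution is a dict mapping the same outcomes to strictly
    positive probabilities; weights holds one weight per distribution.
    """
    if not distributions or len(distributions) != len(weights):
        raise ValueError("need one weight per distribution")
    outcomes = list(distributions[0])
    for dist in distributions:
        if set(dist) != set(outcomes):
            raise ValueError("all distributions must cover the same outcomes")
        if any(p <= 0.0 for p in dist.values()):
            raise ValueError("probabilities must be strictly positive")

    # Weighted sum of log-probabilities for each outcome.
    log_scores = {
        y: sum(w * math.log(dist[y]) for dist, w in zip(distributions, weights))
        for y in outcomes
    }

    # Normalise with log-sum-exp for numerical stability.
    peak = max(log_scores.values())
    log_z = peak + math.log(sum(math.exp(s - peak) for s in log_scores.values()))
    return {y: math.exp(s - log_z) for y, s in log_scores.items()}


# Hypothetical factor-level distributions over two candidate translations.
surface_model = {"cand_a": 0.5, "cand_b": 0.5}
morphology_model = {"cand_a": 0.8, "cand_b": 0.2}
pooled = log_opinion_pool([surface_model, morphology_model], [1.0, 1.0])
```

In this sketch, tuning the weights plays the role the abstract describes as adjusting parameters to balance the factors: a weight of zero removes a component from the pool, and larger weights let a component dominate.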
Keywords/Search Tags: Dependency Syntactic Analysis, Chinese-to-Mongolian Translation Model, Factored Translation Model, Statistical Machine Translation