Statistical Machine Translation For Specific Areas Of Research And Application

Posted on: 2012-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhang
GTID: 2218330368980907
Subject: Computer application technology
Abstract/Summary:
Machine translation is a difficult and active research area in natural language understanding, and with international exchanges growing ever more frequent it has become very important for multilingual communication. The accuracy of current machine translation, however, is still far from ideal. In restricted domains such as weather reports or knowledge bases, and especially in technical documents dense with specialized terms, the vocabulary is more fixed and the syntax simpler, which makes good results easier to achieve. This thesis studies statistical machine translation (SMT) for restricted domains, taking the medical domain as its specific object of study. The main results are as follows:

1. A translation method that fuses domain rule templates into statistical machine translation. Domain rule templates and related domain resources such as parallel corpora are an important basis and means for improving machine translation in restricted domains. Taking the medical domain as the object of study, this thesis builds a domain rule base and domain resources for a medical SMT system, including a domain parallel corpus and domain rule templates. It proposes a method for expanding the domain rule templates and a template matching algorithm, and integrates the matching algorithm and the domain resources into an open-domain SMT system to obtain a domain-specific SMT system. Experiments show that, supported by a domain parallel corpus of reasonable size and by the rule templates, the domain SMT system achieves a relatively large improvement in translation quality.

2. A domain-oriented language model. A language model based on dependency-syntax relations and tailored to the medical domain is built, and its parameters are trained. The model is added to the decoding stage of SMT, where it further constrains the N-best candidate translations produced by the decoder: each candidate is rescored and the N-best list is reordered, so that a better top-ranked translation is selected and translation accuracy improves. The final results show that the proposed model improves, to some extent, the accuracy of the best translation in Chinese-English SMT for the medical domain.

3. Building on the above results, a prototype SMT system for the medical domain is developed.
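The template expansion and matching in result 1 can be illustrated with a minimal Python sketch. The abstract does not describe the template format, so everything here is an assumption for illustration only: a template is a sequence of source tokens with typed slots such as `<DRUG>` and `<NUM>`; expansion generates variants by swapping literal tokens for synonyms; matching runs greedily left to right, trying longer templates first; and any span no template covers is left for the ordinary SMT decoder. `RuleTemplate`, `expand`, `match_at`, `apply_templates` and the slot lexicon are hypothetical names, not the thesis's code.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical slot lexicon: slot name -> {source token: target translation}.
# The "<NUM>" slot is special-cased and matches any all-digit token.
Lexicon = Dict[str, Dict[str, str]]
# ("fixed", target string) for template matches, ("smt", source tokens) otherwise.
Segment = Tuple[str, Union[str, List[str]]]


@dataclass(frozen=True)
class RuleTemplate:
    source: Tuple[str, ...]  # e.g. ("口服", "<DRUG>", "每日", "<NUM>", "次")
    target: str              # e.g. "take <DRUG> orally <NUM> times a day"


def is_slot(tok: str) -> bool:
    return tok.startswith("<") and tok.endswith(">")


def expand(tpl: RuleTemplate, synonyms: Dict[str, List[str]]) -> List[RuleTemplate]:
    """Generate template variants by replacing literal tokens with their synonyms."""
    options = [[tok] + ([] if is_slot(tok) else synonyms.get(tok, []))
               for tok in tpl.source]
    return [RuleTemplate(tuple(src), tpl.target) for src in product(*options)]


def match_at(tpl: RuleTemplate, tokens: List[str], i: int,
             lexicon: Lexicon) -> Optional[str]:
    """Return the filled-in target if tpl matches tokens[i:], else None.
    Each slot name is assumed to occur at most once per template."""
    n = len(tpl.source)
    if i + n > len(tokens):
        return None
    target = tpl.target
    for pat, tok in zip(tpl.source, tokens[i:i + n]):
        if not is_slot(pat):
            if pat != tok:
                return None
        elif pat == "<NUM>" and tok.isdigit():
            target = target.replace(pat, tok, 1)
        elif tok in lexicon.get(pat, {}):
            target = target.replace(pat, lexicon[pat][tok], 1)
        else:
            return None
    return target


def apply_templates(tokens: List[str], templates: List[RuleTemplate],
                    lexicon: Lexicon) -> List[Segment]:
    """Greedy left-to-right matching, longest template first; uncovered
    spans are collected as "smt" segments for the statistical decoder."""
    ordered = sorted((t for t in templates if t.source),
                     key=lambda t: len(t.source), reverse=True)
    segments: List[Segment] = []
    pending: List[str] = []
    i = 0
    while i < len(tokens):
        for tpl in ordered:
            filled = match_at(tpl, tokens, i, lexicon)
            if filled is not None:
                if pending:
                    segments.append(("smt", pending))
                    pending = []
                segments.append(("fixed", filled))
                i += len(tpl.source)
                break
        else:  # no template matched at position i
            pending.append(tokens[i])
            i += 1
    if pending:
        segments.append(("smt", pending))
    return segments


lexicon: Lexicon = {"<DRUG>": {"阿司匹林": "aspirin"}}
base = RuleTemplate(("口服", "<DRUG>", "每日", "<NUM>", "次"),
                    "take <DRUG> orally <NUM> times a day")
templates = expand(base, {"每日": ["每天"]})  # "每日" and "每天" both mean "daily"
sentence = ["患者", "需", "口服", "阿司匹林", "每天", "3", "次"]
print(apply_templates(sentence, templates, lexicon))
# [('smt', ['患者', '需']), ('fixed', 'take aspirin orally 3 times a day')]
```

In this sketch a template match is trusted outright and removed from the decoder's input; a softer integration could instead offer the template translation to the decoder as one option among its own hypotheses.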
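The N-best rescoring in result 2 can be sketched in Python under several assumptions that the abstract does not state: the dependency language model is taken to be a head-dependent model, P(dependent | head), linearly interpolated with an add-one smoothed unigram; every N-best candidate is assumed to arrive with a log-domain decoder score and a dependency parse of the English output (the parser itself is out of scope); and the new score is the decoder score plus a weighted dependency-LM log probability. `DependencyLM`, `rerank`, the interpolation weight `lam` and the feature weight `weight` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

# A dependency arc is (head, dependent); "ROOT" is the head of the main verb.
Arc = Tuple[str, str]
# One N-best entry: (translation, decoder log score, dependency arcs of the translation).
Candidate = Tuple[str, float, List[Arc]]


class DependencyLM:
    """Head-dependent language model: log P(sentence) ~ sum of log P(dep | head)
    over the arcs of its parse, with P(dep | head) linearly interpolated with an
    add-one smoothed unigram P(dep) so unseen pairs never get zero probability."""

    def __init__(self, parsed_corpus: List[List[Arc]], lam: float = 0.7):
        self.pair_counts: Counter = Counter()
        self.head_counts: Counter = Counter()
        self.dep_counts: Counter = Counter()
        for arcs in parsed_corpus:
            for head, dep in arcs:
                self.pair_counts[(head, dep)] += 1
                self.head_counts[head] += 1
                self.dep_counts[dep] += 1
        self.total = sum(self.dep_counts.values())
        self.vocab = len(self.dep_counts) + 1  # one extra slot for unknown words
        self.lam = lam

    def prob(self, head: str, dep: str) -> float:
        p_uni = (self.dep_counts[dep] + 1) / (self.total + self.vocab)
        if self.head_counts[head] == 0:
            return p_uni  # unseen head: back off entirely to the unigram
        p_bi = self.pair_counts[(head, dep)] / self.head_counts[head]
        return self.lam * p_bi + (1 - self.lam) * p_uni

    def logprob(self, arcs: List[Arc]) -> float:
        return sum(math.log(self.prob(h, d)) for h, d in arcs)


def rerank(nbest: List[Candidate], lm: DependencyLM,
           weight: float = 0.5) -> List[Tuple[str, float]]:
    """Add weight * dependency-LM log probability to each decoder score and
    return (translation, new score) pairs sorted best first."""
    rescored = [(text, score + weight * lm.logprob(arcs))
                for text, score, arcs in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy parsed target-side corpus and a two-candidate N-best list (illustrative only).
corpus = [
    [("ROOT", "take"), ("take", "patients"), ("take", "tablets"),
     ("tablets", "two"), ("take", "daily")],
    [("ROOT", "take"), ("take", "tablets"), ("take", "water")],
]
nbest = [
    ("patients eat two pills", -3.8,
     [("ROOT", "eat"), ("eat", "patients"), ("eat", "pills"), ("pills", "two")]),
    ("patients take two tablets", -4.0,
     [("ROOT", "take"), ("take", "patients"), ("take", "tablets"), ("tablets", "two")]),
]
lm = DependencyLM(corpus)
for text, score in rerank(nbest, lm):
    print(f"{score:7.2f}  {text}")
```

Here the decoder prefers "patients eat two pills", but after rescoring "patients take two tablets" moves to the top because its head-dependent pairs are attested in the domain corpus. Both weights are fixed only to keep the sketch short; in practice they would be tuned on held-out data.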
Keywords/Search Tags: Statistical machine translation, Medical domain, Domain rule templates and resources, Dependency language model