Research On Mongolian-Chinese Evaluation Corpus For Machine Translation

Posted on: 2023-10-21    Degree: Master    Type: Thesis
Country: China    Candidate: R H Hai    Full Text: PDF
GTID: 2555306788994759    Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Research on Mongolian-Chinese machine translation since the 1980s can be divided into three stages: rule-based, statistical, and neural machine translation. Although models and methods have evolved, the accuracy of Mongolian-Chinese machine translation systems has not yet reached the desired level. Mongolian is widely used and has many speakers, but compared with languages such as English and Chinese, its corpus resources are relatively scarce, so improving translation quality by rapidly expanding the corpus is impractical. This thesis therefore evaluates commonly used Mongolian-Chinese machine translation systems, analyzes the problems they exhibit, and proposes corresponding solutions.

To test the "Darihan-Northeast Asian Language Translation" system and the "Ulun Mongolian-Chinese Intertranslation" system, 200 sentence pairs were randomly selected from corpora in four fields: legal documents, government gazettes, news reports, and daily dialogues. Based on the test results and an analysis of how the NIST datasets have developed in recent years, daily dialogue was chosen as the domain of the evaluation corpus. The original evaluation corpus collected contains 11,407 sentence pairs; after preprocessing, 9,702 sentence pairs remain as a traditional Mongolian-Chinese sentence-level parallel evaluation corpus. A scoring standard based on common problems in Mongolian-Chinese machine translation is proposed: a 4-point scale on which A is the highest grade and D is the lowest. Under this standard, grade A accounted for 58% of the output of the Google translation system, 71% for the Darihan-Northeast Asia language translation system, and 82% for the Orun Mongolian-Chinese translation system.

Mistranslations have many causes. Punctuation marks, names of people and places, unregistered (out-of-vocabulary) words and phrases, plurals, English letters, case and reflexive possession of nouns, control symbols, multiple possible translations, aspect, voice and mood of verbs, language habits, context, ambiguity, homographs, word order, and typos can all make the machine translation differ from the original expression. The thesis analyzes and classifies the mistranslation types in detail, reports statistics on them, and proposes solutions for the problems found in Mongolian-Chinese machine translation systems.
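The per-system grade-A shares reported above come from tallying sentence-level grades on the 4-point scale. A minimal sketch of that tally follows, assuming each translated sentence has already been assigned one grade by a human evaluator. The function name `grade_shares` and the sample list are hypothetical illustrations, not the thesis's tooling or data.

```python
from collections import Counter

GRADES = ("A", "B", "C", "D")  # A is the highest grade, D the lowest


def grade_shares(grades):
    """Return the percentage of sentences that received each grade."""
    if not grades:
        raise ValueError("no grades to summarize")
    unknown = set(grades) - set(GRADES)
    if unknown:
        raise ValueError(f"unexpected grades: {sorted(unknown)}")
    counts = Counter(grades)
    total = len(grades)
    # Grades that never occur are reported as 0.0 rather than omitted.
    return {g: 100.0 * counts[g] / total for g in GRADES}


# Hypothetical annotations for ten translated sentences (not thesis data).
sample = ["A", "A", "B", "A", "C", "A", "D", "A", "B", "A"]
shares = grade_shares(sample)
print(shares)
```

Running the same tally separately on each system's output for the same evaluation corpus would give directly comparable grade-A shares.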
Keywords/Search Tags:Mongolian, machine translation system, evaluation, corpus