
A Comprehensive Method Of English-Chinese Machine Translation

Posted on: 2014-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Yang
GTID: 2248330398957861
Subject: Signal and Information Processing
Abstract/Summary:
With the development of global integration and the widespread use of the internet, the communication barrier between people who use different languages has become more and more prominent. To solve this difficult problem, people began to research machine translation. Machine translation research is multi-disciplinary and comprehensive, drawing on linguistics, computer science and cognitive science. It is one of the most fiercely contested areas of international high-tech research, and it is also a practical subject in information processing.

Different stages of the machine translation process place different demands on a system. Syntactic analysis of the source language must be correct (reliable); in the translation stage, the senses of polysemous words must be chosen accurately and idiomatic structures rendered reasonably; and the resulting Chinese sentences must be smooth (elegant). However, there are many kinds of machine translation systems, and their translation principles and application environments differ. In information retrieval and automatic question answering, machine translation needs to be fast and extensible, while sentences that contain synonyms need to be disambiguated so that the translation stays accurate. A single translation method therefore cannot meet these expectations.

First, the introduction presents the background and significance of the topic, the state of machine translation research in China and abroad, and its development trends. Second, the principles of rule-based and corpus-based machine translation are introduced, and the merits and demerits of the two approaches are analyzed. Then, we study WordNet and the TFIDF method (a statistical method).
The basic usage of the WordNet thesaurus is introduced, and we study two kinds of WordNet-based word similarity computation: one computes similarity between Chinese and English words, and the other computes similarity between English words. The calculation of TFIDF is briefly described. WordNet's ability to query synsets is then combined with the statistical TFIDF method, and the combined method is applied to an example-based machine translation system. For sentences that contain synonyms, we use the WordNet 2.1 synonyms of the keywords in the query sentence for disambiguation before calculating sentence similarity, and then use the TFIDF method again to calculate English sentence similarity. Comparing the results of using TFIDF alone with those of the combined method shows that the new method is more effective. Finally, the system model is constructed: a virtual machine running Ubuntu 9.10 is installed under Windows XP, and building the statistical system model lays the foundation for example-based machine translation.
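The idea of normalizing synonyms before computing TFIDF sentence similarity can be illustrated with a minimal sketch. It is not the thesis's implementation: the small `SYNONYMS` table is a hypothetical stand-in for a WordNet synset lookup, the smoothed IDF formula and whitespace tokenization are assumptions, and the function names are chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet synset lookup: each word maps to one
# canonical representative of its synonym set.
SYNONYMS = {
    "automobile": "car",
    "auto": "car",
    "purchase": "buy",
}

def normalize(sentence, synonyms):
    """Lowercase, split on whitespace, and replace words by their canonical synonym."""
    return [synonyms.get(tok, tok) for tok in sentence.lower().split()]

def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per token list."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    # Smoothed IDF (an assumption; the thesis does not state its exact formula).
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_examples(query, examples, use_synonyms=True):
    """Score each example sentence against the query, best match first."""
    table = SYNONYMS if use_synonyms else {}
    docs = [normalize(s, table) for s in [query] + examples]
    vecs = tfidf_vectors(docs)
    scored = [(cosine(vecs[0], v), s) for v, s in zip(vecs[1:], examples)]
    return sorted(scored, reverse=True)

if __name__ == "__main__":
    examples = ["he wants to buy a car", "the weather is nice today"]
    query = "i want to purchase an automobile"
    print(rank_examples(query, examples, use_synonyms=False))
    print(rank_examples(query, examples, use_synonyms=True))
```

In this toy example, mapping "purchase" and "automobile" onto "buy" and "car" raises the similarity between the query and the first example sentence, which is the kind of effect the comparison between plain TFIDF and the combined method is meant to measure.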
Keywords/Search Tags:WordNet, TFIDF, comparison, System model