Web-Based Translation Methods for English-Chinese Unknown Words

Posted on: 2011-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2208360305997948
Subject: Computer application technology
Abstract/Summary:
With the development of information technology, new words and OOV (Out Of Vocabulary) terms are constantly emerging. Traditional dictionary-based methods cannot keep up with the translation requirements of the Cross-Language Information Retrieval (CLIR) task, so OOV translation has become an important and challenging problem in CLIR. Web-based translation models have recently grown in popularity, but key problems still demand further analysis and resolution.

This thesis focuses on bidirectional English-Chinese OOV translation based on Web mining and supervised learning. It proposes a term extraction algorithm built on the PAT-Tree data structure and a translation candidate ranking algorithm built on Ranking SVM. The PAT-Tree extraction algorithm addresses the Chinese OOV recognition problem, while the ranking strategy addresses the evaluation of translation pairs. In addition, a multi-feature analysis experiment is designed to study how much each feature contributes to overall translation performance.

The thesis also presents a set of evaluation experiments on OOV term translation. Authoritative data sources are adopted: the CoNLL-2003 corpus is used to evaluate the English-Chinese algorithms, and the SIGHAN 2008 corpus is used to evaluate the Chinese-English algorithms. Because these data sets are generally accepted, the final results are objective and comparable.

In summary, this thesis surveys modern Web-based OOV translation algorithms, proposes the Ranking SVM algorithm for candidate ranking, applies a modified PAT-Tree structure, and adopts feature analysis in the experimental process. Researchers in related fields may use these ideas as a reference.
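The abstract does not describe how the candidate ranking is implemented. As a rough illustration of the general Ranking SVM idea only, the sketch below turns each pair of translation candidates for the same source term into a difference vector and fits a linear scorer with a hinge loss. It is not the thesis's implementation: the two features (a co-occurrence score and a length-ratio score), the toy training data, and the names `pairwise_differences`, `train_ranking_svm`, and `rank_candidates` are all assumptions made for this example, and the plain subgradient-descent training loop stands in for whatever SVM solver the author actually used.

```python
import random

def pairwise_differences(candidates):
    """candidates: list of (feature_vector, relevance) for one source term.
    Returns x_i - x_j for every pair in which candidate i is more relevant than j."""
    diffs = []
    for xi, yi in candidates:
        for xj, yj in candidates:
            if yi > yj:
                diffs.append([a - b for a, b in zip(xi, xj)])
    return diffs

def train_ranking_svm(queries, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Subgradient descent on hinge(1 - w.d) + (reg/2)*|w|^2 over all difference vectors d."""
    rng = random.Random(seed)
    diffs = [d for q in queries for d in pairwise_differences(q)]
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            for k in range(len(w)):
                # The hinge term contributes -d[k] only while the pair violates the margin.
                grad = reg * w[k] - (d[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
    return w

def rank_candidates(w, candidates):
    """candidates: list of (label, feature_vector); returns them sorted best first."""
    def score(item):
        return sum(wk * xk for wk, xk in zip(w, item[1]))
    return sorted(candidates, key=score, reverse=True)

# Hypothetical training data: features are [co-occurrence score, length-ratio score],
# relevance is 1 for the correct translation and 0 otherwise.
training_queries = [
    [([0.9, 0.8], 1), ([0.2, 0.5], 0), ([0.4, 0.1], 0)],
    [([0.7, 0.9], 1), ([0.3, 0.3], 0)],
]

weights = train_ranking_svm(training_queries)
ranked = rank_candidates(weights, [("B", [0.1, 0.2]), ("A", [0.8, 0.7])])
```

Training on difference vectors means the model only learns relative order within one source term's candidate list, which matches the task of picking the best translation among Web-mined candidates rather than predicting an absolute score.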
Keywords/Search Tags: OOV Term Translation, PAT-Tree, Support Vector Machine (SVM), Ranking SVM