Research Of Chinese-English Translation Of Product Names And Classification

Posted on: 2013-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
Full Text: PDF
GTID: 2268330392467976
Subject: Computer Science and Technology
Abstract/Summary:
Product names are specialized terms used when a product is registered with an authority. Compared with common names for goods, a product name is meant to describe the product precisely and is usually not as short as a slogan. Product names resemble named entities and terminology in natural language processing, but they also have traits of their own. A classification name summarizes a group of products and is very similar in form to a product name. Because product names are rarely used in everyday language, machine translation of product names still needs improvement. This paper describes a complete solution to this problem.

Many methods from research on specialized words can be applied to translating product names. These methods usually deal with terminology, named entities, and out-of-vocabulary (OOV) words. The most important ones are: extracting translation pairs from bilingual parallel corpora; training patterns to match translation pairs in web data; and using an ontology to organize a specialized dictionary. However, product names differ greatly from one another, and they do contain terminology and named entities, so no single method can translate them all. Moreover, apart from some common names, most names have their own word-formation traits. Building on existing research, this paper applies several methods to translation and uses some new techniques to complement them. The work is as follows:

1. Extract web dictionary entries. To translate simple words and to prepare for later processing, we analyze and design an extraction scheme and then obtain a large number of senses from two web dictionaries. These senses are then pre-processed to ensure they are ranked properly.
2. Extract translations from Wikipedia anchor texts. Most Wikipedia entries have corresponding articles in other languages, and the URLs linking to them contain the entry's title in those languages. We extract these anchor texts through the MediaWiki API and collect all that we can find.
3. Extract translations from bilingual parallel sentence pairs. Over 300 thousand sentence pairs are collected from two web data sources, youdao and jukuu. The texts undergo Chinese word segmentation and Chinese-English word alignment, and the corresponding phrase translations are extracted from the alignment results.
4. Design and implement a series of translation rules to handle words for which no translation can be found directly in the web data sources. These rules are applied in automatic translation. After segmentation, the basic words are sent back to the first step to obtain translations. The whole name is then matched against each rule in turn; a matching rule adds a conjunction to the result or inverts the order of the translation.

The results show that the solution in this paper translates over 80% of product and classification names.
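
The rule-based step (segment the name, look up the basic words, then try each rule in turn) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the thesis: the toy lexicon, the greedy forward maximum-matching segmenter, the two example rules, and all function names.

```python
from typing import Callable, Dict, List, Optional

# Toy lexicon standing in for the web-dictionary entries (hypothetical data).
LEXICON: Dict[str, str] = {
    "儿童": "children",
    "牙刷": "toothbrush",
    "牙膏": "toothpaste",
    "不锈钢": "stainless steel",
    "水杯": "water cup",
    "用": "for",
    "和": "and",
}


def segment(name: str, lexicon: Dict[str, str]) -> List[str]:
    """Greedy forward maximum matching; unknown characters become single tokens."""
    max_len = max(len(word) for word in lexicon)
    tokens: List[str] = []
    i = 0
    while i < len(name):
        for size in range(min(max_len, len(name) - i), 0, -1):
            piece = name[i:i + size]
            if piece in lexicon or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def lookup(token: str) -> str:
    """Translate one basic word; fall back to the source token if unknown."""
    return LEXICON.get(token, token)


# A rule returns a translation if it matches the token sequence, else None.
Rule = Callable[[List[str]], Optional[str]]


def rule_purpose(tokens: List[str]) -> Optional[str]:
    """Inversion: 'X 用 Y' becomes 'Y for X'."""
    if len(tokens) == 3 and tokens[1] == "用":
        return f"{lookup(tokens[2])} for {lookup(tokens[0])}"
    return None


def rule_coordination(tokens: List[str]) -> Optional[str]:
    """Conjunction: 'X 和 Y' becomes 'X and Y'."""
    if len(tokens) == 3 and tokens[1] == "和":
        return f"{lookup(tokens[0])} and {lookup(tokens[2])}"
    return None


RULES: List[Rule] = [rule_purpose, rule_coordination]


def translate(name: str) -> str:
    """Try a direct lookup, then each rule in order, then word-by-word output."""
    if name in LEXICON:
        return LEXICON[name]
    tokens = segment(name, LEXICON)
    for rule in RULES:
        result = rule(tokens)
        if result is not None:
            return result
    return " ".join(lookup(token) for token in tokens)


if __name__ == "__main__":
    for product in ["儿童用牙刷", "牙刷和牙膏", "不锈钢水杯"]:
        print(product, "->", translate(product))
```

Trying the rules in a fixed order keeps the behaviour predictable: the first matching rule wins, and names that match no rule fall back to a word-by-word translation.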
Keywords/Search Tags: automatic translation, product, parallel sentence pair, rules