Research On Automatic Word Sense Tagging In Chinese-English Parallel Corpus

Posted on: 2008-05-12; Degree: Master; Type: Thesis
Country: China; Candidate: Z Li; Full Text: PDF
GTID: 2178360245997835; Subject: Computer Science and Technology
Abstract/Summary:
Building sense-tagged corpora automatically is one of the best strategies for easing the knowledge acquisition bottleneck in supervised Word Sense Disambiguation (WSD). Existing automatic semantic tagging techniques still have many shortcomings, and bilingual parallel corpora open new prospects for the problem. This thesis uses a large-scale Chinese-English parallel corpus, together with word alignment, semantic similarity and other techniques, to study semantic tagging methods that produce sense-tagged Chinese and English corpora and so address the shortage of training data for supervised WSD. Specifically, the following work is done in this thesis:

Firstly, a monolingual disambiguation algorithm based on the target-language translation set is improved and implemented. Using word alignment in both directions (Chinese-English and English-Chinese), the method collects the set of target-language translations of each source word. English or Chinese semantic similarity is then computed over this target set to label the English or Chinese senses.

Secondly, an integrated approach to Chinese-English semantic tagging based on a bilingual semantic dictionary is studied. Working only in the Chinese-English direction, the method draws on the semantic properties of the HowNet bilingual dictionary and on WordNet semantic resources, and uses a WordNet semantic similarity tool to label both sides. It effectively improves tagging precision for both English and Chinese.

Finally, the bilingual characteristics of HowNet are combined with large-scale statistical data. Experiments show that this substantially increases the coverage of semantic tagging with no significant reduction in accuracy. In the process, the two resources, HowNet and WordNet, are integrated, and the English translations for HowNet DEF are effectively expanded.

The semantic tagging results facilitate cross-language research and have broad application value in natural language processing fields such as translation selection, machine translation, and cross-language information retrieval.
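
The first method (collect a translation set through word alignment, then label senses by similarity within that set) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy sense inventory `TOY_SENSES`, the similarity table `TOY_SIM`, and the scoring rule (sum of best pairwise similarities to the other words in the set) are all assumptions made for illustration. A real system would take senses and similarity scores from WordNet or HowNet.

```python
from collections import defaultdict


def collect_target_sets(aligned_pairs):
    """Group aligned target words by source word (the 'translation set')."""
    target_sets = defaultdict(set)
    for source_word, target_word in aligned_pairs:
        target_sets[source_word].add(target_word)
    return target_sets


def label_target_set(target_set, senses_of, similarity):
    """For each word, pick the sense most similar to the rest of the set.

    A sense's score is the sum, over the other words in the set, of its
    best similarity to any sense of that other word (assumed scoring rule).
    """
    labels = {}
    for word in target_set:
        others = [w for w in target_set if w != word]
        best_sense, best_score = None, float("-inf")
        for sense in senses_of.get(word, []):
            score = sum(
                max(
                    (similarity(sense, s) for s in senses_of.get(other, [])),
                    default=0.0,
                )
                for other in others
            )
            if score > best_score:
                best_sense, best_score = sense, score
        labels[word] = best_sense
    return labels


# Hypothetical sense inventory and similarity scores, for illustration only.
TOY_SENSES = {
    "bank": ["bank.finance", "bank.river"],
    "shore": ["shore.land"],
    "riverside": ["riverside.land"],
}
TOY_SIM = {
    frozenset(["bank.river", "shore.land"]): 0.9,
    frozenset(["bank.river", "riverside.land"]): 0.8,
    frozenset(["bank.finance", "shore.land"]): 0.1,
    frozenset(["bank.finance", "riverside.land"]): 0.1,
    frozenset(["shore.land", "riverside.land"]): 0.9,
}


def toy_similarity(sense_a, sense_b):
    """Symmetric lookup in the toy table; unknown pairs score 0."""
    return TOY_SIM.get(frozenset([sense_a, sense_b]), 0.0)


# Word-aligned (Chinese, English) pairs as they might come from an aligner.
aligned = [("岸", "bank"), ("岸", "shore"), ("岸", "riverside"), ("银行", "bank")]
target_sets = collect_target_sets(aligned)
labels = label_target_set(target_sets["岸"], TOY_SENSES, toy_similarity)
print(labels["bank"])  # bank.river
```

In this toy run the translations of 岸 pull "bank" toward its river sense, because that sense is closest to "shore" and "riverside". A translation set with a single word gives no such evidence, and the sketch then simply returns the first sense listed.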
Keywords/Search Tags: sense labeling, HowNet, WordNet, semantic similarity