Chinese Word Sense Disambiguation Based on a Semi-supervised Method

Posted on: 2019-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Xu
Full Text: PDF
GTID: 2428330542472987
Subject: Software engineering
Abstract/Summary:
Word sense disambiguation (WSD) is one of the important research issues in the field of natural language processing. Its purpose is to determine the meaning of ambiguous words in daily communication and conversation. Semantic information and part-of-speech information are two important kinds of linguistic knowledge that help determine the semantic categories of ambiguous words. With the rapid development of natural language processing, word sense disambiguation has become a difficult problem in the field. In this paper, a semi-supervised word sense disambiguation method is proposed. The ambiguous word is taken as the center, a word sense disambiguation model is constructed by extracting disambiguation features from the left and right adjacent lexical units, and a semi-supervised method is used to optimize the model, which improves the performance of the word sense disambiguation classifier.

The research work in this paper has three aspects. First, the research background and significance of word sense disambiguation are expounded, the research status of word sense disambiguation technology at home and abroad is introduced and analyzed, and the difficult problems of the technology are explained. Second, this paper introduces the dictionary resources and corpora used in the experiments and describes the content and organizational structure of the Tong Yi Ci Ci Lin. The background and content of the training and test corpora are explained, the preprocessing is introduced, and the extraction process for disambiguation features is explained in detail. Third, words, parts of speech, and translations are extracted from the left and right word units around an ambiguous word and used as discriminative features to construct a word sense classifier based on a Bayesian model. Words and parts of speech, extracted from the left and right word units around an ambiguous word, are used as discriminative features to construct a word sense classifier based on maximum entropy. A co-training algorithm then uses a large amount of unannotated data to optimize the WSD model: the training data of SemEval-2007 Task #5 and a large unannotated corpus from Harbin Institute of Technology are applied to optimize the Bayesian classifier and the maximum entropy classifier. The optimized WSD model is then tested.

Experimental results show that the accuracy of the WSD model improves after the semi-supervised method proposed in this paper is applied.
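The pipeline described in the abstract (window features around the ambiguous word, two classifiers, co-training on unannotated data) can be illustrated with a minimal Python sketch. It is not the thesis implementation. To stay short and dependency-free, it uses two naive Bayes classifiers, one per feature view, in place of the Bayesian and maximum entropy pair. The names `window_features`, `NaiveBayes`, `fit_views`, `co_train` and `best`, the window size, the confidence threshold, and the English toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def window_features(tokens, tags, i, size=2):
    """Return (word view, POS view) from the units left and right of position i."""
    words, pos = [], []
    for off in range(-size, size + 1):
        j = i + off
        if off == 0 or not 0 <= j < len(tokens):
            continue  # skip the ambiguous word itself and out-of-range positions
        words.append(f"w{off:+d}={tokens[j]}")
        pos.append(f"p{off:+d}={tags[j]}")
    return words, pos

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over string features."""
    def fit(self, X, y):
        self.prior = Counter(y)
        self.counts = defaultdict(Counter)
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab = {f for c in self.counts.values() for f in c}
        return self

    def predict_proba(self, feats):
        total = sum(self.prior.values())
        logp = {}
        for label, n in self.prior.items():
            # +1 reserves probability mass for features never seen in training
            denom = sum(self.counts[label].values()) + len(self.vocab) + 1
            logp[label] = math.log(n / total) + sum(
                math.log((self.counts[label][f] + 1) / denom) for f in feats)
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return {label: math.exp(v - m) / z for label, v in logp.items()}

def best(clf, feats):
    """Most probable sense for one feature list."""
    probs = clf.predict_proba(feats)
    return max(probs, key=probs.get)

def fit_views(labeled):
    """Train one classifier per view on ((view_a, view_b), sense) pairs."""
    senses = [s for _, s in labeled]
    return (NaiveBayes().fit([v[0] for v, _ in labeled], senses),
            NaiveBayes().fit([v[1] for v, _ in labeled], senses))

def co_train(labeled, unlabeled, rounds=5, per_round=2, threshold=0.6):
    """Each round, every view labels its most confident pool items for both views."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked = {}
        for view, clf in enumerate(fit_views(labeled)):
            scored = []
            for idx, views in enumerate(pool):
                probs = clf.predict_proba(views[view])
                sense = max(probs, key=probs.get)
                scored.append((probs[sense], idx, sense))
            for conf, idx, sense in sorted(scored, reverse=True)[:per_round]:
                if conf >= threshold:
                    picked.setdefault(idx, sense)
        if not picked:
            break
        labeled += [(pool[i], s) for i, s in picked.items()]
        pool = [v for i, v in enumerate(pool) if i not in picked]
    return fit_views(labeled)

# Invented toy data: the word view separates the senses, the POS view does not yet.
labeled = [((["w-1=river"], ["p-1=DET"]), "shore"),
           ((["w-1=loan"], ["p-1=DET"]), "finance")]
unlabeled = [(["w-1=river"], ["p+1=ADP"]),
             (["w-1=loan"], ["p+1=VERB"])]
clf_a, clf_b = co_train(labeled, unlabeled)
print(best(clf_b, ["p+1=ADP"]), best(clf_b, ["p+1=VERB"]))
```

In this toy setup the POS-view classifier starts with no evidence to separate the two senses. The word-view classifier labels the unannotated examples with enough confidence to pass the threshold, and those labels give the POS view new training evidence, which is the effect co-training relies on. A real system would draw both views from `window_features` applied to segmented, POS-tagged Chinese text.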
Keywords/Search Tags: word sense disambiguation, natural language processing, semantic category, discriminative feature