Word Sense Disambiguation Based On Semantic And Lexical Information

Posted on: 2017-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: L R Sun
GTID: 2348330482986566
Subject: Computer Science and Technology
Abstract/Summary:
Word collocation in Chinese is highly flexible and versatile, and these properties give rise to lexical ambiguity. This ambiguity directly affects downstream applications in natural language processing, which makes word sense disambiguation a key problem in the field. The goal of word sense disambiguation is to automatically select the sense of a word that fits its context, using linguistic knowledge extracted from a corpus.

This thesis introduces the origin, significance, and current state of research on the topic, together with the challenges that word sense disambiguation still faces. It compares and analyzes the advantages and disadvantages of common disambiguation methods. Building on this prior knowledge and these research results, it proposes a mixed-feature disambiguation method based on supervised statistical learning. The method combines commonly used lexical information with semantic class information as disambiguation features. Finally, the extracted features are used to train a classification model, and the accuracy of the model is tested.

The main content of this thesis is as follows:

Firstly, the thesis introduces popular methods in the field of word sense disambiguation and illustrates and compares them with examples. It describes the evaluation system for disambiguation and the method for calculating accuracy, and it identifies the problems in word sense disambiguation research that most urgently need to be solved.

Secondly, the language engineering resources for word sense disambiguation are analyzed: the source corpus, the corpus format, the analysis tools, the corpus annotation system, and the selection and extraction of features. The linguistic knowledge provided by the corpus informs the selection of disambiguation features. Semantic information is obtained from Tongyici Cilin. Feature extraction combines semantic classes with lexical information, yielding semantic codes, morphology, part of speech, and other linguistic knowledge. The multi-level structure of the semantic code can provide guidance at different levels.

Thirdly, single features and combinations of different features are extracted, and the corresponding feature vectors are constructed. A support vector machine classifier is trained on each of these feature sets, and the experimental results for the different feature sets are compared. The experiments validate the proposed method and the performance of the classifier.
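
As a rough illustration of the feature combination described in the abstract, the following Python sketch builds a mixed feature vector from context words, part-of-speech tags, and semantic-code prefixes, and computes accuracy as the share of correct predictions. It is not the author's implementation. The toy lexicon and its codes, the English stand-in words, the window size, the feature names, and the five prefix lengths (assumed here to follow the layout of extended Tongyici Cilin codes) are all assumptions, and the accuracy formula is the usual definition rather than one confirmed by the abstract.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy lexicon: English stand-ins for segmented Chinese words,
# mapped to placeholder 8-character codes (NOT real Tongyici Cilin entries).
TOY_CILIN: Dict[str, str] = {
    "bank": "Dm04A01=",
    "river": "Be05A01=",
    "money": "Dj04A01=",
}

# Assumed prefix lengths for the five levels of a Cilin-style code.
LEVEL_PREFIX_LENGTHS = (1, 2, 4, 5, 7)


def semantic_prefix(code: str, level: int) -> str:
    """Return the prefix of `code` for a level in 1..5."""
    return code[: LEVEL_PREFIX_LENGTHS[level - 1]]


def extract_features(
    tokens: Sequence[Tuple[str, str]],
    target_index: int,
    window: int = 2,
    sem_level: int = 3,
) -> Dict[str, float]:
    """Collect word, POS, and semantic-class features around the target.

    `tokens` is a list of (word, pos) pairs; the target word itself is skipped.
    """
    feats: Dict[str, float] = {}
    for offset in range(-window, window + 1):
        i = target_index + offset
        if offset == 0 or not 0 <= i < len(tokens):
            continue
        word, pos = tokens[i]
        feats[f"w[{offset:+d}]={word}"] = 1.0  # lexical feature
        feats[f"p[{offset:+d}]={pos}"] = 1.0   # part-of-speech feature
        code = TOY_CILIN.get(word)
        if code is not None:                   # semantic-class feature
            feats[f"s[{offset:+d}]={semantic_prefix(code, sem_level)}"] = 1.0
    return feats


def vectorize(
    feature_dicts: Sequence[Dict[str, float]],
) -> Tuple[List[str], List[List[float]]]:
    """Map feature dicts to dense vectors over a sorted shared vocabulary."""
    vocab = sorted({name for d in feature_dicts for name in d})
    index = {name: i for i, name in enumerate(vocab)}
    vectors = []
    for d in feature_dicts:
        row = [0.0] * len(vocab)
        for name, value in d.items():
            row[index[name]] = value
        vectors.append(row)
    return vocab, vectors


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of instances whose predicted sense equals the gold sense."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)
```

The resulting vectors could be passed to any support vector machine implementation. Rerunning the extraction with a different `sem_level`, or with the semantic features left out, would mirror the comparison of feature sets that the abstract describes.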
Keywords/Search Tags: word sense disambiguation, semantic category, lexical information, feature extraction, support vector machine