Liheci Word Sense Disambiguation Based On SVM

Posted on: 2017-01-17  Degree: Master  Type: Thesis
Country: China  Candidate: Z J Zhang
GTID: 2348330503481198  Subject: Computer Science and Technology
Abstract/Summary:
Liheci word sense disambiguation (WSD) is important to many research fields, such as Chinese-English machine translation, information retrieval, speech recognition, and text classification. Using the large modern Chinese corpus of the Center for Chinese Linguistics, PKU, we study the disambiguation of Liheci that have two senses sharing the same part of speech. First, we design a feature template for each form of Liheci and extract the context features according to these templates. Second, because different types of features contribute differently to Liheci word sense disambiguation, we propose weighting features by the importance of their feature type. To study the disambiguation effect of each of the three feature types, we hold the weight of one type fixed and vary the weights of the other two. Finally, we apply a word sense disambiguation method to the ambiguous Liheci and build a classifier model using SVM. While building the SVM classifier, a genetic algorithm optimizes the penalty parameter C and the kernel-function parameter, with the accuracy of Liheci word sense disambiguation as the fitness function. The results show that weighting features by the importance of their type gives better disambiguation than Boolean, word-frequency, and TF-IDF weighting.
Keywords/Search Tags:Liheci, Word sense disambiguation, SVM, Feature extraction, Feature weights, Genetic algorithm
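The type-based feature weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not name the three feature types or give their weights, so the type names (`type_a`, `type_b`, `type_c`), the weight values, the sample features, and the helper functions below are all hypothetical placeholders, not the thesis's actual settings.

```python
# Hypothetical per-type importance weights; the abstract does not state
# the actual feature types or their values.
TYPE_WEIGHTS = {"type_a": 1.0, "type_b": 0.5, "type_c": 0.25}


def build_vocabulary(samples):
    """Map each (feature_type, value) pair seen in the samples to a column index."""
    vocab = {}
    for features in samples:
        for key in features:
            vocab.setdefault(key, len(vocab))
    return vocab


def weighted_vector(features, vocab, type_weights):
    """Build a dense vector; a present feature gets the weight of its type.

    This differs from Boolean weighting, where every present feature
    would simply be set to 1. Features not in the vocabulary are ignored.
    """
    vec = [0.0] * len(vocab)
    for ftype, value in features:
        idx = vocab.get((ftype, value))
        if idx is not None:
            vec[idx] = type_weights[ftype]
    return vec


# Two made-up context samples, each a list of (feature_type, value) pairs.
samples = [
    [("type_a", "w1"), ("type_b", "p1")],
    [("type_a", "w2"), ("type_c", "c1")],
]
vocab = build_vocabulary(samples)
vectors = [weighted_vector(s, vocab, TYPE_WEIGHTS) for s in samples]
```

Vectors built this way could then be passed to an SVM classifier, with the per-type weights varied as the abstract describes and the penalty parameter C and kernel parameter tuned separately by the genetic algorithm.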