
Research on Statistical Methods for Chinese Word Sense Disambiguation Based on Multiple Classifiers

Posted on: 2012-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Guo
Full Text: PDF
GTID: 2208330362966050
Subject: Computer application technology
Abstract/Summary:
Word sense disambiguation (WSD) is a basic and important problem in computational linguistics and natural language processing (NLP), and it has been one of the most active NLP research topics in recent years. It matters to many fields, including machine translation, information retrieval, thematic analysis, and text categorization. This research focuses on statistical WSD. It is based on the large-scale People's Daily corpus developed by ICL/PKU and covers the following aspects.

1. Study the characteristics of the People's Daily Word-Sense Tagging Corpus and methods for extracting features from it. We first analyze the structure of the corpus and the disambiguation knowledge available in it. We then study the explicit, semi-explicit, and hidden information in the context of polysemous words, together with methods for extracting and using that information.

2. Analyze and compare several WSD models, discussing the feature sets they use and the complementarity between them, which provides useful information for integrating classifiers. We study the modeling methods and feature-set acquisition methods of the Naive Bayes (NB), Decision Tree (DT), Vector Space (VS), and Maximum Entropy (ME) models, among others, and analyze how these models complement one another in WSD.

3. Propose a classifier integration method. After analyzing several classifiers, we put forward a voting method with automatic weight adjustment, which draws on classifier-integration ideas from the recognition field, to construct an integrated classifier. Experimental results show that the proposed classifier reaches a WSD accuracy of 91.86%.

4. Establish an experimental tagging platform. The platform is an application framework that integrates common modules such as word segmentation, feature extraction, and result testing. Users can add their own algorithms to the platform, so they can avoid repetitive tasks and focus on the algorithm itself.

In brief, this thesis makes some useful attempts at WSD methods based on integrating multiple classifiers. The research shows that integrating multiple classifiers can improve WSD accuracy, which advances WSD and offers insights for other NLP topics.
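
The abstract does not spell out how the voting weights are adjusted. As a minimal sketch of weighted voting over several classifiers, the Python below assumes that each classifier's weight is simply its accuracy on held-out data; this is an illustrative stand-in, not the thesis's actual weighting scheme. The function names (`fit_weights`, `weighted_vote`), the sense labels, and the toy predictions are all hypothetical.

```python
from collections import defaultdict


def fit_weights(dev_predictions, gold):
    """Assumption: weight each classifier by its accuracy on held-out data."""
    weights = {}
    for name, preds in dev_predictions.items():
        correct = sum(p == g for p, g in zip(preds, gold))
        weights[name] = correct / len(gold)
    return weights


def weighted_vote(predictions, weights):
    """Return the sense label with the largest summed classifier weight."""
    scores = defaultdict(float)
    for name, sense in predictions.items():
        scores[sense] += weights[name]
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(scores), key=lambda sense: scores[sense])


# Toy held-out data: gold senses and each classifier's predictions (made up).
gold = ["s1", "s2", "s1", "s2"]
dev_predictions = {
    "NB": ["s1", "s2", "s1", "s1"],
    "DT": ["s1", "s1", "s1", "s1"],
    "ME": ["s2", "s2", "s1", "s2"],
}
weights = fit_weights(dev_predictions, gold)

# Combine the three classifiers' outputs for one new ambiguous word.
chosen = weighted_vote({"NB": "s1", "DT": "s2", "ME": "s2"}, weights)
```

With these toy numbers the two classifiers that agree outweigh the single dissenting one, so `chosen` is their shared label. A real system would re-estimate the weights per ambiguous word or per feature set, which is one possible reading of "automatic weight adjustment".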
Keywords/Search Tags: word sense disambiguation and tagging, machine learning, multiple classifier integration, feature selection, corpus