
Research On Statistical Chinese Word Sense Disambiguation

Posted on: 2007-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Yang
Full Text: PDF
GTID: 2178360185985660
Subject: Computer Science and Technology
Abstract/Summary:
Word Sense Disambiguation (WSD) is an important research topic in computational linguistics and natural language processing (NLP), and has been one of the most active research problems in NLP in recent years. The main emphasis of our research is statistical WSD, which can be divided into two categories according to the training method: supervised and unsupervised. Early WSD studies relied mainly on knowledge-based and supervised machine learning methods; with improvements in computing and storage technology, unsupervised methods have received more and more attention.
The research in this thesis consists of three areas:
1. Resource construction. This covers two parts: building the IR-Lab Classifying Dictionary and building the corpus. The IR-Lab Classifying Dictionary greatly helps corpus labeling and the construction of Equivalent pseudowords.
2. Modeling WSD. The Naïve Bayes, Maximum Entropy, Support Vector Machine, and Decision Tree models are examined on Chinese WSD. In a comparative study, the Naïve Bayes and Maximum Entropy models perform better than the other models. The Naïve Bayes model in particular is easy to construct and implement, and its learning process is simple and efficient.
3. Introducing the concept of Equivalent pseudowords and the method for constructing them, and using them to build an unsupervised WSD method. We test the unsupervised WSD method based on Equivalent pseudowords with the Naïve Bayes and Maximum Entropy models. It achieves 81% accuracy on the Senseval-3 test data, which is clearly better than the corresponding supervised method.
The experiments show that the concept of Equivalent pseudowords, and unsupervised WSD based on Equivalent pseudowords, offers a new idea and method for exploring new WSD techniques. In brief, this thesis makes some useful attempts at machine learning and unsupervised WSD methods and obtains some initial findings. With devotion of...
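The abstract does not spell out how Equivalent pseudowords are built or how the classifier is trained, so the following is only a minimal sketch under stated assumptions. It assumes that each sense of an ambiguous word is represented by unambiguous words of similar meaning, that the contexts of those words in an unlabeled corpus serve as pseudo-labeled training examples, and that a Naïve Bayes model with add-one smoothing is trained on them. The names `NaiveBayesWSD` and `pseudo_labelled`, the window size, and the English toy data are all hypothetical and chosen for illustration; they are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Bag-of-words Naive Bayes sense classifier with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant (assumed)
        self.sense_counts = Counter()            # number of examples per sense
        self.word_counts = defaultdict(Counter)  # context-word counts per sense
        self.vocab = set()

    def fit(self, examples):
        # examples: iterable of (context_words, sense_label) pairs
        for context, sense in examples:
            self.sense_counts[sense] += 1
            for word in context:
                self.word_counts[sense][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, context):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            denom = sum(self.word_counts[sense].values()) + self.alpha * vocab_size
            score = math.log(n / total)          # log prior of the sense
            for word in context:
                if word in self.vocab:           # skip words never seen in training
                    count = self.word_counts[sense][word]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


def pseudo_labelled(corpus, sense_to_pseudowords, window=3):
    """Collect contexts of unambiguous stand-in words, labeled with the sense
    they stand for (the assumed reading of 'Equivalent pseudowords')."""
    word_to_sense = {w: s for s, ws in sense_to_pseudowords.items() for w in ws}
    examples = []
    for sentence in corpus:
        for i, token in enumerate(sentence):
            if token in word_to_sense:
                left = sentence[max(0, i - window):i]
                right = sentence[i + 1:i + 1 + window]
                examples.append((left + right, word_to_sense[token]))
    return examples


# Hypothetical toy data: two senses of "bank", each with one stand-in word.
corpus = [
    ["the", "lender", "approved", "the", "loan"],
    ["the", "lender", "raised", "interest", "rates"],
    ["we", "walked", "along", "the", "shore", "near", "water"],
    ["the", "shore", "was", "muddy", "after", "rain"],
]
pseudowords = {"finance": {"lender"}, "river": {"shore"}}

examples = pseudo_labelled(corpus, pseudowords)
model = NaiveBayesWSD().fit(examples)
print(model.predict(["loan", "interest"]))
print(model.predict(["muddy", "water"]))
```

No sense-tagged sentence is used in training here: the labels come entirely from which stand-in word occurred, which is what would make such a method unsupervised with respect to the ambiguous word itself.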
Keywords/Search Tags: Statistical learning model, Machine learning, Word Sense Disambiguation, Equivalent pseudoword