Research On Statistical Chinese Word Sense Disambiguation

Posted on:2007-01-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y N Yang

GTID:2178360185985660

Subject:Computer Science and Technology

Abstract/Summary:

Word Sense Disambiguation (WSD) is an important research topic in computational linguistics and natural language processing (NLP), and has been one of the most active research problems in NLP in recent years. This research focuses on statistical WSD, whose methods fall into two categories according to how they are trained: supervised and unsupervised. Early WSD studies relied mainly on knowledge-based and supervised machine learning methods; as computing and storage technology has improved, unsupervised methods have received increasing attention.

The research in this thesis covers three areas:

1. Resource construction. This comprises the construction of the IR-Lab Classifying Dictionary and the building of a corpus. The dictionary greatly helps both corpus labeling and the construction of Equivalent pseudowords.

2. Modeling WSD. The Naïve Bayes, Maximum Entropy, Support Vector Machine, and Decision Tree models are examined for Chinese WSD. In a comparative study, the Naïve Bayes and Maximum Entropy models perform better than the other models. The Naïve Bayes model in particular is easy to construct and implement, and its learning process is simple and efficient.

3. Unsupervised WSD with Equivalent pseudowords. The concept of Equivalent pseudowords and a method for constructing them are introduced, and an unsupervised WSD method is built on them. The method is tried with the Naïve Bayes and Maximum Entropy models. It achieves 81% accuracy on the Senseval-3 test data, which is clearly better than the corresponding supervised method.
The experiments indicate that the concept of Equivalent pseudowords, and unsupervised WSD technology based on them, offer a new idea and method for exploring new WSD techniques. In brief, this thesis makes some useful attempts at machine learning and unsupervised WSD methods and obtains some initial findings. With devotion of...
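
The abstract does not spell out how Equivalent pseudowords are built or how the classifiers are configured, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each sense of an ambiguous word is represented by unambiguous substitute words, that contexts of those substitutes in unlabeled text serve as automatically labeled training examples, and that a bag-of-words Naïve Bayes classifier with add-one smoothing is trained on them. The sense labels, substitute words, toy English sentences, and the names `NaiveBayesWSD`, `harvest`, and `PSEUDOWORD` are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Bag-of-words Naive Bayes sense classifier with add-one smoothing."""

    def __init__(self):
        self.sense_counts = Counter()            # training examples per sense
        self.word_counts = defaultdict(Counter)  # context-word counts per sense
        self.vocab = set()

    def train(self, examples):
        """examples: iterable of (context_tokens, sense_label) pairs."""
        for context, sense in examples:
            self.sense_counts[sense] += 1
            for word in context:
                self.word_counts[sense][word] += 1
                self.vocab.add(word)

    def predict(self, context):
        """Return the sense with the highest log posterior, or None if untrained."""
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, float("-inf")
        for sense, n in self.sense_counts.items():
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            score = math.log(n / total)
            for word in context:
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[sense][word] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


def harvest(sentences, pseudoword):
    """Label contexts of unambiguous substitutes with the sense they stand for."""
    examples = []
    for tokens in sentences:
        for sense, substitutes in pseudoword.items():
            for i, token in enumerate(tokens):
                if token in substitutes:
                    # The context is the sentence without the substitute itself.
                    examples.append((tokens[:i] + tokens[i + 1:], sense))
    return examples


# Hypothetical pseudoword for the ambiguous word "bank" (assumption for illustration).
PSEUDOWORD = {"bank/finance": ["lender"], "bank/river": ["shore"]}

# Toy unlabeled corpus, already tokenized; no sense annotation is used.
unlabeled = [
    ["the", "lender", "approved", "the", "loan"],
    ["the", "lender", "raised", "interest", "rates"],
    ["we", "walked", "along", "the", "shore", "of", "the", "river"],
    ["boats", "rested", "on", "the", "shore", "near", "the", "water"],
]

model = NaiveBayesWSD()
model.train(harvest(unlabeled, PSEUDOWORD))

print(model.predict(["the", "bank", "approved", "my", "loan"]))
print(model.predict(["the", "bank", "of", "the", "river"]))
```

On this toy data the first query resolves to the finance sense and the second to the river sense. For Chinese text the input would first need word segmentation, and a realistic setup would use a much larger corpus and a richer context representation than this sketch.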

Keywords/Search Tags:

Statistical learning model, Machine learning, Word Sense Disambiguation, Equivalent pseudoword

Related items

1	Research On Key Technologies Of Word Sense Disambiguation Based On Statistical Learning
2	Automatic Knowledge Acquisition For Word Sense Disambiguation
3	The Application Research Of Word Sense Disambiguation In The Statistical Machine Translation
4	Research On Statistical Method Of Chinese Word Meaning Disambiguation Based On Multi-Classifier
5	Chinese Word Sense Disambiguation Based On Semantic
6	Chinese Word Sense Disambiguation Based On Parsing Tree
7	Research On Chinese Word Sense Disambiguation Method Based On Deep Learning
8	Research On Word Sense Disambiguation Based On Semi-supervised Model
9	Research On Word Sense Disambiguation Based On Deep Learning
10	Towards high-performance word sense disambiguation by combining rich linguistic knowledge and machine learning approaches