Research Of Word Sense Disambiguation Based On Indicators With Semantic Category Extending

Posted on:2011-09-17

Degree:Master

Type:Thesis

Country:China

Candidate:G B Cheng

GTID:2178330332460050

Subject:Computer application technology

Abstract/Summary:

With the development of computer hardware and network applications, attention to the informatization of Chinese is growing as well. Word sense disambiguation (WSD) of text helps a computer grasp the content of an article, so progress in WSD should advance machine understanding of human language. Natural language processing methods have developed rapidly in recent years, and one prominent family is based on statistical theory and depends on dictionaries and corpora. Statistical WSD is limited by the scale of the corpora, by unbalanced word distributions, and by a lack of knowledge. These limits show up as data sparseness and low-probability words. To address this key obstacle to progress in WSD, we propose a new method, based on a discussion and analysis of existing WSD methods and knowledge resources. Following the theory of lexemes and word clusters, we integrate dictionary resources with web text resources to improve the efficiency of word sense disambiguation. This addresses problems such as frequent data sparseness and low-probability words. The goal of this thesis is to build a WSD system. First, we implement the system with the indicator method, using a simple information entropy algorithm to find the indicators. Second, we collect and study words from the web that occur under the same conditions as the contexts in TongYiCiLin, which gives word clusters at the same semantic level as each indicator. These collections and web instances are turned into new resources.
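
The abstract does not spell out the entropy algorithm used to find indicators. As a rough illustration only, one common reading is: for each context word, measure the entropy of the sense distribution of the ambiguous word when that context word is present, and keep the words with low entropy. The function names, the thresholds `max_entropy` and `min_count`, and the toy English data below are all assumptions made for this sketch, not details from the thesis.

```python
import math
from collections import Counter, defaultdict


def sense_entropy(sense_counts):
    """Shannon entropy (in bits) of a sense-count distribution."""
    total = sum(sense_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in sense_counts.values())


def find_indicators(instances, max_entropy=0.5, min_count=2):
    """Return {context_word: sense} for low-entropy context words.

    instances: iterable of (context_words, sense) pairs for ONE
    ambiguous target word. Both thresholds are illustrative guesses.
    """
    by_word = defaultdict(Counter)
    for context, sense in instances:
        for word in set(context):        # count each word once per instance
            by_word[word][sense] += 1

    indicators = {}
    for word, counts in by_word.items():
        seen = sum(counts.values())
        if seen >= min_count and sense_entropy(counts) <= max_entropy:
            # The word points almost always to one sense: keep that sense.
            indicators[word] = counts.most_common(1)[0][0]
    return indicators


# Toy data for the English word "bank" (hypothetical, for illustration).
toy_instances = [
    (["river", "water", "the"], "shore"),
    (["river", "mud", "the"], "shore"),
    (["money", "loan", "the"], "finance"),
    (["money", "account", "the"], "finance"),
]
print(find_indicators(toy_instances))
```

In this toy data, "the" occurs equally with both senses (entropy of 1 bit) and is rejected, while words seen only once fall below `min_count`. That second filter is exactly where data sparseness bites, which is the gap the indicator expansion is meant to close.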
We expand the indicators from lexemes to word clusters using a semi-supervised method together with complementary linguistic knowledge, which strengthens the indicators. Together, these two methods contribute substantially to the WSD system, giving better performance in less time. In other words, by addressing frequent data sparseness and low-probability words, and by computing the semantic category extension of the indicators, we mine the corpus knowledge more deeply and get more out of a corpus of the present limited scale. Experiments show that the semantic category extending calculation for indicators alleviates the problems of data sparseness and low-probability words, and that it also improves the macro-average accuracy considerably.
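
To make the expansion step and the evaluation measure concrete, here is a minimal sketch under stated assumptions. It assumes a thesaurus table that maps each word to a category code (the codes `C1` and `C2` are placeholders, not real TongYiCiLin codes), and it lets every word in an indicator's category inherit that indicator's sense. The thesis's semi-supervised procedure is not described in the abstract and is not reproduced here. Macro-average accuracy is taken in its usual sense: the unweighted mean of the per-word accuracies.

```python
from collections import defaultdict


def expand_indicators(indicators, category_of):
    """Extend {word: sense} indicators to words sharing a category.

    category_of: hypothetical thesaurus table, word -> category code.
    Original indicators are never overwritten. If two indicators with
    different senses share a category, the first one processed wins,
    which is a simplification made for this sketch.
    """
    members = defaultdict(set)
    for word, code in category_of.items():
        members[code].add(word)

    expanded = dict(indicators)
    for word, sense in indicators.items():
        code = category_of.get(word)
        if code is None:                 # indicator not in the thesaurus
            continue
        for sibling in members[code]:
            expanded.setdefault(sibling, sense)
    return expanded


def macro_average_accuracy(results):
    """Unweighted mean of per-word accuracies.

    results: {target_word: [(predicted_sense, gold_sense), ...]}
    """
    per_word = [
        sum(pred == gold for pred, gold in pairs) / len(pairs)
        for pairs in results.values() if pairs
    ]
    return sum(per_word) / len(per_word) if per_word else 0.0


# Hypothetical inputs for illustration.
toy_categories = {"river": "C1", "stream": "C1", "loan": "C2"}
print(expand_indicators({"river": "shore"}, toy_categories))
print(macro_average_accuracy({
    "bank": [("shore", "shore"), ("shore", "finance")],
    "plant": [("factory", "factory")],
}))
```

Because the macro average weights every target word equally, rare words count as much as frequent ones, so gains on low-probability words show up directly in this measure.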

Keywords/Search Tags:

Natural Language, Word Sense Disambiguation, Indicator, Semantic Category Extending Calculation

Related items

1	Research Of Word Sense Disambiguation Based On Word-sense Category Extending
2	Based On Semi-supervised Method Of Chinese Word Sense Disambiguation
3	Research On Query Expansion & Key Technologies Based On Semantic Analysis
4	Research And Application Of Word Sense Disambiguation Method Based On Contextual Semantic
5	Research On Chinese Word Sense Disambiguation Based On Semantic Analysis
6	Chinese Word Sense Disambiguation Based On Semantic
7	Word Sense Disambiguation Corpus Automatic Acquisition
8	A Study Of Chinese Word Sense Disambiguation Based On Hownet
9	Chinese Word Sense Disambiguation Based On Parsing Tree
10	Research On Word Sense Disambiguation Based On The Strategy Of Field Priority Selection