
Research Of Word Sense Disambiguation Based On Indicators With Semantic Category Extending

Posted on: 2011-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: G B Cheng
GTID: 2178330332460050
Subject: Computer application technology
Abstract/Summary:
With the development of computer hardware and network applications, Chinese information processing is receiving growing attention. Word sense disambiguation (WSD), which determines the meaning of an ambiguous word from the text around it, is a key step toward letting computers understand the content of an article, so progress in WSD will promote computers' understanding of human language.
In recent years, research methods in natural language processing have developed rapidly, and one very prominent approach is grounded in statistical theory and depends on dictionaries and corpora. Statistical WSD, however, is limited by corpus scale, by the uneven distribution of words, and by a lack of knowledge. These limitations appear as data sparseness and as low-frequency (small-probability) vocabulary. To address this key obstacle for current WSD, we discuss and analyze existing WSD methods and knowledge resources and propose a new method. Following the theory of lexemes and word clusters, we integrate dictionary resources with web text resources to make WSD considerably more effective, and thereby address the frequent problems of data sparseness and low-frequency vocabulary.
The goal of this thesis is to build a WSD system. First, we implement the system with the indicator method and use a simple information entropy algorithm to find the indicators. Second, following TongYiCiLin (a Chinese synonym thesaurus), we collect and study words gathered from the web under the same context conditions as the corresponding TongYiCiLin entries, and obtain word clusters at the same semantic level as the indicators. The collected words and web instances are then organized into a new resource.
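The abstract does not give the entropy formula used to select indicators. The sketch below assumes one common reading: for each context word, compute the Shannon entropy of the target word's sense distribution over the training instances in which that context word appears, and keep words with low entropy, since they point almost deterministically to a single sense. The function names `select_indicators` and `disambiguate`, the thresholds `max_entropy` and `min_freq`, the 1/(1 + h) vote weighting, and the toy data are all hypothetical, not taken from the thesis.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data for one ambiguous target word ("bank"):
# each instance is (context words, annotated sense).
TOY_INSTANCES = [
    (["river", "water", "fish"], "shore"),
    (["river", "boat"], "shore"),
    (["money", "loan"], "finance"),
    (["money", "account", "water"], "finance"),
    (["loan", "interest"], "finance"),
]


def sense_entropy(counts):
    """Shannon entropy in bits of a Counter mapping sense -> frequency."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c > 0)


def select_indicators(instances, max_entropy=0.5, min_freq=2):
    """Keep context words whose co-occurring senses are (almost) deterministic.

    Assumed selection rule: returns {word: (majority_sense, entropy)} for
    words seen at least min_freq times with entropy <= max_entropy.
    """
    senses_by_word = defaultdict(Counter)
    for context, sense in instances:
        for word in set(context):  # count each word once per instance
            senses_by_word[word][sense] += 1
    indicators = {}
    for word, counts in senses_by_word.items():
        if sum(counts.values()) < min_freq:
            continue  # too rare to trust
        h = sense_entropy(counts)
        if h <= max_entropy:
            indicators[word] = (counts.most_common(1)[0][0], h)
    return indicators


def disambiguate(context, indicators, default_sense):
    """Vote with the indicators found in context; lower entropy weighs more."""
    votes = Counter()
    for word in context:
        if word in indicators:
            sense, h = indicators[word]
            votes[sense] += 1.0 / (1.0 + h)
    return votes.most_common(1)[0][0] if votes else default_sense


if __name__ == "__main__":
    indicators = select_indicators(TOY_INSTANCES)
    print(sorted(indicators))  # ['loan', 'money', 'river']
    print(disambiguate(["the", "river", "bank"], indicators, "finance"))  # shore
```

In the toy data, "water" co-occurs once with each sense (entropy 1 bit) and is rejected, while "river", "money", and "loan" become indicators. A context containing no indicator falls back to `default_sense`, which is exactly the sparseness gap that the category extension described next is meant to narrow.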
We then expand the indicators with a semi-supervised method and complementary linguistic knowledge, moving from individual lexemes to word clusters, which strengthens the indicators. Both methods contribute substantially to the WSD system and yield better performance in less time. In short, by addressing data sparseness and low-frequency vocabulary and by computing the semantic category extension of indicators, we mine the knowledge in the corpus more deeply and make better use of a corpus of limited scale. Experiments show that the semantic category extension calculation for indicators effectively alleviates data sparseness and low-frequency vocabulary, and that it also substantially improves macro-average accuracy.
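The abstract describes extending indicators from a single lexeme to its word cluster using TongYiCiLin categories, and evaluating the result with macro-average accuracy, but it does not spell out the procedure. The sketch below assumes a simple reading: an indicator's sense is propagated to the other words that share its category code up to a chosen prefix length, and a word that receives conflicting senses is skipped. The thesaurus entries and codes, `prefix_len`, and the function names are hypothetical. The web-instance collection and the semi-supervised retraining step are not modeled. Macro-average accuracy is assumed to be the unweighted mean of per-target-word accuracies.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: word -> hierarchical category code, loosely in
# the spirit of TongYiCiLin; these entries and codes are invented for
# illustration. Words sharing a longer code prefix are semantically closer.
TOY_THESAURUS = {
    "river": "Be05A01",
    "stream": "Be05A02",
    "creek": "Be05A02",
    "money": "Dj03B01",
    "cash": "Dj03B02",
    "loan": "Dj05A01",
}


def extend_indicators(indicators, thesaurus, prefix_len=5):
    """Propagate each indicator's sense to the other words in its category.

    indicators: {word: sense}. Assumed rule: the category is the first
    prefix_len characters of the thesaurus code, and a new word is added only
    if every indicator reaching it proposes the same sense.
    """
    members = defaultdict(set)
    for word, code in thesaurus.items():
        members[code[:prefix_len]].add(word)
    proposals = defaultdict(set)
    for word, sense in indicators.items():
        code = thesaurus.get(word)
        if code is None:
            continue  # indicator not covered by the thesaurus
        for neighbour in members[code[:prefix_len]]:
            if neighbour not in indicators:
                proposals[neighbour].add(sense)
    extended = dict(indicators)
    for word, senses in proposals.items():
        if len(senses) == 1:  # drop words with conflicting proposals
            extended[word] = next(iter(senses))
    return extended


def macro_average_accuracy(results):
    """Unweighted mean of per-target-word accuracies.

    results: {target_word: (n_correct, n_total)}; words with no test
    instances are ignored.
    """
    accuracies = [c / t for c, t in results.values() if t > 0]
    return sum(accuracies) / len(accuracies) if accuracies else 0.0


if __name__ == "__main__":
    seed = {"river": "shore", "money": "finance", "loan": "finance"}
    print(sorted(extend_indicators(seed, TOY_THESAURUS)))
    # ['cash', 'creek', 'loan', 'money', 'river', 'stream']
    print(macro_average_accuracy({"bank": (9, 10), "plant": (1, 2)}))
```

With the toy data, "stream" and "creek" inherit "shore" from "river", and "cash" inherits "finance" from "money". Those words were never observed in training but can now cast votes. For the macro-average call, the result is (0.9 + 0.5) / 2 = 0.7. Pooling all instances (micro-averaging) would give 10/12 ≈ 0.83. Under the macro average, a rare target word therefore counts as much as a frequent one, which is why it is sensitive to low-frequency vocabulary.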
Keywords/Search Tags: Natural Language, Word Sense Disambiguation, Indicator, Semantic Category Extending Calculation