Research And Application Of Word Similarity Based On Context

Posted on:2010-07-21	Degree:Master	Type:Thesis
Country:China	Candidate:L Guo
GTID:2178360272485267	Subject:Computer application technology
Abstract/Summary:
In practice, the complex relationships between words in natural language have to be handled through quantitative analysis. This thesis studies two kinds of word similarity: semantic similarity between words, and relation similarity between pairs of words. Both are widely used in natural language processing tasks such as information retrieval, information extraction, text classification, word sense disambiguation and example-based machine translation.

Existing methods for computing semantic similarity and relation similarity fall into two main types: those based on semantic resources and those based on statistics. The former compute similarity from a manually built semantic dictionary; the latter are entirely data-driven, relying on information about the contexts in which words occur in a large corpus. This thesis studies the HowNet-based word similarity algorithm and a number of statistical word similarity algorithms and, to handle words whose sememes are of different kinds, proposes a new similarity algorithm that combines semantic and statistical information. For the first time, word-substitution questions from national official examinations are used to evaluate the effectiveness of the algorithm. This addresses the lack of a public Chinese test corpus, and the proposed algorithm achieves better results.

Data sparseness is a hard problem when computing relation similarity in the unsupervised and semi-supervised settings. To address it, the thesis performs synonym expansion through HowNet and reduces the number of dimensions, which also eliminates noise in the data.
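The abstract does not state how the semantic and statistical scores are combined. The following is a minimal sketch of one possible combination, under these assumptions: the statistical score is the cosine of context co-occurrence vectors; the semantic score is a distance-based sememe score `alpha / (d + alpha)` over a sememe hierarchy; and the two are blended linearly with a hypothetical weight `lam`, falling back to the statistical score alone when the two words' sememes lie in separate hierarchies (the "different kinds of sememe" case). The toy hierarchy, corpus, `alpha` and `lam` are all illustrative and do not come from real HowNet data.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count the words co-occurring within +/- window positions of each token."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * v[k] for k, count in u.items() if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def path_to_root(sememe, parent):
    """Sememes from `sememe` up to the root of its (hypothetical) hierarchy."""
    path = [sememe]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def sememe_similarity(s1, s2, parent, alpha=1.6):
    """alpha / (d + alpha), d = path distance; None if no common ancestor."""
    p1, p2 = path_to_root(s1, parent), path_to_root(s2, parent)
    depth_in_p2 = {s: i for i, s in enumerate(p2)}
    for i, s in enumerate(p1):
        if s in depth_in_p2:
            return alpha / (i + depth_in_p2[s] + alpha)
    return None  # sememes of different kinds (separate hierarchies)


def hybrid_similarity(w1, w2, sememe_of, parent, vectors, lam=0.5):
    """Blend semantic and statistical scores; fall back to statistics alone."""
    stat = cosine(vectors[w1], vectors[w2])
    sem = sememe_similarity(sememe_of[w1], sememe_of[w2], parent)
    if sem is None:
        return stat
    return lam * sem + (1 - lam) * stat


# Hypothetical toy hierarchy and corpus (not real HowNet data).
parent = {"animal": "entity", "human": "entity", "move": "event"}
sememe_of = {"dog": "animal", "cat": "animal", "teacher": "human", "run": "move"}
corpus = [
    ["the", "dog", "chased", "the", "ball"],
    ["the", "cat", "chased", "the", "mouse"],
    ["the", "teacher", "read", "the", "book"],
    ["the", "dog", "will", "run"],
]
vectors = context_vectors(corpus)
for a, b in [("dog", "cat"), ("dog", "teacher"), ("dog", "run")]:
    print(a, b, round(hybrid_similarity(a, b, sememe_of, parent, vectors), 3))
```

On this toy data, dog/cat scores above dog/teacher because both the sememe distance and the shared contexts favour it. The sememes for dog/run lie in separate hierarchies, so that pair is scored from context statistics alone.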
For relation similarity, the thesis proposes an algorithm based on Latent Semantic Indexing (LSI) and applies it to classifying semantic relations in patents. Compared with traditional SVM classification, accuracy increases by 6% and reaches 44%. To evaluate the effectiveness of the two kinds of word similarity algorithms proposed in this thesis, an FAQ tool for retrieving similar questions and an entity classification system were designed and used in the tests.
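The LSI-based relation similarity can be illustrated with a minimal sketch in the style of latent relational analysis. It assumes that each word pair is represented by counts of the patterns joining its two words in a corpus, that these counts are weighted with log(1 + count) and reduced by truncated SVD, and that a new pair receives the label of its most similar labelled pair. The pattern list, counts and relation labels are hypothetical. The nearest-neighbour classifier is a swapped-in choice for brevity, not the thesis's classifier or its SVM baseline. HowNet synonym expansion is not shown. The sketch requires NumPy.

```python
import numpy as np

# Hypothetical toy data: each word pair is described by counts of the
# patterns that join the two words in a (patent-like) corpus.
PATTERNS = ["comprises", "includes", "made of", "formed from"]
PAIRS = [
    (("device", "sensor"), {"comprises": 3, "includes": 2}, "component"),
    (("housing", "plastic"), {"made of": 3, "formed from": 1}, "material"),
    (("circuit", "resistor"), {"comprises": 1, "includes": 3}, "component"),
    (("blade", "steel"), {"made of": 2, "formed from": 2}, "material"),
    (("assembly", "bolt"), {"includes": 2, "comprises": 1}, None),  # to classify
]


def build_matrix(pairs, patterns):
    """Pair-by-pattern matrix with log(1 + count) weighting."""
    col = {p: j for j, p in enumerate(patterns)}
    X = np.zeros((len(pairs), len(patterns)))
    for i, (_, counts, _) in enumerate(pairs):
        for pattern, c in counts.items():
            X[i, col[pattern]] = np.log1p(c)
    return X


def lsi_project(X, k):
    """Keep the top-k singular dimensions (truncated SVD), as in LSI."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def cosine_matrix(A):
    """Pairwise cosine similarity between the rows of A."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    A = A / norms
    return A @ A.T


def classify_nearest(sim, labels, query):
    """Assign the label of the most relationally similar labelled pair."""
    best_sim, best_label = -np.inf, None
    for j, label in enumerate(labels):
        if j != query and label is not None and sim[query, j] > best_sim:
            best_sim, best_label = sim[query, j], label
    return best_label


X = build_matrix(PAIRS, PATTERNS)
sim = cosine_matrix(lsi_project(X, k=2))
labels = [label for _, _, label in PAIRS]
print(classify_nearest(sim, labels, query=4))  # label for assembly:bolt
```

On this toy matrix, the "component" patterns and the "material" patterns fall into separate latent dimensions, so `assembly:bolt` is labelled `component`. Truncating to `k` dimensions merges correlated patterns such as "comprises" and "includes" into one direction. This is one way dimension reduction can ease the sparsity described above.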
Keywords/Search Tags:	word similarity, relation similarity, Latent Semantic Indexing, HowNet