Research On Semantic Similarity Between Words And Between Short Texts Based On WordNet

Posted on: 2012-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Zhang
GTID: 2178330332499568
Subject: Computer application technology
Abstract/Summary:
In many research fields, such as Psychology, Linguistics, Cognitive Science and Artificial Intelligence, computing semantic similarity is an important problem with both theoretical value and practical application prospects. An effective semantic similarity method can substantially improve system performance in these fields. Motivated by this, this thesis proposes three methods: Information Content based on Extending Relations (ICER), Word Semantic Similarity based on Path and Information Content (SimP&IC), and Short Text Semantic Similarity based on Maximum (STSSMax).

1. Information Content based on Extending Relations
Information Content plays an important part in Word Semantic Similarity methods. At present, there are two approaches to computing Information Content. One relies on a large corpus together with the WordNet hierarchy; the other, proposed by Nuno, depends only on the WordNet hierarchy. According to Nuno and Pirro, the latter approach performs better. When computing Information Content, however, Nuno considers only Hypernym/Hyponym relations and ignores all other relations, even though Meronym/Holonym relations also reflect semantic relations in WordNet. We therefore propose Information Content based on Extending Relations, which takes these additional relations into account.

2. Word Semantic Similarity based on Path and Information Content
Word Semantic Similarity plays an important part in Short Text Semantic Similarity methods. Many methods exist for computing Word Semantic Similarity, but most of them consider only a single factor, e.g. Path. Path and Information Content affect Word Semantic Similarity in different ways, so the results should improve when both factors are considered together. We therefore propose Word Semantic Similarity based on Path and Information Content.

3. Short Text Semantic Similarity based on Maximum
There are many text similarity methods, but many of them are ill-suited to computing Short Text Semantic Similarity. When computing Word Semantic Similarity, we always select the maximum semantic similarity among the concepts that contain the two words. By analogy, when computing Short Text Semantic Similarity, we use the maximum similarity between the words of the two texts. We therefore propose Short Text Semantic Similarity based on Maximum.

We verify the three methods experimentally. On the RG, PS1 and PS2 data sets, ICER and SimP&IC perform better than the other methods, and the same result is obtained on the Li data set. The Li data set also shows that STSSMax is effective. The results indicate that the combination of ICER, SimP&IC and STSSMax performs best for computing Short Text Semantic Similarity, clearly outperforming the other methods compared.
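The abstract does not give the formulas for ICER, SimP&IC or STSSMax, so the following is only a minimal sketch of the general pipeline it describes, not the thesis's method. It assumes a hypothetical toy taxonomy in place of WordNet, uses the hierarchy-only Information Content in the form usually cited from Seco et al. (assumed here to be what the abstract attributes to Nuno), substitutes Lin's measure for the word-level similarity, and averages best-match scores in both directions as one possible reading of "maximum similarity between words". All function names are illustrative.

```python
import math

# Hypothetical toy taxonomy (child -> parent), standing in for WordNet's
# hypernym hierarchy. Meronym/holonym links, which ICER adds, are not modelled.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "bus": "vehicle",
}
CONCEPTS = set(PARENT) | set(PARENT.values())


def ancestors(concept):
    """Return the concept followed by its hypernyms, up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def hyponym_count(concept):
    """Number of concepts strictly below `concept` in the taxonomy."""
    return sum(1 for c in CONCEPTS if c != concept and concept in ancestors(c))


def intrinsic_ic(concept):
    """Hierarchy-only IC: 1 - log(hypo(c) + 1) / log(total concepts)."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(CONCEPTS))


def word_similarity(a, b):
    """Lin's measure over intrinsic IC (a stand-in, not SimP&IC)."""
    if a == b:
        return 1.0
    if a not in CONCEPTS or b not in CONCEPTS:
        return 0.0
    b_chain = set(ancestors(b))
    # Lowest common subsumer: first ancestor of a that also subsumes b.
    lcs = next(c for c in ancestors(a) if c in b_chain)
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    return 2.0 * intrinsic_ic(lcs) / denom if denom > 0 else 0.0


def directed_max_similarity(words_a, words_b):
    """Average, over the words of A, of the best match found in B."""
    if not words_a or not words_b:
        return 0.0
    best = [max(word_similarity(a, b) for b in words_b) for a in words_a]
    return sum(best) / len(best)


def short_text_similarity(text_a, text_b):
    """Symmetric max-based score (assumed aggregation, not STSSMax itself)."""
    words_a, words_b = text_a.lower().split(), text_b.lower().split()
    return 0.5 * (directed_max_similarity(words_a, words_b)
                  + directed_max_similarity(words_b, words_a))


if __name__ == "__main__":
    print(short_text_similarity("dog car", "cat bus"))
```

In this sketch the root concept has zero Information Content, so word pairs whose only shared ancestor is the root score 0, while leaf concepts score the maximum of 1. Replacing the toy dictionary with WordNet synsets, and the stand-in measures with the thesis's own definitions, would be needed to reproduce the reported experiments.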
Keywords/Search Tags: Information Content, Word Semantic Similarity, Short Text Semantic Similarity, WordNet