Research On Semantic Similarity Between Words And Between Short Texts Based On WordNet

Posted on:2012-04-11

Degree:Master

Type:Thesis

Country:China

Candidate:K Y Zhang

GTID:2178330332499568

Subject:Computer application technology

Abstract/Summary:

Computing semantic similarity is an important problem in many research fields, including Psychology, Linguistics, Cognitive Science and Artificial Intelligence, and it has both theoretical value and practical application prospects. An effective semantic similarity method can substantially improve system performance in these fields. With this motivation, this thesis proposes three methods: Information Content based on Extending Relations (ICER), Word Semantic Similarity based on Path and Information Content (SimP&IC), and Short Text Semantic Similarity based on Maximum (STSSMax).

1. Information Content based on Extending Relations

Information Content plays an important part in Word Semantic Similarity methods. At present, there are two ways to compute Information Content. One relies on a large corpus together with the WordNet hierarchy; the other, proposed by Nuno, depends only on the WordNet hierarchy. According to Nuno and Pirro, the latter is better. When computing Information Content, however, Nuno considers only Hypernym/Hyponym relations and no other relations, although Meronym/Holonym relations also reflect semantic relations in WordNet. On this basis, we propose Information Content based on Extending Relations.

2. Word Semantic Similarity based on Path and Information Content

Word Semantic Similarity plays an important part in Short Text Semantic Similarity methods. Many methods exist for computing Word Semantic Similarity, but many of them consider only a single factor, such as Path. Path and Information Content affect Word Semantic Similarity in different ways, and the results should improve when both factors are considered. On this basis, we propose Word Semantic Similarity based on Path and Information Content.

3. Short Text Semantic Similarity based on Maximum

There are many text similarity methods, but many of them are poorly suited to computing Short Text Semantic Similarity. When computing Word Semantic Similarity, we always select the maximum semantic similarity among the concepts that contain the words. Likewise, when computing Short Text Semantic Similarity, we use the maximum similarity between words. On this basis, we propose Short Text Semantic Similarity based on Maximum.

We also verify the three methods experimentally. On the RG, PS1 and PS2 data sets, ICER and SimP&IC perform better than other methods, and the same result is obtained on the Li data set. The Li data set also shows that STSSMax is effective. The results show that the combination of ICER, SimP&IC and STSSMax performs best when computing Short Text Semantic Similarity, and much better than other methods.
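The abstract does not give formulas, so the following is only a minimal sketch of the general ideas it describes, not the thesis's ICER, SimP&IC or STSSMax. It assumes a tiny hand-made taxonomy in place of WordNet, computes a hierarchy-only Information Content from hyponym counts (IC(c) = 1 - log(hypo(c) + 1) / log(N), a common WordNet-only formulation that uses Hypernym/Hyponym relations alone), uses Lin's measure as a stand-in word similarity, and averages best matches in both directions as one possible reading of "maximum similarity between words". All names and data are hypothetical.

```python
import math

# Hypothetical toy taxonomy (child -> hypernym); illustrative, not WordNet data.
HYPERNYM = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}
# Hypothetical word -> concept (sense) lists; "jaguar" is ambiguous on purpose.
SENSES = {"dog": ["dog"], "cat": ["cat"], "car": ["car"], "jaguar": ["cat", "car"]}
CONCEPTS = set(HYPERNYM) | set(HYPERNYM.values())


def path_to_root(concept):
    """Return the concept followed by its chain of hypernyms up to the root."""
    path = [concept]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def hyponym_count(concept):
    """Count all descendants of a concept (Hypernym/Hyponym links only)."""
    return sum(1 for c in CONCEPTS if c != concept and concept in path_to_root(c))


def intrinsic_ic(concept):
    """Hierarchy-only IC: leaves score 1, the root scores 0."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(CONCEPTS))


def lowest_common_subsumer(a, b):
    """First ancestor of b (walking upward) that is also an ancestor of a."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    return None


def concept_sim(a, b):
    """Lin similarity as a stand-in measure (assumption, not SimP&IC)."""
    if a == b:
        return 1.0
    denom = intrinsic_ic(a) + intrinsic_ic(b)
    if denom == 0:
        return 0.0
    return 2.0 * intrinsic_ic(lowest_common_subsumer(a, b)) / denom


def word_sim(w1, w2):
    """Maximum similarity over all concept pairs containing the two words."""
    return max(
        (concept_sim(a, b) for a in SENSES.get(w1, []) for b in SENSES.get(w2, [])),
        default=0.0,
    )


def short_text_sim(words_a, words_b):
    """Average each word's best match in the other text, in both directions.
    The symmetric averaging is an assumption made for this sketch."""
    if not words_a or not words_b:
        return 0.0

    def directed(src, dst):
        return sum(max(word_sim(x, y) for y in dst) for x in src) / len(src)

    return 0.5 * (directed(words_a, words_b) + directed(words_b, words_a))


if __name__ == "__main__":
    print(word_sim("jaguar", "dog"))
    print(short_text_sim(["dog", "jaguar"], ["cat", "car"]))
```

Extending the Information Content step to Meronym/Holonym relations, as the thesis proposes, would mean counting more than hyponyms in `hyponym_count`; how those relations are weighted is not stated in the abstract and is left out here.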

Keywords/Search Tags:

Information Content, Word Semantic Similarity, Short Text Semantic Similarity, WordNet

Related items

1	Research And Implementation Of Semantic Similarity Computing By Combining Knowledge-based And Corpus-based Methods
2	The Research Of Semantic Similarity Between Short Text Based On WordNet
3	Research And Application Of Wordnet-Based Semantic Similarity Measurement
4	Conceptual Semantic Similarity Calculation Based On WordNet And Its Application Research
5	Research On Algorithm Of Semantic Net Mining Of Short Texts Based On Wordnet
6	The Study Of Measures And Applications Of Short Text Semantic Similarity
7	Research On Method Of Semantic Similarity Based On Information Content
8	Research Of Multi-Documents Summarization Based On Information Extraction And Semantic Similarity
9	Clustering Algorithm Research Of Short Text Based On Semantic Similarity
10	Word Similarity Measurement Based On Word Embedding And WordNet