Research And Implementation Of Semantic Similarity Computing By Combining Knowledge-based And Corpus-based Methods

Posted on:2017-05-22

Degree:Master

Type:Thesis

Country:China

Candidate:K L Shi

Full Text:PDF

GTID:2308330482479331

Subject:Software engineering

Abstract/Summary:

Semantic similarity calculation plays a significant role in natural language processing and is widely applied in word sense disambiguation, machine translation, spelling correction, text categorization, question answering, and many other text-mining areas. Current semantic similarity algorithms follow two major research directions. Corpus-based methods measure the cosine distance between word vectors. Knowledge-based methods use a knowledge base such as WordNet and derive the similarity between two nodes from their relations and path distance. Distributed vector models use neural networks to extract latent semantic information from large corpora, but because they lack explicit semantic relations, they capture the intrinsic relationships between words poorly. Knowledge-based methods, in turn, are limited by the number of terms in the semantic library and are therefore not suitable for calculating the similarity between large texts. This thesis analyzes the advantages and disadvantages of the two approaches and presents a method that measures semantic similarity by combining knowledge-based and corpus-based measures. This combination is the main contribution of the thesis, and the method achieves better results in word and short text similarity computation.

For word similarity calculation, distributed word vectors are first trained on corpora with the CBOW model. WordNet serves as additional knowledge with unambiguous semantics: it augments the semantic information of the corpora and is used to construct multiple prototype vectors per word in a low-dimensional vector space. The similarity of two words is then calculated with the Max-Sim model. The experiments are carried out on three common datasets: RG-65, MC-30, and WS-353.

For short text similarity calculation, a similarity matrix method is used. The thesis describes knowledge-based, corpus-based, and combined similarity features of word semantic similarity, and shows how they can be used to derive a text-to-text similarity metric. Experimental results on the Microsoft Research Paraphrase Corpus show that the method is feasible and achieves higher precision, recall, and F1 than similar models.

Finally, a web service matchmaking method based on semantic similarity calculation is proposed. Input and output interfaces are compared with the word similarity method, and text descriptions are compared with the short text similarity method. The three resulting similarity values together describe the similarity between web services. A comparative experiment on the OWLS-TC dataset shows that the proposed method is feasible and effective.
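
The Max-Sim word similarity and the similarity matrix step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the toy two-dimensional prototype vectors, the names `PROTOTYPES`, `max_sim`, `word_sim`, and `text_similarity`, and the best-match averaging used to reduce the matrix to one score are all assumptions made for illustration. The abstract does not state how the matrix is aggregated or how the WordNet features are weighted.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical multi-prototype embeddings: one vector per word sense.
# Real vectors would come from CBOW training augmented with WordNet.
PROTOTYPES: Dict[str, List[Vector]] = {
    "bank": [(1.0, 0.0), (0.0, 1.0)],  # assumed "finance" and "river" senses
    "money": [(1.0, 0.0)],
    "shore": [(0.0, 1.0)],
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def max_sim(protos_a: Sequence[Vector], protos_b: Sequence[Vector]) -> float:
    """Max-Sim: the highest cosine similarity over all prototype pairs."""
    return max(cosine(u, v) for u in protos_a for v in protos_b)


def word_sim(a: str, b: str) -> float:
    """Word similarity via Max-Sim over the toy prototype table."""
    return max_sim(PROTOTYPES[a], PROTOTYPES[b])


def text_similarity(
    tokens_a: Sequence[str],
    tokens_b: Sequence[str],
    sim: Callable[[str, str], float],
) -> float:
    """Build the word-by-word similarity matrix, then average the best
    match of every word in both directions (an assumed aggregation)."""
    if not tokens_a or not tokens_b:
        return 0.0
    matrix = [[sim(a, b) for b in tokens_b] for a in tokens_a]
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    return 0.5 * (sum(row_best) / len(row_best) + sum(col_best) / len(col_best))


if __name__ == "__main__":
    print(word_sim("bank", "money"))
    print(word_sim("money", "shore"))
    print(text_similarity(["bank", "money"], ["shore", "bank"], word_sim))
```

With multiple prototypes, an ambiguous word such as "bank" can match both "money" and "shore" through different senses, which a single averaged vector would blur. Any word similarity function with the same signature, for example a WordNet path-based measure, could be passed to `text_similarity` in place of `word_sim`.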

Related items

1	Research On Semantic Similarity Between Words And Between Short Texts Based On WordNet
2	Conceptual Semantic Similarity Calculation Based On WordNet And Its Application Research
3	An Algorithm On Web Services Matchmaking Based On Ontology And Its Word Similarity
4	Research On Semantic Similarity Calculation Of Chinese Short Text
5	The Research Of Semantic Similarity Between Short Text Based On WordNet
6	The Study Of Measures And Applications Of Short Text Semantic Similarity
7	Research And Application Of Wordnet-Based Semantic Similarity Measurement
8	Research Of English Sentence Similarity Measure Based On Wordnet
9	Clustering Algorithm Research Of Short Text Based On Semantic Similarity
10	Chinese-Old Bilingual Text And Sentence Similarity Calculation Research