Research And Implementation Of Semantic Similarity Computing By Combining Knowledge-based And Corpus-based Methods

Posted on: 2017-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: K L Shi
GTID: 2308330482479331
Subject: Software engineering
Abstract/Summary:
Semantic similarity calculation plays a significant role in natural language processing and is widely applied in word sense disambiguation, machine translation, spelling correction, text categorization, question answering, and many other text-mining areas. Current semantic similarity algorithms follow two major research directions. One is based on the cosine distance between word vectors. The other is based on a knowledge base such as WordNet and calculates the similarity between two nodes from their relations and path distance. Distributed vector models are constructed by neural networks to extract latent semantic information from large corpora. However, because these models lack explicit semantic relations, it is hard for them to capture the intrinsic relationships between words. Knowledge-based methods, in turn, are limited by the number of terms in the semantic library and are therefore not suitable for calculating the similarity between large texts. This paper analyzes the advantages and disadvantages of the two methods and presents a method for measuring semantic similarity that uses both knowledge-based and corpus-based measures. Combining the knowledge-based and corpus-based methods is the main innovation of this paper, and the method achieves better results in word and short text similarity computing.

For word similarity calculation, the authors primarily use corpora to train distributed word vectors with the CBOW model. WordNet serves as additional knowledge with unambiguous semantics to augment the semantic information of the corpora and to construct multiple prototype vectors per word in a low-dimensional vector space. The similarity of words is then calculated by the Max-Sim model. The experiments are carried out on three common datasets: RG-65, MC-30, and WS-353.

For short text similarity calculation, the similarity matrix method is used.
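As an illustration of the word-level step, the following is a minimal sketch of a Max-Sim comparison over multiple prototype vectors, assuming the common formulation in which the score is the highest cosine similarity over all pairs of prototypes. The function names and the two-dimensional toy vectors are hypothetical. They are not taken from the thesis, and no trained CBOW vectors or WordNet data are used here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def max_sim(prototypes_a, prototypes_b):
    """Max-Sim: best cosine similarity over all pairs of prototype vectors."""
    return max(cosine(p, q) for p in prototypes_a for q in prototypes_b)


# Hand-made toy prototypes (assumed values, for illustration only).
bank = [[1.0, 0.0], [0.0, 1.0]]   # two senses, e.g. "finance" and "river"
shore = [[0.0, 2.0]]              # one sense, aligned with the "river" prototype

score = max_sim(bank, shore)      # picks the best-matching pair of senses
```

Taking the maximum lets a polysemous word be matched on whichever sense is closest to the other word, which a single vector per word cannot do.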
We describe the knowledge-based similarity feature, the corpus-based similarity feature, and their combined similarity feature for word semantic similarity, and show how they can be used to derive a text-to-text similarity metric. The experimental results on the Microsoft Research Paraphrase Corpus show that the method is feasible and achieves higher precision, recall, and F1 values than similar models.

A web services matchmaking method based on semantic similarity calculation is also proposed. Input and output interfaces are compared using the word similarity method, and text descriptions are compared using the short text similarity method. The three similarity values are then considered together to describe web service similarity. A comparative experiment on the OWLS-TC dataset shows that the proposed method is feasible and effective.
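The similarity-matrix approach to short text similarity mentioned in the abstract can be sketched roughly as follows. This assumes one common formulation: build a matrix of pairwise word similarities, take each word's best match in the other text, and average the two directions. The abstract does not give the thesis's exact formula or weighting, so `text_similarity`, `toy_word_sim`, and the lookup table are hypothetical stand-ins. In practice the word-level function would be a knowledge-based, corpus-based, or combined measure.

```python
def text_similarity(tokens_a, tokens_b, word_sim):
    """Symmetric text similarity from a pairwise word-similarity matrix."""
    if not tokens_a or not tokens_b:
        return 0.0
    # matrix[i][j] holds the similarity between tokens_a[i] and tokens_b[j].
    matrix = [[word_sim(a, b) for b in tokens_b] for a in tokens_a]
    row_best = [max(row) for row in matrix]          # best match for each word of A
    col_best = [max(col) for col in zip(*matrix)]    # best match for each word of B
    mean_a = sum(row_best) / len(row_best)
    mean_b = sum(col_best) / len(col_best)
    return 0.5 * (mean_a + mean_b)


# Toy word similarity (assumed values): exact match scores 1.0, otherwise a lookup.
TOY_SIM = {frozenset(("car", "automobile")): 0.9}


def toy_word_sim(a, b):
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


s1 = ["the", "car", "stopped"]
s2 = ["the", "automobile", "stopped"]
score = text_similarity(s1, s2, toy_word_sim)
```

Averaging both directions keeps the score symmetric, so the order of the two texts does not change the result.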
Keywords/Search Tags:Semantic Similarity Calculation, WordNet, Word Similarity, Short Text Similarity, Web Services Matchmaking