Research on English Sentence Similarity Measure Based on WordNet

Posted on: 2015-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: H N Wang
GTID: 2298330431981793
Subject: Computer software and theory
Abstract/Summary:
Text is an important product of the development and continuation of human civilization, and it records the progress of human society. Text is the written form of language; a text can be a sentence, a paragraph, or a chapter. It embodies human wisdom and is one of the primary means of supporting and disseminating knowledge and information. With the rapid development of the information society, especially in the Internet era, information resources are growing quickly, and how to manage and make good use of them is a topic of current concern. Text mining is a field of study aimed at managing such information resources. It takes text as the object of mining and seeks the potentially valuable implicit knowledge it contains, such as structures, models, and patterns. It has contributed to many areas, including information retrieval, pattern recognition, and natural language processing.
Computing text similarity is a basic and important task in text mining and one of its key technologies. It underlies many important applications, such as detecting the text repetition rate, text categorization, and text clustering, and text similarity measures are widely used in areas such as information retrieval, so the problem merits further study. This thesis focuses on measuring the similarity between English sentences, using WordNet as the source of semantic knowledge. After analyzing related algorithms, we propose a novel method for calculating the similarity between two sentences. We introduce a new mathematical model for calculating the semantic similarity between words, and for word order similarity we propose a mathematical model based on Hamming distance. Finally, the semantic similarity and the word order similarity between the two sentences are combined to obtain their overall similarity.
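The abstract does not spell out the Hamming-distance model, so the Python sketch below is only one plausible reading, built on stated assumptions: sentences are tokenized by whitespace, the words shared by both sentences are listed in their order of first appearance in each sentence, and word order similarity is one minus the normalized Hamming distance between those two orderings. The final step is assumed to be a weighted sum in the style of Li's classic algorithm. The function names `hamming_distance`, `word_order_similarity`, and `overall_similarity` and the weight `delta=0.85` are hypothetical, and the semantic score is passed in as an input rather than computed from WordNet.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def word_order_similarity(s1, s2):
    """Hypothetical word order score in [0, 1] based on Hamming distance.

    Assumption: only words shared by both sentences are compared, each
    listed once in the order of its first appearance in each sentence.
    """
    t1, t2 = s1.lower().split(), s2.lower().split()
    common = set(t1) & set(t2)
    # Both orderings contain every shared word exactly once,
    # so they always have the same length.
    order1 = list(dict.fromkeys(w for w in t1 if w in common))
    order2 = list(dict.fromkeys(w for w in t2 if w in common))
    if not order1:
        return 0.0  # no shared words: no order information to compare
    return 1.0 - hamming_distance(order1, order2) / len(order1)


def overall_similarity(semantic_sim, order_sim, delta=0.85):
    """Assumed linear combination; delta is illustrative, not from the thesis."""
    return delta * semantic_sim + (1.0 - delta) * order_sim
```

For the pair "A quick brown dog jumps over the lazy fox" / "A quick brown fox jumps over the lazy dog", the two orderings differ only where *dog* and *fox* swap, giving a Hamming distance of 2 over 9 shared words and a word order similarity of 7/9 ≈ 0.78. One consequence of using Hamming distance is that positions are compared rigidly: moving a single shared word to the front shifts every later word and can drive the score to 0, which a rank-based measure would penalize less.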
In the experiments, we use three datasets to validate the proposed algorithm. Two of them, datasets of short sentence pairs, were used to evaluate Li's classic sentence similarity algorithm; running our experiments on the same data lets us compare our results with Li's algorithm and analyze the advantages and disadvantages of the proposed algorithm. The third is the MSRP dataset (Microsoft Research Paraphrase Corpus), a relatively large-scale labeled dataset. Experiments on it judge the similarity of a large number of sentence pairs and then measure how accurately the results match the labels. After analyzing the experimental results, we summarize the proposed method and discuss prospects for future work.
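Measuring how well computed similarities match the MSRP labels requires turning each score into a yes/no paraphrase decision. The sketch below is a minimal version under stated assumptions: a fixed decision threshold (the default of 0.5 is an assumed value, not one reported in the thesis) and gold labels encoded as 1 for paraphrase and 0 otherwise. `accuracy` is a hypothetical helper name.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of sentence pairs whose thresholded score matches the gold label.

    scores: similarity values in [0, 1]; labels: 1 = paraphrase, 0 = not.
    The default threshold is an assumption, not a value from the thesis.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sentence pair")
    # A pair is predicted to be a paraphrase when its score reaches the threshold.
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

For instance, `accuracy([0.9, 0.4, 0.7, 0.2], [1, 0, 0, 0])` returns 0.75, because the 0.7 score is wrongly accepted as a paraphrase; raising the threshold to 0.8 makes all four decisions correct, which suggests tuning the threshold on training pairs rather than fixing it in advance.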
Keywords/Search Tags: Sentence Similarity, WordNet, Hamming Distance, Semantic Similarity, Word Order Similarity