Research Of English Sentence Similarity Measure Based On Wordnet

Posted on:2015-06-15

Degree:Master

Type:Thesis

Country:China

Candidate:H N Wang

Full Text:PDF

GTID:2298330431981793

Subject:Computer software and theory

Abstract/Summary:

Text is an important product of the development and continuation of human civilization, and it records the progress of human society. Text is the written form of language; a text can be a sentence, a paragraph, or a chapter. It is a symbol of human wisdom and one of the primary means of carrying and disseminating knowledge and information. With the rapid development of the information society, especially in the Internet era, information resources are growing at a rapid pace. How to manage and make good use of these resources is a topic of current concern. Text mining is a field of study aimed at managing such information resources. It takes text as the object of mining and seeks the potential value of the implicit knowledge in the information, such as structures, models, and patterns. It has contributed to many areas, including information retrieval, pattern recognition, and natural language processing.

Computing text similarity is a basic and important task in text mining and one of its key technologies. It underlies many important applications, for example detecting text duplication rates, text categorization, and text clustering. Text similarity measures also have a wide range of applications in areas such as information retrieval, so the topic merits further study. This thesis focuses on the similarity between English sentences, using WordNet as the source of semantic knowledge. After analyzing the related algorithms, we propose a novel method for calculating the similarity between two sentences. The thesis introduces a new mathematical model for calculating the semantic similarity between words. For the word order similarity measure, we propose a mathematical model based on Hamming distance. Finally, we combine the semantic similarity and the word order similarity between sentences to obtain the overall similarity.
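The abstract does not give the formulas, so the following is only a minimal sketch of how a Hamming-distance word order measure and a weighted combination might look. The position-vector encoding, the function names, and the weight `delta` are illustrative assumptions and are not taken from the thesis; the semantic score is passed in as a number because the WordNet-based model is not specified here.

```python
def order_vector(tokens, joint):
    """1-based position of each joint-set word in the sentence, 0 if absent (assumed encoding)."""
    first_pos = {}
    for i, word in enumerate(tokens, start=1):
        first_pos.setdefault(word, i)  # keep the first occurrence only
    return [first_pos.get(word, 0) for word in joint]


def word_order_similarity(s1, s2):
    """1 minus the normalized Hamming distance between the two order vectors."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    joint = list(dict.fromkeys(t1 + t2))  # distinct words, in order of appearance
    if not joint:
        return 1.0
    r1, r2 = order_vector(t1, joint), order_vector(t2, joint)
    hamming = sum(a != b for a, b in zip(r1, r2))  # count of differing positions
    return 1.0 - hamming / len(joint)


def overall_similarity(semantic, order, delta=0.85):
    """Weighted sum of the two scores; delta is a hypothetical weight."""
    return delta * semantic + (1.0 - delta) * order


order = word_order_similarity("a dog bites a man", "a man bites a dog")
print(order, overall_similarity(1.0, order))
```

In this sketch the two example sentences share every word, so any semantic measure based on the word set alone would treat them as identical; only the word order term separates them.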
In the experimental part, we use three datasets to validate the proposed algorithm. Two datasets of short sentence pairs were used in experiments on the classic sentence similarity algorithm by Li, so we run the same experiments in order to compare our results with Li's algorithm and analyze the advantages and disadvantages of the proposed algorithm. The third dataset is the MSRP dataset, a relatively large-scale labeled dataset. Experiments on it score the similarity of a large number of sentence pairs and then measure how accurately the results match the labels. After analyzing the experimental results, we summarize the proposed method and discuss prospects for future work.
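The label-matching evaluation on a tagged dataset such as MSRP can be sketched as follows. This is an assumption-laden illustration: the abstract does not say how similarity scores are turned into labels, so the fixed `threshold` and the function name `accuracy` are hypothetical, and the scores below are made-up inputs rather than results.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of pairs whose thresholded score matches the 0/1 label.

    scores: similarity values in [0, 1], one per sentence pair
    labels: 1 for a paraphrase pair, 0 otherwise
    threshold: hypothetical cut-off, not taken from the thesis
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sentence pair is required")
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Illustrative inputs only, not data from the thesis.
print(accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))
```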

Related items

1	Conceptual Semantic Similarity Calculation Based On WordNet And Its Application Research
2	Research On Sentence Semantic Similarity Based On WordNet In Automatic Question Answering System
3	Research On Semantic Similarity Between Words And Between Short Texts Based On WordNet
4	The Research About Text Similarity Measuring Through Hamming-Distance And Semantics
5	Research And Implementation Of Semantic Similarity Computing By Combining Knowledge-based And Corpus-based Methods
6	Address Parsing System Based-On Google Map
7	Research And Application Of Wordnet-Based Semantic Similarity Measurement
8	The Research And Implementation On Wordnet-based Sentence Similarity Of Automatic Question Answering System
9	Research On Semantic Similarity Computation And Applications
10	The Research Of Semantic Similarity Between Short Text Based On WordNet