A Combined Measure For Text Semantic Similarity

Posted on:2014-06-16

Degree:Master

Type:Thesis

Country:China

Candidate:H D Li

GTID:2268330422951617

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of artificial intelligence and natural language processing, text similarity calculation has become a core module of many applications, such as semantic disambiguation, information extraction, information retrieval, text classification, automatic question answering, and data mining. Similarity measures have evolved from word co-occurrence through grammatical structure to semantics, which drives the search for accurate and efficient semantic similarity techniques. Most existing semantic similarity algorithms are either statistical or rule-based, and operate on ontology dictionaries or other knowledge bases. Rule-based methods usually rely on a dictionary, an ontology tree or graph, or the number of co-occurring attributes, while statistical methods may or may not use a knowledge base. Because a statistical method that uses a knowledge base incorporates more comprehensive knowledge and can reduce knowledge noise, it usually performs better than other existing methods. Nevertheless, because items are unevenly distributed in a knowledge base, the similarity results for low-frequency words are usually poor.

To address this issue, this thesis presents a combined measure for semantic similarity calculation. First, existing statistical methods that are based on ontology dictionary rules and corpora are studied, and their advantages and disadvantages are compared. Then a method combining rules and statistical measures is proposed for word-level semantic similarity calculation; it uses the English and Chinese Wikipedia databases and the HowNet semantic dictionary to build a so-called Explicit Semantic Analysis model. To address the sample imbalance issue, an improved algorithm based on stop word distributions is also proposed.
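
The general idea behind Explicit Semantic Analysis can be sketched as follows: each word is represented as a vector of TF-IDF weights over concepts (articles), and two words are compared by the cosine of their vectors. This is only a minimal illustration under stated assumptions, not the thesis's implementation. The three-article `CONCEPTS` corpus is made up, the names `build_index` and `esa_similarity` are hypothetical, and the HowNet rules and the stop-word-based improvement are omitted.

```python
import math
from collections import Counter

# Made-up stand-in for a Wikipedia dump: concept title -> article text.
CONCEPTS = {
    "Cat": "cat feline pet animal whiskers pet",
    "Dog": "dog canine pet animal bark",
    "Car": "car vehicle engine road wheel",
}


def build_index(concepts):
    """Return word -> {concept: TF-IDF weight} (an inverted concept index)."""
    n = len(concepts)
    tf = {c: Counter(text.split()) for c, text in concepts.items()}
    # Document frequency: in how many articles does each word occur?
    df = Counter(w for counts in tf.values() for w in counts)
    index = {}
    for concept, counts in tf.items():
        for word, freq in counts.items():
            idf = math.log(n / df[word])
            if idf > 0:  # words found in every article carry no signal
                index.setdefault(word, {})[concept] = freq * idf
    return index


def esa_similarity(w1, w2, index):
    """Cosine similarity between the concept vectors of two words."""
    v1, v2 = index.get(w1, {}), index.get(w2, {})
    dot = sum(weight * v2.get(concept, 0.0) for concept, weight in v1.items())
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # unknown word: no concept vector to compare
    return dot / (norm1 * norm2)


index = build_index(CONCEPTS)
print(esa_similarity("pet", "animal", index))
print(esa_similarity("cat", "car", index))
```

In this toy corpus a word that occurs in a single article gets a one-dimensional vector, so its similarity to any other word is either 0 or 1.
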
For sentence-level semantic similarity, syntactic information, edit distance, and semantic similarity are combined to improve performance. The proposed combined method is verified by experiments on standard English and Chinese corpora, where it achieves the best results among all compared methods. It can be applied directly to applications such as general-purpose automatic question answering systems.
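
One way to combine the three sentence-level signals is a weighted sum, sketched below. The abstract does not say how the thesis combines them, so everything here is an assumption: the weights are arbitrary, bigram overlap is only a stand-in for syntactic information, `exact_match` is a placeholder for a real word-level similarity function, and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (tokens or characters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Edit distance rescaled to [0, 1], where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def bigram_overlap(a, b):
    """Stand-in for syntactic information: Jaccard overlap of token bigrams."""
    ba, bb = set(zip(a, a[1:])), set(zip(b, b[1:]))
    if not ba or not bb:
        return 0.0
    return len(ba & bb) / len(ba | bb)


def semantic_similarity(a, b, word_sim):
    """Average, in both directions, of each word's best match in the other sentence."""
    if not a or not b:
        return 0.0

    def directed(xs, ys):
        return sum(max(word_sim(x, y) for y in ys) for x in xs) / len(xs)

    return (directed(a, b) + directed(b, a)) / 2


def exact_match(w, v):
    """Placeholder word similarity; a real system would plug in a semantic measure."""
    return 1.0 if w == v else 0.0


def combined_similarity(s1, s2, word_sim=exact_match, weights=(0.2, 0.2, 0.6)):
    """Weighted sum of syntactic, edit-distance, and semantic scores (weights assumed)."""
    a, b = s1.lower().split(), s2.lower().split()
    w_syn, w_edit, w_sem = weights
    return (w_syn * bigram_overlap(a, b)
            + w_edit * edit_similarity(a, b)
            + w_sem * semantic_similarity(a, b, word_sim))


print(combined_similarity("the cat sat on the mat", "the cat lay on the mat"))
```

Passing a word-level measure as `word_sim` keeps the sentence-level combination independent of how word similarity is computed.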

Related items

1	Research On Semantic Similarity Measure Method For RDF Graphs
2	Research Of English Sentence Similarity Measure Based On Wordnet
3	Design And Implementation Of Sentence Level And Paragraph Level Semantic Similarity Algorithms
4	Research On The Calculation Method For Semantic Similarity Of Sentence And Its Application
5	Subjective And Objective Combination Of Semantic Similarity Algorithm And Its Application
6	Sentence Similarity Calculation Based On Semantic Role Labeling
7	Chinese Sentence Similarity Based On Semantic Role Labeling
8	Conceptual Semantic Similarity Calculation Based On WordNet And Its Application Research
9	Address Parsing System Based-On Google Map
10	The Design And Implementation Of Multi-features Combination In Sentence Similarity Computation