
Research On Semantic Similarity Measurement For Text

Posted on: 2013-06-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Z Liu
GTID: 1228330395467924
Subject: Computer application technology
Abstract/Summary:
With the development of computer and Internet technology, the quantity of information has grown substantially. Much of this information is difficult for computers to understand and use, and semantic computing is the basis for solving this problem. Existing semantic computing research mostly relies on large-scale corpora and complete ontologies, both of which are difficult to obtain in practical applications. Moreover, past research was conducted in different periods and under different premises, and it has not formed a unified theory. In view of these problems, this dissertation studies semantic similarity measurement for text at the concept, sentence, and document levels when information is incomplete. The similarity computation consists of three stages: semantic extraction, semantic description, and semantic computation. The relations between semantic objects and the ontology are extracted from the ontology structure; each semantic object is described by its semantic "fingerprint" in the ontology, the fingerprint is turned into a semantic vector for that object, and the semantic computation is performed on these vectors. The research covers the following three aspects:

1. Research on concept similarity measurement based on the tree structure and the tree-based graph structure. Observing the tree structure, we found that the ancestor and descendant concept nodes of a concept node in the ontology are semantically related to it, and that structural information about the node's position in the ontology can be used to compute Concept Node Density. On this basis, methods for concept semantic extraction, semantic description, and semantic computation are proposed. Building on the tree-based similarity measurement, we propose a semantic relatedness measurement for ontologies with a tree-based graph structure; for computation, the tree-based graph structure is transformed into a tree structure.
Besides being applied to domain data, the method is also applied to WordNet. Experiments show that, compared with related methods, it obtains a very good Pearson linear correlation coefficient when information is incomplete.

2. Research on sentence similarity computation based on ontology. The relations between ontology concepts and the keywords in a sentence are used to build a semantic index that extracts direct and indirect semantic relations; each sentence is then represented as an ontology-based semantic vector, and the semantic similarity between sentences is calculated from these vectors, yielding a sentence similarity computation method. The method is applied to the Microsoft Research Paraphrase Corpus (MSRP). Experiments show that, compared with related similarity computation methods, it obtains good accuracy and recall when additional information is incomplete.

3. Research on text similarity computation based on ontology. In addition to using the relations between ontology concepts and the keywords in a document to build a semantic index that extracts direct and indirect semantic relations, the method uses ontology hierarchy information to estimate keyword weights, and ontology-based semantic vectors are used to calculate the semantic similarity between texts. The method is applied to Michael D. Lee's LEE50 standard document similarity test data. Experiments show that, compared with related methods, it achieves a good Pearson linear correlation coefficient when information is incomplete.

In summary, the common advantage of the three methods is that they require little additional information and are simple and effective, so they adapt well to different domains.
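The three methods share one pipeline: map words to ontology concepts, describe each concept by a "fingerprint" of its ancestors and descendants, form a semantic vector, and compare vectors. The following Python sketch illustrates that pipeline under stated assumptions and is not the dissertation's method. The toy ontology `PARENT`, the distance-decay weighting in `fingerprint`, the use of cosine similarity as the vector comparison, and the `weights` placeholder for hierarchy-based keyword weighting are all hypothetical; the abstract does not give Concept Node Density or the exact weighting formulas. `pearson` shows the evaluation metric the abstract reports.

```python
import math
from collections import deque

# Hypothetical toy ontology (child -> parent), a tree rooted at "entity".
# Illustrative assumption only; not the ontology or data used in the dissertation.
PARENT = {
    "animal": "entity", "plant": "entity",
    "mammal": "animal", "bird": "animal",
    "dog": "mammal", "cat": "mammal",
    "sparrow": "bird", "tree": "plant",
}


def children_map(parent):
    """Invert the child -> parent map into parent -> list of children."""
    kids = {}
    for child, par in parent.items():
        kids.setdefault(par, []).append(child)
    return kids


def fingerprint(concept, parent, kids, decay=0.5):
    """Sparse vector over the concept, its ancestors and its descendants.

    Assumption: each related node is weighted decay ** (edge distance);
    this stands in for, and does not reproduce, the dissertation's weighting.
    """
    vec = {concept: 1.0}
    node, dist = concept, 0
    while node in parent:  # walk up through the ancestors to the root
        node, dist = parent[node], dist + 1
        vec[node] = decay ** dist
    queue = deque([(concept, 0)])
    while queue:  # breadth-first walk down through the descendants
        node, dist = queue.popleft()
        for child in kids.get(node, []):
            vec[child] = decay ** (dist + 1)
            queue.append((child, dist + 1))
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def concept_similarity(a, b, parent=PARENT):
    """Concept level: compare the fingerprints of two concepts."""
    kids = children_map(parent)
    return cosine(fingerprint(a, parent, kids), fingerprint(b, parent, kids))


def text_vector(keywords, parent=PARENT, weights=None):
    """Sentence/document level: sum the fingerprints of matched keywords.

    Assumptions: keywords are already mapped to concept names, and the optional
    `weights` dict is a placeholder for hierarchy-based keyword weighting.
    """
    kids = children_map(parent)
    vec = {}
    for kw in keywords:
        if kw not in parent and kw not in kids:  # skip words outside the ontology
            continue
        w = 1.0 if weights is None else weights.get(kw, 1.0)
        for k, val in fingerprint(kw, parent, kids).items():
            vec[k] = vec.get(k, 0.0) + w * val
    return vec


def pearson(xs, ys):
    """Pearson linear correlation between computed scores and reference ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

With this toy ontology, `concept_similarity("dog", "cat")` is higher than `concept_similarity("dog", "sparrow")`, which in turn is higher than `concept_similarity("dog", "tree")`, because dog and cat share the nearer ancestor "mammal". Sentences or documents are compared with `cosine(text_vector(...), text_vector(...))`, and `pearson` would be applied to a list of computed scores and the matching human ratings of a benchmark.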
Keywords/Search Tags: Concept Similarity, Sentence Similarity, Text Similarity, Semantic Similarity Computing