Research On Semantic Similarity Computation And Applications

Posted on: 2010-01-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Song
Full Text: PDF
GTID: 1118360302983798
Subject: Computer application technology
Abstract/Summary:
Similarity is a universal relationship that exists between any two objects, and the degree of similarity (DOS) quantifies it. Similarity computation is a key issue in information retrieval, data mining, knowledge management, and artificial intelligence. With the wide application of ontologies, research on ontology-based semantic similarity and its applications has become an interdisciplinary topic spanning computer science and psychology. In this thesis, based on the theory of information granularity, objects are divided into basic information objects and general information objects. Concepts are basic information objects, while text documents, semi-structured documents, and web services are general information objects. First, a novel approach to computing semantic similarity between concepts, based on a fuzzy similarity model, is proposed and verified by experiments. Building on this work, the semantic similarity between text documents, between semi-structured XML documents, and between web services is studied and verified by experiments. The thesis enriches the theory of semantic similarity and explores a new way to compute semantic similarity between objects. The main work and contributions of the thesis can be summarized as follows:

(1) A new approach to computing semantic similarity between concepts (SSBC) in an ontology is proposed. A concept in an ontology is expanded into a semantic set by considering its structural and semantic information. Based on this semantically expanded set, the features of the concept are described and a fuzzy set is defined. Next, the fuzzy semantic similarity between the two fuzzy sets corresponding to two concepts is computed. SSBC reflects the influence of asymmetry, depth, and local density of the ontology on similarity computation. SSBC is verified by experiments in two parts. First, SSBC is implemented on WordNet; the results show that SSBC outperforms traditional similarity measures on a commonly used dataset, improving the correlation coefficient by 0.018. Second, a semantic similarity measure between sentences (SSBS) is devised based on SSBC, and the corresponding experiments are carried out. Compared with other methods, SSBS considers not only semantic similarity and string edit distance between words but also part-of-speech tagging.

(2) A new approach to computing semantic similarity between text documents (SSBTD) is put forward. A document is first described as a feature set of concepts based on a domain ontology, and every concept in the feature set is defined by a fuzzy set. The fuzzy set of the document is then defined through fuzzy operations. Finally, the semantic similarity between two documents is obtained by computing the similarity between their fuzzy sets. SSBTD addresses the problem that the terms in a document's feature description are treated as semantically independent. It is suited to documents with few terms, such as Deep Web forms. Deep Web forms are designed independently for different customers and contain few terms. If two semantically similar words are represented as two different terms, the accuracy of the feature description of a Deep Web database is affected significantly. Computing the semantic similarity between Deep Web forms with SSBTD can improve this accuracy. Clustering experiments show that SSBTD performs better than traditional cosine similarity in ASDC (Average Similarity of Document to the Cluster Centroid) and RI (Rand Index).

(3) A new approach to computing semantic and structural similarity between XML documents (XMLSim) is proposed. NPathSim, the similarity between paths, is the basis for computing XMLSim. To compute NPathSim, a similarity matrix of node tags is built from the semantic similarity and string similarity between node tags. Every node tag is assigned a weight according to its position in the path. By analyzing the partial relationship among node tags, the path similarity problem is abstracted as a Maximal Similar Subsequence (MSS) problem, which is solved with dynamic programming to obtain NPathSim. Finally, XMLSim is the average of the best NPathSim values among path sets. XMLSim and XSim are each used to cluster XML documents. The clustering results show that XMLSim performs better than XSim in purity and RI because both semantic and structural information are taken into account.

(4) A semantic indexing structure and matching algorithms for web services are put forward. First, IOPE (Input, Output, Precondition, Effect), a kind of web service description, is semantically expanded by adding semantically equivalent concepts from a domain ontology, and an effective index mechanism, the Bit-Sliced Bloom-Filtered Signature (BBS), is built. Second, two kinds of web service matching measures are proposed: one for keyword matching and the other for input/output parameter matching. Finally, a semantic matching computation measure is put forward. The experimental results show that the BBS approach is faster and more efficient than the inverted index approach as the number of web services increases.
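
The abstract does not give the formulas behind NPathSim, so the following is only a minimal sketch of the general idea in contribution (3): treating path similarity as a maximal similar subsequence problem solved by dynamic programming. The function names, the `threshold` parameter, the normalization by the longer path, and the use of plain string similarity in place of the thesis's combined semantic and string similarity are all assumptions made for illustration. The position-dependent tag weights mentioned in the abstract are left out.

```python
from difflib import SequenceMatcher


def tag_similarity(a, b):
    """Hypothetical stand-in for the tag similarity matrix.

    Uses string similarity only; the thesis also uses ontology-based
    semantic similarity, which is not reproduced here.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def path_similarity(p, q, sim=tag_similarity, threshold=0.5):
    """Order-preserving alignment score between two tag paths, in [0, 1].

    best[i][j] holds the highest total similarity of any order-preserving
    matching between the first i tags of p and the first j tags of q.
    Pairs whose similarity falls below `threshold` (an assumed cutoff)
    contribute nothing to the score.
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        return 0.0
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = sim(p[i - 1], q[j - 1])
            matched = best[i - 1][j - 1] + (s if s >= threshold else 0.0)
            best[i][j] = max(matched, best[i - 1][j], best[i][j - 1])
    # Assumed normalization: divide by the length of the longer path.
    return best[n][m] / max(n, m)


if __name__ == "__main__":
    print(path_similarity(["book", "title"], ["book", "author", "title"]))
    print(path_similarity(["a", "b"], ["b", "a"]))
```

In this sketch two identical paths score 1.0, and reversing the tag order lowers the score because the alignment must preserve order. A document-level measure in the spirit of XMLSim could then average, for each path in one document, the best such score against the paths of the other document.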
Keywords/Search Tags: Concept Similarity, Sentence Similarity, Text Document Similarity, XML Document Similarity, Web Service Matching, Deep Web Databases Clustering, XML Documents Clustering