Semantic similarity computation on the web of data

Posted on:2015-04-18

Degree:Ph.D

Type:Dissertation

University:Rensselaer Polytechnic Institute

Candidate:Zheng, Jin Guang

GTID:1478390020452407

Subject:Computer Science

Abstract/Summary:

Over the last few decades, much effort has been devoted to researching and developing effective semantic similarity computation algorithms for different scenarios, such as similarity between free texts and similarity between objects. As a result, there are many semantic similarity computation algorithms that draw on different information sources: information content based algorithms, like the vector space model; ontology based edge counting methods, like the semantic similarity methods in WordNet; and structure or feature based methods, like Tversky's model.

However, none of the existing algorithms is designed to solve the similarity computation problem for entities on the Web of Data (WoD). Applying existing similarity computation algorithms for text or words directly to entities on the WoD yields inaccurate similarity scores, because these algorithms are purely based on text analysis and do not use the rich semantic relations and semantic descriptions of the entities during the similarity computation. Semantic similarity computation on WoD entities is important because many applications rely on it, such as entity matching, entity annotation, and entity ranking.

The primary goal of this study is to investigate how to compute a semantic similarity score among entities on the Web of Data. We design 1) a novel semantic similarity computation model to compute similarity among the entities on the Web of Data and other structured or unstructured data entities. This new similarity computation model leverages the theory of information entropy to determine the amount of meaningful information present in an entity description, and then computes the amount of meaningful information shared by the entities. The model uses machine learning to learn and assign appropriate weights to the shared or unique information of the entities in order to highlight important and meaningful information. The model also tackles the scalability of the similarity computation, which is a major challenge given the number of entities on the Web of Data. To demonstrate the effectiveness of the proposed semantic similarity computation model, we 2) apply the model to develop systems that solve the entity matching problem and the entity annotation problem on the Web of Data. We show that, using our model, these systems improve significantly on the current state of the art for these problems.
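The abstract does not give the model's formulas, so the following is only a minimal sketch of the general idea it describes, not the dissertation's actual method. It assumes an entity description is a set of (predicate, object) pairs, measures each pair's information as its self-information -log2 p, and combines shared and unique information in a Tversky-style ratio. The entity names, the `ex:` prefix, the function names, and the fixed weights `alpha` and `beta` are all hypothetical; the abstract says the real weights are learned.

```python
import math
from collections import Counter


def information_content(descriptions):
    """Map each feature to its self-information, -log2 p(feature).

    p(feature) is estimated as the fraction of entity descriptions that
    contain the feature, so rare features carry more information.
    """
    n = len(descriptions)
    doc_freq = Counter(f for feats in descriptions.values() for f in feats)
    return {f: -math.log2(count / n) for f, count in doc_freq.items()}


def similarity(a, b, ic, alpha=0.5, beta=0.5):
    """Tversky-style ratio of shared information to total information.

    alpha and beta weight the information unique to a and to b. They are
    fixed constants here (an assumption); a learned model would fit them.
    """
    shared = sum(ic[f] for f in a & b)
    only_a = sum(ic[f] for f in a - b)
    only_b = sum(ic[f] for f in b - a)
    denom = shared + alpha * only_a + beta * only_b
    return shared / denom if denom > 0 else 0.0


# Hypothetical toy descriptions: sets of (predicate, object) pairs.
entities = {
    "ex:Troy_NY": {
        ("rdf:type", "ex:City"),
        ("ex:locatedIn", "ex:New_York"),
        ("ex:country", "ex:USA"),
    },
    "ex:Albany_NY": {
        ("rdf:type", "ex:City"),
        ("ex:locatedIn", "ex:New_York"),
        ("ex:country", "ex:USA"),
        ("ex:capitalOf", "ex:New_York"),
    },
    "ex:RPI": {
        ("rdf:type", "ex:University"),
        ("ex:locatedIn", "ex:Troy_NY"),
        ("ex:country", "ex:USA"),
    },
}

ic = information_content(entities)
troy_albany = similarity(entities["ex:Troy_NY"], entities["ex:Albany_NY"], ic)
troy_rpi = similarity(entities["ex:Troy_NY"], entities["ex:RPI"], ic)
print(f"Troy vs Albany: {troy_albany:.3f}")
print(f"Troy vs RPI:    {troy_rpi:.3f}")
```

In this toy data the pair `("ex:country", "ex:USA")` appears in every description, so it carries zero information and contributes nothing to the score; the two cities are rated more similar than the city and the university because they share rarer pairs. A pure text-matching approach would not make that distinction between informative and uninformative shared features.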

Related items

1	Key Techniques Of Visualization Models System On Military Simulation Entities
2	The Research On Chinese Sentence Similarity Algorithm Based On HNC
3	Evolutionary Computation Based Maximum Similarity Biclustering And Application
4	Performance Optimization Of Distributed Graph Computation Framework Based On BSP Model
5	Research On Computation Method Of Chinese Question Similarity Based On Deep Learning
6	Information Retrieval Research And Implementation Based On The Concept Semantic Similarity Computation Model
7	A vector model of trust to reason about trustworthiness of entities for developing secure systems
8	Study And Application On Chinese Sentence Similarity Computation
9	Research On Similar Sentence Retrieval Technology For Patents
10	Research On Question Similarity Computation In Domain Question Answering System