Semantic similarity computation on the web of data

Posted on: 2015-04-18
Degree: Ph.D.
Type: Dissertation
University: Rensselaer Polytechnic Institute
Candidate: Zheng, Jin Guang
GTID: 1478390020452407
Subject: Computer Science
Abstract/Summary:
Over the last few decades, much effort has been devoted to researching and developing effective semantic similarity algorithms for different scenarios, such as similarity between pieces of free text and similarity between objects. As a result, there are many semantic similarity algorithms that draw on different information sources: for example, information content based algorithms such as the vector space model; ontology-based edge counting methods, such as the semantic similarity measures defined over WordNet; and structure- or feature-based methods, such as Tversky's model.

However, none of the existing algorithms is designed to compute similarity between entities on the Web of Data (WoD). Applying existing similarity algorithms for text or words directly to entities on the WoD produces inaccurate similarity scores, because these algorithms rely purely on text analysis and do not use the rich semantic relations and semantic descriptions of the entities during the computation. Computing semantic similarity between entities on the WoD is important because many applications rely on it, such as entity matching, entity annotation, and entity ranking.

The primary goal of this study is to investigate how to compute a semantic similarity score between entities on the Web of Data. We 1) design a novel semantic similarity model that computes similarity between entities on the Web of Data and other structured or unstructured data entities. The model uses the theory of information entropy to determine the amount of meaningful information present in an entity description, and then computes the amount of meaningful information shared by the entities. It uses machine learning to learn and assign appropriate weights to the shared and unique information of the entities in order to highlight important and meaningful information. The model also addresses the scalability of the similarity computation, which is a major challenge given the number of entities on the Web of Data. To demonstrate the effectiveness of the proposed model, we 2) apply it to build systems that solve the entity matching and entity annotation problems on the Web of Data. We show that, using our model, these systems significantly improve on the current state of the art for both problems.
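The abstract does not give the model's equations. As a rough illustration of the two ideas it names (measuring how much information an entity description carries, and weighting shared versus unique information), here is a minimal sketch under stated assumptions: each entity is described by a set of (property, value) features, a feature's information content is estimated as -log2 of the fraction of entities in a small collection that have it, and shared and unique information are combined with Tversky's ratio model. The names `information_content`, `salience`, and `tversky_similarity`, the toy data, and the fixed weights `alpha` and `beta` are all hypothetical; in the model described above such weights would be learned rather than set by hand, so this is not the dissertation's method.

```python
import math
from collections import Counter

# Hypothetical feature representation: a (property, value) pair from an entity description.
Feature = tuple[str, str]


def information_content(corpus: list[set[Feature]]) -> dict[Feature, float]:
    """Estimate IC(f) = -log2 P(f), where P(f) is the fraction of entities having feature f.

    Features shared by every entity get IC 0, i.e. they carry no distinguishing information.
    """
    counts = Counter(f for entity in corpus for f in entity)
    n = len(corpus)
    return {f: -math.log2(c / n) for f, c in counts.items()}


def salience(features: set[Feature], ic: dict[Feature, float]) -> float:
    """Total information carried by a feature set; features unseen in the corpus count as 0 here."""
    return sum(ic.get(f, 0.0) for f in features)


def tversky_similarity(a: set[Feature], b: set[Feature], ic: dict[Feature, float],
                       alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky ratio model: shared info / (shared + alpha * unique-to-a + beta * unique-to-b).

    alpha and beta are hypothetical fixed weights on unique information.
    """
    shared = salience(a & b, ic)
    only_a = salience(a - b, ic)
    only_b = salience(b - a, ic)
    denom = shared + alpha * only_a + beta * only_b
    return shared / denom if denom > 0 else 0.0


# Toy collection of entity descriptions (hypothetical data).
corpus = [
    {("type", "Person"), ("occupation", "Physicist"), ("birthPlace", "Ulm")},
    {("type", "Person"), ("occupation", "Physicist"), ("birthPlace", "Warsaw")},
    {("type", "Person"), ("occupation", "Chemist"), ("birthPlace", "Warsaw")},
    {("type", "City"), ("country", "Germany")},
]
ic = information_content(corpus)

print(tversky_similarity(corpus[0], corpus[1], ic))  # two physicists
print(tversky_similarity(corpus[0], corpus[2], ic))  # share only the common "Person" type
print(tversky_similarity(corpus[0], corpus[3], ic))  # no shared features
```

Because the common feature ("type", "Person") has low information content, sharing it contributes little, while sharing the rarer occupation raises the score. With `alpha == beta` the measure is symmetric; unequal weights make it asymmetric, as in Tversky's original model. Comparing every pair of n entities this way needs on the order of n² comparisons, which illustrates why scalability matters at Web of Data scale; the abstract does not say which technique the dissertation uses to address it.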
Keywords/Search Tags: Similarity computation, Data, Web, Model, Entities