Learning the semantics of structured data sources

Posted on: 2016-07-06

Degree: Ph.D.

Type: Dissertation

University: University of Southern California

Candidate: Taheriyan, Mohsen

GTID: 1478390017477174

Subject: Computer Science

Abstract/Summary:

Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data; however, they rarely provide a semantic model to describe their contents. Semantic models of data sources capture the intended meaning of data sources by mapping them to the concepts and relationships defined by a domain ontology. Such models are key ingredients for automating many tasks such as source discovery, data integration, and publishing semantic content on the Web. Manually modeling the semantics of data sources requires significant effort and expertise, and although desirable, building these models automatically is a challenging problem. Most efforts to automatically build semantic models focus on labeling the data fields (source attributes) with ontology classes and/or properties, e.g., annotating the first column of a table with the class Person and the second one with the class Movie. However, a precise semantic model needs to explicitly represent the relationships between the attributes in addition to their semantic types, e.g., stating that the person is the director of the movie. Automatically constructing such precise models is a difficult task.

We present a novel approach that exploits the knowledge from a domain ontology, the semantic models of previously modeled sources, and the vast amount of data available in the Linked Open Data (LOD) cloud to automatically learn a rich semantic model for a new source. This model represents the semantics of the new source in terms of the concepts and relationships defined by the domain ontology. Given some sample data from the new source, we leverage the knowledge in the domain ontology and either the known semantic models or the LOD cloud to construct a weighted graph that represents the space of plausible semantic models for the new source. Then, we compute the top k candidate semantic models and present them to the user as a ranked list. The approach takes user corrections into account to learn more accurate semantic models for future data sources. Our evaluation shows that our method generates expressive semantic models for data sources and services with minimal user input. These precise models make it possible to automatically integrate the data across sources and provide rich support for source discovery and service composition. They also make it possible to automatically publish semantic data into knowledge graphs.
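
The idea of a weighted graph over plausible relationships, followed by a top-k ranking, can be illustrated with a toy sketch. This is not the dissertation's algorithm: the ontology links, the counts from "known models", the weighting formula, and the function names `build_weighted_graph` and `top_k_models` are all assumptions made up for illustration. The sketch also handles only two attributes joined by a single link, whereas a source with more attributes would need a connected structure over all of its semantic types.

```python
from collections import Counter
import heapq

# Hypothetical toy ontology: (source class, property, target class).
ONTOLOGY_LINKS = [
    ("Movie", "director", "Person"),
    ("Movie", "actor", "Person"),
    ("Movie", "producer", "Person"),
]

# Hypothetical links observed in previously modeled sources.
KNOWN_MODEL_LINKS = [
    ("Movie", "director", "Person"),
    ("Movie", "director", "Person"),
    ("Movie", "actor", "Person"),
]


def build_weighted_graph(ontology_links, known_links, base_weight=1.0):
    """Give every ontology link a cost; links seen more often in
    known models receive a lower cost (assumed weighting scheme)."""
    counts = Counter(known_links)
    return {link: base_weight / (1 + counts[link]) for link in ontology_links}


def top_k_models(graph, semantic_types, k):
    """Return the k cheapest links that connect the two classes
    assigned to the source attributes."""
    wanted = set(semantic_types)
    candidates = [
        (cost, link)
        for link, cost in graph.items()
        if {link[0], link[2]} == wanted
    ]
    return [link for _, link in heapq.nsmallest(k, candidates)]


graph = build_weighted_graph(ONTOLOGY_LINKS, KNOWN_MODEL_LINKS)
# Column 1 was labeled Person, column 2 was labeled Movie.
ranked = top_k_models(graph, ["Person", "Movie"], k=2)
for source_class, prop, target_class in ranked:
    print(f"{source_class} --{prop}--> {target_class}")
```

Under these assumed counts, the link `director` ranks ahead of `actor` because it appears more often in the known models, which mirrors the abstract's example of inferring that the person is the director of the movie.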

Keywords/Search Tags:

Data, Source, Semantic, Automatically, Domain ontology

Related items

1	Research On Method Of Data Sources Selection And Constructing Domain Ontology
2	The Research Of Building Domain Ontology Semi-Automatically Based On Relational Database
3	Research On Domain Ontology Representation, Reasoning And Integration For The Semantic Web And The Applications
4	Semantic Sharing Of Basic Geological Heterogeneous Data Based On Domain Ontology
5	Research Of Representation And Process Of Semantic Information Of Web Data Based On Ontology
6	Requirements Auto-generation Model For Domain Ontology Evolution Based On Text
7	Focusing Technology Of Deep Web Data Source Based On Domain Ontology
8	Research On Construction And Semantic Retrieval Of Multiple Majors Domain Ontology
9	Applied Research And Prototype Implementation On Domain Ontology Description And Inference Mechanism Under Semantic Web Environment
10	Research On Domain Ontology-based Semantic Retrieval Method And Its Application