Learning the semantics of structured data sources

Posted on: 2016-07-06
Degree: Ph.D.
Type: Dissertation
University: University of Southern California
Candidate: Taheriyan, Mohsen
Full Text: PDF
GTID: 1478390017477174
Subject: Computer Science
Abstract/Summary:
Information sources such as relational databases, spreadsheets, XML, JSON, and Web APIs contain a tremendous amount of structured data; however, they rarely provide a semantic model to describe their contents. Semantic models of data sources capture the intended meaning of the data by mapping the sources to the concepts and relationships defined by a domain ontology. Such models are key ingredients for automating many tasks, such as source discovery, data integration, and publishing semantic content on the Web. Manually modeling the semantics of data sources requires significant effort and expertise, and although desirable, building these models automatically is a challenging problem. Most efforts to automatically build semantic models focus on labeling the data fields (source attributes) with ontology classes and/or properties, e.g., annotating the first column of a table with the class Person and the second one with the class Movie. However, a precise semantic model needs to explicitly represent the relationships between the attributes in addition to their semantic types, e.g., stating that the person is the director of the movie. Automatically constructing such precise models is a difficult task.

We present a novel approach that exploits the knowledge from a domain ontology, the semantic models of previously modeled sources, and the vast amount of data available in the Linked Open Data (LOD) cloud to automatically learn a rich semantic model for a new source. This model represents the semantics of the new source in terms of the concepts and relationships defined by the domain ontology. Given some sample data from the new source, we leverage the knowledge in the domain ontology and either the known semantic models or the LOD cloud to construct a weighted graph that represents the space of plausible semantic models for the new source.
Then, we compute the top k candidate semantic models and suggest to the user a ranked list of the semantic models for the new source. The approach takes into account user corrections to learn more accurate semantic models on future data sources. Our evaluation shows that our method generates expressive semantic models for data sources and services with minimal user input. These precise models make it possible to automatically integrate the data across sources and provide rich support for source discovery and service composition. They also make it possible to automatically publish semantic data into knowledge graphs.
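
As a rough illustration of the idea of ranking candidate semantic models, the toy sketch below scores every combination of column types and connecting relationship for a two-column source and returns the k cheapest. It is not the dissertation's algorithm: the names `EDGES`, `CANDIDATES`, and `top_k_models`, the confidence scores, the link weights, and the additive cost function are all assumptions made up for this example.

```python
import heapq
from itertools import product

# Hypothetical ontology links: (subject class, object class) -> [(property, weight)].
# A lower weight stands for a more plausible link (assumed, e.g., from known models).
EDGES = {
    ("Movie", "Person"): [("director", 1.0), ("actor", 2.0)],
    ("Movie", "Organization"): [("producedBy", 1.5)],
}

# Hypothetical candidate semantic types per column, with confidence scores.
CANDIDATES = {
    "col1": [("Person", 0.9), ("Organization", 0.1)],
    "col2": [("Movie", 0.8), ("Person", 0.2)],
}


def top_k_models(candidates, edges, k):
    """Rank (column types + relationship) combinations for a two-column source."""
    col_a, col_b = sorted(candidates)
    models = []
    for (cls_a, conf_a), (cls_b, conf_b) in product(candidates[col_a], candidates[col_b]):
        # Less confident type assignments cost more.
        type_cost = (1.0 - conf_a) + (1.0 - conf_b)
        # Try the link in both directions, without duplicating identical pairs.
        pairs = [(cls_a, cls_b)] if cls_a == cls_b else [(cls_a, cls_b), (cls_b, cls_a)]
        for subj, obj in pairs:
            for prop, weight in edges.get((subj, obj), []):
                assignment = {col_a: cls_a, col_b: cls_b}
                models.append((type_cost + weight, (subj, prop, obj), assignment))
    # Cheapest models first; compare on cost only.
    return heapq.nsmallest(k, models, key=lambda m: m[0])
```

With the toy data above, `top_k_models(CANDIDATES, EDGES, 2)` ranks the model that types col1 as Person and col2 as Movie, linked by `director`, ahead of the same typing linked by `actor`. A real system would search a much larger graph rather than enumerate every combination.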
Keywords/Search Tags: Data, Source, Semantic, Automatically, Domain ontology