Model-Based Data Provenance In Semantic Web Environment

Posted on: 2015-01-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Ni
Full Text: PDF
GTID: 1108330461988119
Subject: Agricultural information management
Abstract/Summary:
The current Web is expanding and evolving toward the vision of the Semantic Web. However, the continuous growth of information resources, the acceleration of information flow, and the frequent replication and evolution of data during transmission pose great challenges to the reliability, authenticity and credibility of information. The W3C describes linked data as a Semantic Web best practice. Linked data is growing rapidly in quantity, but its quality is uneven. In a distributed environment, duplicated links are more common, and dynamic updates lead to further data inconsistency. The fundamental cause is the lack of information about where the data originated. Three problems therefore become urgent in the Semantic Web environment: how to use a unified data provenance model to locate and query provenance information, how to identify the authenticity of similar Web pages, and how to add provenance metadata when publishing linked data.

This dissertation addresses these bottlenecks in the Semantic Web field. It takes the Semantic Web, data provenance and linked data as its theoretical basis and Semantic Web applications as its research perspective, and it combines literature research, investigation, system analysis, comparative study, inductive reasoning and software engineering methods. The main contributions are as follows.

First, we discuss provenance vocabularies including DCMI Terms, OPM-O, PV, VoIDP and PROV-O, and compare their similarities and differences along five dimensions: aim, description, service provision, annotation method and vocabulary structure.

Second, we analyze the W3C PROV Recommendation and its role and status in the Semantic Web architecture, explore the main functions of PROV, and explain the concepts of the standard. We build a PROV Web application scenario and describe the use case. We then summarize the Web application features of PROV: resolvability, semantics and traceability. The aim is to encourage further study of this architecture by the domestic research community, to make information provenance traceable in distributed environments, and to advance the interchange of provenance records on the Web.

In addition, we study the representation and query services of provenance records in Web applications. Based on an analysis of provenance record location, passing mode, means of obtaining records and implementation patterns, we summarize four types of provenance location and two types of query mechanism. Using a case of tracing online papers back to their sources, we capture provenance information with semantic notation and provenance presentation techniques. Web pages carrying provenance information are rendered in HTML with embedded RDFa, and the provenance records are visualized. Finally, we discuss the query service for this case.

To address the lack of provenance metadata in existing Web pages, we propose an automatic annotation method. By analyzing document derivation, we define the document as the entity and extract document features as a set of semantic attributes. Semantic similarity clustering is used to find relationships among documents as they change. With the aid of PROV-O, variations in feature words and the responsible person can be identified. Attribute recognition based on named entity and relation extraction is used to build document attributes and link them to the LOD cloud, and changes at a finer granularity can be found through a general ontology. We verify the method using news Web pages about "genetically modified" topics as test documents.

Finally, based on an analysis of the characteristics of linked data publication, we explain why data tracing is necessary. After selecting the tracing entities, provenance granularity and development tools, we establish a provenance-aware linked data publishing framework. Linked data for the teaching and research field is constructed automatically with D2R Server, and provenance metadata for the linked dataset is customized. We validate the system and implement SPARQL endpoint queries, supporting users in information sharing, effective mining and timely tracking in their own fields.
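
The derivation-linking step of the automatic annotation method can be illustrated with a minimal sketch. It assumes plain bag-of-words cosine similarity in place of the semantic similarity clustering described in the abstract, so it is a simplified stand-in rather than the dissertation's actual method. The function names, the 0.5 threshold and the example.org URIs are hypothetical. Only the property prov:wasDerivedFrom comes from PROV-O.

```python
import math
import re
from collections import Counter

# PROV-O namespace; wasDerivedFrom is a real PROV-O property.
PROV = "http://www.w3.org/ns/prov#"


def term_vector(text):
    """Lowercased bag-of-words counts for a document body."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def derivation_triples(docs, threshold=0.5):
    """Link each document to its most similar earlier document.

    docs: list of (uri, text) pairs ordered from oldest to newest.
    Returns N-Triples lines asserting prov:wasDerivedFrom.
    The threshold is an illustrative assumption, not a tuned value.
    """
    vectors = [(uri, term_vector(text)) for uri, text in docs]
    triples = []
    for i in range(1, len(vectors)):
        uri, vec = vectors[i]
        best_uri, best_score = None, threshold
        for earlier_uri, earlier_vec in vectors[:i]:
            score = cosine(vec, earlier_vec)
            if score >= best_score:
                best_uri, best_score = earlier_uri, score
        if best_uri is not None:
            triples.append(f"<{uri}> <{PROV}wasDerivedFrom> <{best_uri}> .")
    return triples
```

Calling `derivation_triples` on a chronologically ordered list of `(uri, text)` pairs yields statements that could be loaded into a triple store and queried through a SPARQL endpoint. Documents with no sufficiently similar predecessor get no derivation statement.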
Keywords/Search Tags: semantic web, data provenance, provenance model, linked data, semantic notation