Research On ETL Provenance Method Based On PROV

Posted on: 2018-08-02  Degree: Master  Type: Thesis
Country: China  Candidate: R Zhang  Full Text: PDF
GTID: 2428330569998664  Subject: Management Science and Engineering
In an age of information explosion, the volume of data keeps growing, but only useful data can be called information. We therefore need to analyze these data and mine the relationships among them with data mining technology. This presupposes that the data are reliable and that the way they were generated is accurately known; otherwise the later work is meaningless. We should thus keep an effective record of how data came to be in their present state. With data provenance technology we can then trace source data, audit their derivation, reproduce transformations, discover new insights, and establish accountability. Because much of today's massive data carries no derivation record and no assurance of reliability, this thesis focuses on analyzing and describing ETL transformations and their inverse processes in detail using PROV, designing an appropriate description specification, and improving the efficiency of provenance while keeping the provenance results correct. The main work is divided into the following parts.

First, research on transformations and attribute mappings. By analyzing the transformations in ETL, we design the corresponding inverse transformations and inverse functions. To improve the efficiency of data provenance, we also analyze the transformation rules and derive rules for the minimal attribute set and the minimal attribute mapping set, which reduce the volume of source data and temporary results.

Second, design of the inverse algorithm. According to its functional requirements, the inverse algorithm is divided into four parts: ETL information access, construction of the provenance tree, minimal attributes, and the provenance workflow. We set up an ETL example with multiple transformations and multiple tables to verify the inverse algorithm, and show the whole process and the temporary results.

Third, design of the description specification. This thesis describes information about data tables and transformations with an XML specification based on PROV. The objects described fall into three parts: data table and (inverse) transformation information, the ETL process, and the data provenance process. Data table and (inverse) transformation information is described with Dublin Core, which helps present basic features; the ETL and data provenance processes are described with PROV, which helps present the whole process.

On the theoretical side, this thesis completes the study of inverse function construction, the design of the data tracing algorithm, and the description specification. On the application side, it completes the general design of an ETL data provenance tool.
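As a rough illustration of the idea of inverse transformations combined with a minimal attribute set, the following Python sketch runs a two-step ETL chain and traces one target row back to its source rows. It is not the algorithm of the thesis: the `Step` class, the `run` and `trace` functions, the sample `orders` table, and the choice of attributes are all hypothetical, and the inverse of each step is simplified to a predicate saying whether an input row contributed to an output row.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One ETL transformation plus what is needed to invert it (hypothetical)."""
    name: str
    forward: Callable[[list], list]            # input rows -> output rows
    contributes: Callable[[dict, dict], bool]  # (input row, output row) -> bool
    min_attrs: tuple                           # attributes kept for tracing


def keep_positive(rows):
    # Filter: drop rows with a non-positive amount.
    return [r for r in rows if r["amount"] > 0]


def total_by_customer(rows):
    # Aggregate: sum the amount per customer.
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return [{"customer": c, "total": t} for c, t in totals.items()]


def run(steps, source):
    """Run the chain, keeping each step's input projected to min_attrs only."""
    kept, rows = [], source
    for step in steps:
        kept.append([{a: r[a] for a in step.min_attrs} for r in rows])
        rows = step.forward(rows)
    return rows, kept


def trace(steps, kept, target_row):
    """Walk the chain in reverse and collect the rows behind target_row."""
    frontier = [target_row]
    for step, inputs in zip(reversed(steps), reversed(kept)):
        frontier = [r for r in inputs
                    if any(step.contributes(r, t) for t in frontier)]
    return frontier


steps = [
    Step("filter", keep_positive,
         lambda src, out: src["order_id"] == out["order_id"],
         ("order_id",)),
    # order_id is kept here as well so that the filter step can be inverted.
    Step("aggregate", total_by_customer,
         lambda src, out: src["customer"] == out["customer"],
         ("order_id", "customer")),
]

orders = [
    {"order_id": 1, "customer": "A", "amount": 10, "note": "x"},
    {"order_id": 2, "customer": "A", "amount": -5, "note": "refund"},
    {"order_id": 3, "customer": "B", "amount": 7, "note": ""},
    {"order_id": 4, "customer": "A", "amount": 20, "note": ""},
]

targets, kept = run(steps, orders)
lineage = trace(steps, kept, targets[0])  # provenance of customer A's total
print(lineage)  # [{'order_id': 1}, {'order_id': 4}]
```

Only the attributes listed in `min_attrs` are stored for each intermediate result, so columns such as `note` are never retained for tracing. Order 2 does not appear in the lineage because the filter step removed it before aggregation.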
Keywords/Search Tags: Data Provenance, ETL, Inverse, PROV, Resource Description
