
Research On The Key Technology Of Data Provenance Based On Big Data Model Analysis Platform

Posted on: 2018-09-16; Degree: Master; Type: Thesis
Country: China; Candidate: P F Hao
GTID: 2348330515451692; Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of computers and the mobile Internet, the volume of information of all kinds has grown explosively. People are now concerned not only with the data itself but also with where the data came from and how the information has evolved over time. We call this historical information provenance information. Data provenance is widely applied in databases and scientific research, and many provenance information systems have been built. Data provenance also plays an important role in many other areas, such as debugging data and data transformations, auditing, evaluating the quality and trustworthiness of data, and enforcing access control on data. On a big data platform, however, both the source data and the result data are stored in HDFS, so traditional provenance methods are not suitable. To address this problem, this thesis studies data provenance at different granularities in model workflows on a big data platform and designs a data provenance system for that platform. Users can start from a target result and trace it back to its source data, which makes the model platform traceable. Tracing data to its sources in this way also helps ensure data quality. The main aspects of the research are as follows:

First, the thesis studies coarse-grained data provenance. It designs and constructs a provenance metadata model for model workflows, based on the graph of the model flow and using a DAG as the description language, and proposes a coarse-grained provenance method that answers where model workflow data came from and how it evolved. However, this method can only trace data at the level of directories and files; it cannot resolve which items inside a file a given data item depends on.

Second, the thesis addresses these data item dependencies when tracing a single datum and proposes a fine-grained provenance method that can reach each individual data item in the source files. The method modifies and extends the native big data framework and introduces provenance marks, so that provenance information is captured and preserved automatically while the models execute. For tracing, the thesis designs forward and backward provenance tracing algorithms.

Finally, the thesis designs and implements a system based on the above research and demonstrates it through an example and an experiment. The results show that the proposed methods are effective and that the system achieves its design goals.
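The coarse-grained method can be read as a reachability problem on the workflow DAG. The following is a minimal sketch under stated assumptions. It assumes each model step reports the HDFS paths it reads and writes. The class name `WorkflowProvenance`, the step names and the paths are hypothetical, not the thesis's actual metadata model. Backward tracing walks parent edges from a result toward its original inputs. Forward tracing walks child edges to everything a source influenced. As the abstract notes, this granularity stops at whole files and directories.

```python
from collections import defaultdict, deque


class WorkflowProvenance:
    """Hypothetical file-level (coarse-grained) provenance store for a model workflow DAG."""

    def __init__(self):
        self.parents = defaultdict(set)   # node -> nodes it was derived from
        self.children = defaultdict(set)  # node -> nodes derived from it
        self.steps = set()                # model-step nodes (every other node is a file path)

    def record_step(self, step, inputs, outputs):
        # Add edges input file -> step -> output file for one executed model step
        self.steps.add(step)
        for path in inputs:
            self.parents[step].add(path)
            self.children[path].add(step)
        for path in outputs:
            self.parents[path].add(step)
            self.children[step].add(path)

    @staticmethod
    def _reach(start, edges):
        # Breadth-first search over one edge direction; returns every reachable node
        seen, queue = set(), deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def backward(self, node):
        """Every step and file upstream of `node`."""
        return self._reach(node, self.parents)

    def forward(self, node):
        """Every step and file downstream of `node`."""
        return self._reach(node, self.children)

    def sources(self, node):
        """Original input files of `node`: upstream files that no step produced."""
        return {n for n in self.backward(node)
                if n not in self.steps and not self.parents.get(n)}


# Hypothetical three-step workflow; the paths stand in for HDFS files or directories
prov = WorkflowProvenance()
prov.record_step("clean", ["/raw/a.csv", "/raw/b.csv"], ["/tmp/clean"])
prov.record_step("train", ["/tmp/clean", "/raw/params.json"], ["/out/model"])
prov.record_step("score", ["/out/model", "/raw/b.csv"], ["/out/scores"])

print(sorted(prov.sources("/out/scores")))
# ['/raw/a.csv', '/raw/b.csv', '/raw/params.json']
```

Keeping both edge directions costs twice the memory. In exchange, forward and backward tracing each become a single breadth-first search, with no graph reversal at query time.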
Keywords/Search Tags: Big data platform, Data provenance, DAG, Fine-grained
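The fine-grained method in the abstract attaches a provenance mark to each data item, so provenance is captured as a side effect of execution. The following is a minimal sketch of that idea in plain Python, not inside a modified big data framework. The `Tagged` record type, the operator names (`t_flat_map`, `t_map`, `t_filter`, `t_reduce_by_key`) and the `(path, line_number)` mark format are all hypothetical. One-to-one operators pass marks through unchanged, while aggregations take the union of their inputs' marks. Backward tracing then just reads a result's marks. Forward tracing uses an inverted index from source items to the results they influenced.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    value: object
    marks: frozenset  # provenance marks: ids of the source data items this value depends on


def load(path, lines):
    # Give every input line its own mark (path, line_number); stands in for reading a file from HDFS
    return [Tagged(line, frozenset({(path, i)})) for i, line in enumerate(lines)]


def t_flat_map(fn, records):
    # One-to-many operator: each output inherits the marks of the record it came from
    return [Tagged(v, r.marks) for r in records for v in fn(r.value)]


def t_map(fn, records):
    # One-to-one operator: marks pass through unchanged
    return [Tagged(fn(r.value), r.marks) for r in records]


def t_filter(pred, records):
    # Surviving records keep their marks unchanged
    return [r for r in records if pred(r.value)]


def t_reduce_by_key(key_fn, combine, records):
    # Many-to-one operator: an aggregate depends on every record folded into it,
    # so its marks are the union of those records' marks
    groups = {}
    for r in records:
        k = key_fn(r.value)
        if k in groups:
            acc = groups[k]
            groups[k] = Tagged(combine(acc.value, r.value), acc.marks | r.marks)
        else:
            groups[k] = r
    return list(groups.values())


def trace_backward(record):
    # Backward tracing: the marks already name the contributing source items
    return sorted(record.marks)


def build_forward_index(outputs):
    # Forward tracing: invert the marks so each source item maps to the outputs it influenced
    index = defaultdict(list)
    for pos, r in enumerate(outputs):
        for mark in r.marks:
            index[mark].append(pos)
    return index


# Word count over a hypothetical three-line input file
src = load("/in/words.txt", ["a b", "b c", "c"])
pairs = t_map(lambda w: (w, 1), t_flat_map(str.split, src))
counts = t_reduce_by_key(lambda kv: kv[0], lambda x, y: (x[0], x[1] + y[1]), pairs)
frequent = t_filter(lambda kv: kv[1] > 1, counts)

print([r.value for r in frequent])   # [('b', 2), ('c', 2)]
print(trace_backward(frequent[0]))   # [('/in/words.txt', 0), ('/in/words.txt', 1)]
index = build_forward_index(frequent)
print([frequent[i].value for i in index[("/in/words.txt", 2)]])  # [('c', 2)]
```

Taking the union of marks at each aggregation makes backward tracing a direct lookup. The trade-off is that an aggregate's marks grow with the number of input items it consumes.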