
Research On Evaluation Model And Mechanism Of Data Provenance Sanitization For High Utility

Posted on: 2019-12-04  Degree: Master  Type: Thesis
Country: China  Candidate: Y X Wang  Full Text: PDF
GTID: 2428330548452315  Subject: Computer application technology
Abstract/Summary:
While the rapid development of Internet technology has brought great convenience to data generation, modification, and sharing, it has also made it difficult to guarantee the quality of data. As a kind of metadata, data provenance records the whole evolution of data from its generation to its extinction. Data provenance can be used for data quality analysis, process inversion, error investigation, and so on. Because data provenance may contain sensitive information, provenance security must be ensured when provenance is exchanged or shared among different organizations. Provenance sanitization is an emerging technology that provides provenance security by hiding or deleting sensitive information. Existing work on provenance sanitization lacks a quantitative assessment of the utility of sanitization views, focuses only on the sanitization of nodes, and produces sanitization views of low utility. To address these issues, we define a model for evaluating the utility of sanitization views and then propose a provenance sanitization mechanism for high utility. The main research contents of this thesis are as follows.

First, we extend the PROV model to serve as the theoretical basis for the evaluation of, and the research on mechanisms for, data provenance sanitization. We begin by formally defining basic concepts such as the provenance graph. Next, we define traceable results based on the connotation of data traceability, explore the dependencies between data traceability and provenance sanitization, introduce the fundamental operations of provenance sanitization, and put forward the constraints on provenance sanitization. We then define uncertain dependencies and prove that it is feasible to introduce them to repair sanitization views. Finally, we generalize the certain dependencies of the original PROV model into uncertain dependencies and, based on this, propose a repair operation that improves the utility of the sanitization view.

Second, we clarify the essential connotation of the utility of a sanitization view, formalize its definition, and construct an evaluation model for it. The evaluation model provides criteria for measuring the utility of a sanitization view, which it evaluates by quantifying the difference between the original provenance graph and the sanitization view. Because different sanitization operations leave provenance elements with different utility, we subdivide the nodes, edges, and connectivity paths in the sanitization view by utility, and combine the classification results with weights to construct the evaluation model. Furthermore, we design and implement an algorithm for evaluating the utility of a sanitization view. Experiments show that the performance of our algorithm is negatively correlated with the size of the provenance graph and is independent of the size of the difference between the original provenance graph and the sanitization view.

Third, based on the extended PROV model, we propose a provenance sanitization mechanism for high utility. The mechanism sanitizes both nodes and the dependencies between nodes. It defines deletion and compensation rules for the three kinds of nodes and seven kinds of dependencies in the PROV provenance model. We first delete sensitive elements to achieve provenance security, and then repair the provenance graph by introducing uncertain dependencies according to the type of each sensitive element and its context in the provenance graph. We design and implement an algorithm of data provenance sanitization for high utility and verify the effectiveness of our sanitization mechanism. Experiments show that the utility of the sanitization view produced by our mechanism is 15.26% higher than that of the ProvAbs mechanism, and that the performance of our algorithm is higher than that of the ProvAbs mechanism.
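
The two ideas in the abstract, deleting sensitive elements and then repairing the graph with uncertain dependencies, and scoring a sanitization view by its difference from the original graph, can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the `Edge` type, the `uncertainDependency` label, the equal weights, the half credit given to uncertain edges, and the scoring formula itself. The thesis defines separate rules for three kinds of nodes and seven kinds of dependencies, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    relation: str
    uncertain: bool = False  # hypothetical flag: True for a repair edge


def sanitize(nodes, edges, sensitive):
    """Delete sensitive nodes, then bridge their neighbours with uncertain edges.

    Simplification: chains of adjacent sensitive nodes are not bridged.
    """
    kept_nodes = set(nodes) - set(sensitive)
    kept_edges = {e for e in edges if e.src in kept_nodes and e.dst in kept_nodes}
    for s in sensitive:
        preds = {e.src for e in edges if e.dst == s and e.src in kept_nodes}
        succs = {e.dst for e in edges if e.src == s and e.dst in kept_nodes}
        for p in preds:
            for q in succs:
                if p != q:
                    kept_edges.add(Edge(p, q, "uncertainDependency", True))
    return kept_nodes, kept_edges


def reachable_pairs(nodes, edges):
    """Return every (a, b) such that b is reachable from a along directed edges."""
    adj = {n: set() for n in nodes}
    for e in edges:
        adj[e.src].add(e.dst)
    pairs = set()
    for start in nodes:
        seen, stack = set(), list(adj[start])
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(adj[n])
        pairs |= {(start, n) for n in seen}
    return pairs


def utility(orig_nodes, orig_edges, view_nodes, view_edges,
            weights=(1 / 3, 1 / 3, 1 / 3), uncertain_credit=0.5):
    """Weighted share of nodes, edges, and connectivity kept by the view (assumed formula)."""
    orig_nodes, orig_edges = set(orig_nodes), set(orig_edges)
    node_score = len(set(view_nodes) & orig_nodes) / len(orig_nodes)
    certain = sum(1 for e in view_edges if not e.uncertain and e in orig_edges)
    repaired = sum(1 for e in view_edges if e.uncertain)
    edge_score = 1.0
    if orig_edges:
        edge_score = min(1.0, (certain + uncertain_credit * repaired) / len(orig_edges))
    orig_pairs = reachable_pairs(orig_nodes, orig_edges)
    view_pairs = reachable_pairs(view_nodes, view_edges)
    path_score = len(view_pairs & orig_pairs) / len(orig_pairs) if orig_pairs else 1.0
    w_node, w_edge, w_path = weights
    return w_node * node_score + w_edge * edge_score + w_path * path_score


# Toy provenance chain: report -> clean -> raw, with "clean" treated as sensitive.
nodes = {"report", "clean", "raw"}
edges = {Edge("report", "clean", "wasGeneratedBy"), Edge("clean", "raw", "used")}
view_nodes, view_edges = sanitize(nodes, edges, {"clean"})
repaired_score = utility(nodes, edges, view_nodes, view_edges)
deleted_only_score = utility(nodes, edges, view_nodes, set())
print(repaired_score > deleted_only_score)
```

In this toy chain the repair edge keeps the link between `report` and `raw` in the view, so the repaired view scores higher than plain deletion. That is the intuition behind repairing with uncertain dependencies, although the actual gain depends on the thesis's own rules and weights.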
Keywords/Search Tags: data provenance, provenance security, evaluation model, provenance sanitization, utility