
Research On Multi-grouping Abstraction Based Mechanism For Provenance Sanitization With High Utility

Posted on: 2021-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Xu
Full Text: PDF
GTID: 2428330602489834
Subject: Computer application technology
Abstract/Summary:
In the era of big data, massive amounts of data are generated, transmitted, stored, transformed, and used by different organizations. Data provenance records the data entities, processing steps, and related people and organizations involved in the entire life cycle of data, from creation to disposal. Data provenance may include various kinds of sensitive information, which should be hidden before the provenance is published or shared. Data provenance sanitization is a novel technique that produces a secure provenance graph by hiding or redacting sensitive or redundant information in the original provenance graph. Existing research has paid little attention to evaluating provenance utility. Existing abstraction-based sanitization mechanisms often lead to low provenance utility, and existing provenance utility evaluation models are simple and unreliable. This thesis constructs a data provenance utility evaluation model based on relative entropy and proposes a mechanism based on multi-grouping abstraction for provenance sanitization with high utility. The main research contents of this thesis are as follows.

First, a provenance utility evaluation model based on relative entropy is constructed. It quantitatively defines a Markov chain based causal influence between a data item and a node influencing it in the provenance graph. It then formalizes the results of provenance tracing as a distribution of causal influences. Finally, it defines the provenance utility of a sanitized provenance graph as the relative entropy between the causal influence distribution obtained from the original provenance graph and the one obtained from the sanitized provenance graph. An open provenance dataset from Indiana University is used to validate the performance of the proposed model. The experimental results show that the proposed model produces evaluation results as expected.

Second, this thesis clarifies the basic principle of enhancing provenance utility through multi-grouping abstraction. It analyzes the existing abstraction-based sanitization mechanism in detail and finds that the over-sanitization of non-sensitive nodes is the key reason for low provenance utility. A basic idea of preserving provenance utility by multi-grouping abstraction is proposed. Experiments are conducted to analyze the effect of different grouping schemes on provenance utility. The experimental results show that multi-grouping abstraction is a feasible way to improve the provenance utility of the sanitized provenance graph. However, an optimal grouping scheme must still be identified based on the goals of provenance utility and security.

Third, a mechanism based on multi-grouping abstraction for provenance sanitization with high utility is proposed, in order to construct an appropriate grouping scheme that balances provenance security and utility. Specifically, it uses the K-Means clustering algorithm to identify the optimal grouping scheme, then executes the abstraction-based sanitization mechanism, and finally obtains a sanitized provenance graph. Experimental results show that the proposed mechanism can identify the optimal grouping scheme and produce a secure provenance graph with high utility.
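
The relative-entropy comparison described in the abstract can be illustrated with a small sketch. It is not the thesis's actual model: the node names, the influence values, the helper `spread_over_groups`, the direction of the divergence (original relative to sanitized), and the choice to spread an abstract node's influence uniformly over its members are all assumptions made for illustration.

```python
import math


def relative_entropy(p, q, eps=1e-12):
    """Relative entropy D(p || q) in bits for two distributions given as dicts.

    eps is a hypothetical floor that avoids division by zero when q assigns
    no mass to a node that p does; it is not taken from the thesis.
    """
    total = 0.0
    for node, p_mass in p.items():
        if p_mass == 0.0:
            continue  # terms with zero mass in p contribute nothing
        q_mass = max(q.get(node, 0.0), eps)
        total += p_mass * math.log2(p_mass / q_mass)
    return total


def spread_over_groups(dist, groups):
    """Hypothetical stand-in for abstraction: each group of nodes is merged,
    and the group's total influence is shared uniformly among its members."""
    out = dict(dist)
    for members in groups:
        mass = sum(dist[m] for m in members)
        for m in members:
            out[m] = mass / len(members)
    return out


# Toy causal-influence distribution over three provenance nodes (made-up values).
original = {"a": 0.5, "b": 0.375, "c": 0.125}

# One large group versus a smaller group that leaves node "a" untouched.
coarse = spread_over_groups(original, [["a", "b", "c"]])
fine = spread_over_groups(original, [["b", "c"]])

print("coarse grouping:", relative_entropy(original, coarse))
print("fine grouping:  ", relative_entropy(original, fine))
```

In this toy setting the finer grouping yields a smaller divergence from the original distribution than the single large group, which matches the abstract's point that over-sanitizing non-sensitive nodes is what costs utility.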
Keywords/Search Tags: Data provenance, Provenance sanitization, Multi-grouping abstraction, Provenance utility, Relative entropy