
Data Provenance Sanitization Mechanism Based On Multi-Objective Optimization

Posted on: 2020-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y X T Ou
Full Text: PDF
GTID: 2428330572493869
Subject: Software engineering
Abstract/Summary:
The extensive use of Internet technology promotes the generation and sharing of large-scale datasets, but the variety of data sources makes it difficult to guarantee the quality and reliability of those datasets. Data provenance is a kind of metadata that describes data and is often used for data trustworthiness verification, historical version management of data, and so on. Data provenance may contain many kinds of sensitive information, so this information must be hidden to ensure provenance security before the provenance is shared publicly with third parties. Provenance sanitization is a new technology that addresses provenance security by hiding or redacting sensitive nodes, edges, or even indirect dependencies in a provenance graph. However, existing research has not considered the requirements of sanitizing indirect dependencies and has not provided sanitization mechanisms that support a trade-off between provenance security and utility. Therefore, we explain the necessity of sanitizing indirect dependencies through an example, propose a basic sanitization mechanism for indirect dependencies, and analyze its shortcomings. We further propose a data provenance sanitization mechanism based on multi-objective optimization. The main research contents of this thesis comprise the following three parts.

Firstly, we propose a basic sanitization mechanism for indirect dependencies. We begin by illustrating the motivation for, and analyzing the challenges of, sanitizing indirect dependencies while keeping the utility of provenance sanitization views, and we formally define the goals and constraints of sanitizing indirect dependencies. We then propose a novel mechanism for sanitizing indirect dependencies that builds on the "Delete + Repair" mechanism for direct dependencies in the literature. The proposed mechanism includes both deletion rules and repairing rules. Deletion rules specify which edges can be deleted to break all connecting paths between the two end nodes of a sensitive indirect dependency while minimizing the sanitization cost; repairing rules specify which uncertain dependencies can be added to restore the utility that the sanitized provenance view loses when deletion rules are applied. Finally, we implement a comprehensive algorithm for sanitizing indirect dependencies and conduct experiments on an open online dataset. The experimental results show that the proposed approach can effectively sanitize indirect dependencies while preserving the utility of the sanitized provenance view. It is about 30% more effective than the classical sanitization mechanism ProvAbs in terms of provenance utility.

Secondly, to enable a trade-off between provenance security and utility in the basic sanitization mechanism for indirect dependencies, we formally define the objectives and constraints of data provenance sanitization based on multi-objective optimization. Specifically, we briefly describe the multi-objective requirements of provenance sanitization; we propose a security evaluation model for indirect dependencies to quantify provenance security; and, after analyzing how provenance security and utility interact, we define a multi-objective trade-off function of provenance security and utility and formally define the objectives and constraints of the multi-objective data provenance sanitization problem.

Thirdly, we propose a data provenance sanitization mechanism based on multi-objective optimization, which adds the trade-off between provenance security and utility to the basic sanitization mechanism for indirect dependencies. The proposed mechanism constructs a collection of sanitization strategies for indirect dependencies in the manner of "Delete + Anonymize + Repair" and obtains a locally optimal solution to the multi-objective data provenance sanitization problem. Specifically, we first explain the basic idea of multi-objective data provenance sanitization; we then define the sanitization primitives and sanitization constraints for indirect dependencies and, on that basis, the sanitization strategies. After that, we construct two categories of sanitization strategies for indirect dependencies, namely "single path" and "multi-path", according to "Delete + Anonymize + Repair". We conduct experiments to compare the mechanism with the basic sanitization mechanism for indirect dependencies. The experimental results show that the mechanism can effectively sanitize sensitive indirect dependencies while maintaining high provenance utility and security for the sanitized provenance views. The value of the multi-objective trade-off function can exceed 87%.
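
The deletion step and the security/utility trade-off described in the abstract can be made concrete with a small sketch. The code below is not the thesis's algorithm: it assumes a provenance graph given as a list of directed edges, treats the sanitization cost as the number of deleted edges, finds a smallest deleting set by brute force (practical only for tiny graphs), measures utility as the fraction of non-sensitive reachability pairs that survive, and combines security and utility with a weighted sum. The function names, the example graph, the utility measure, and the weighted-sum form are all assumptions made for illustration; the "Repair" and "Anonymize" steps are not modeled.

```python
from itertools import combinations


def reachable(edges, src):
    """Return the set of nodes reachable from src along directed edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def dependency_pairs(edges):
    """All (u, v) pairs such that v is reachable from u (direct or indirect)."""
    nodes = {n for edge in edges for n in edge}
    return {(u, v) for u in nodes for v in reachable(edges, u)}


def min_deletion(edges, src, dst):
    """Smallest edge set whose removal leaves no path src -> dst.

    Brute force over subsets by increasing size; an assumption-level
    stand-in for the thesis's deletion rules, not a reproduction of them.
    """
    edges = list(edges)
    for k in range(len(edges) + 1):
        for cut in combinations(edges, k):
            remaining = [e for e in edges if e not in cut]
            if dst not in reachable(remaining, src):
                return set(cut)
    return set(edges)  # not reached: deleting every edge always disconnects


def utility(original, sanitized, sensitive):
    """Fraction of non-sensitive dependency pairs preserved (hypothetical measure)."""
    keep = dependency_pairs(original) - {sensitive}
    if not keep:
        return 1.0
    return len(keep & dependency_pairs(sanitized)) / len(keep)


def tradeoff(security, util, weight=0.5):
    """Weighted-sum trade-off; the weighted form is an assumption."""
    return weight * security + (1 - weight) * util


# Hypothetical provenance graph: an edge (u, v) means "u was derived from v".
graph = [
    ("report", "summary"), ("summary", "raw"),
    ("report", "chart"), ("chart", "raw"),
    ("raw", "sensor"),
]
sensitive = ("report", "raw")  # indirect dependency to hide

cut = min_deletion(graph, *sensitive)
sanitized = [e for e in graph if e not in cut]
security = 0.0 if sensitive[1] in reachable(sanitized, sensitive[0]) else 1.0
score = tradeoff(security, utility(graph, sanitized, sensitive))
```

In this example two edge-disjoint paths connect the sensitive pair, so at least two edges must be deleted. Several two-edge cuts exist and they preserve different amounts of utility, which is the kind of choice a trade-off function is meant to arbitrate.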
Keywords/Search Tags:data provenance, provenance sanitization, indirect dependency, provenance security, provenance utility, multi-objective optimization