Research On Privacy-Preservation Technique Based On Random Projection Data Perturbation

Posted on: 2015-04-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J S Zhao
Full Text: PDF
GTID: 1318330518972863
Subject: Computer application technology
Abstract/Summary:
Due to advances in information processing technology and storage capacity, huge amounts of data are now being collected for data mining. Throughout the data mining process the data are exposed to several parties, and such exposure can lead to breaches of individual privacy. Data in their original form typically contain sensitive information about individuals, and publishing such data would violate individual privacy. The data distributor therefore faces a dilemma: on the one hand, it is important to protect the anonymity and personal information of individuals; on the other hand, it is also important to preserve the utility of the data for research. To address this problem, privacy-preserving data publishing for data mining has emerged as a very active research area. Privacy-preserving data publishing (PPDP) provides methods and tools for publishing useful information while preserving data privacy. PPDP has recently received considerable attention in the research community, and many approaches have been proposed for different data publishing scenarios. One effective technique among these approaches is random projection based data perturbation, which offers high practicality and reliability because it is simple to implement and has a sound mathematical foundation. However, several problems remain to be solved. This thesis focuses primarily on publishing and sharing private data for data mining. Specifically, it addresses several of these problems of random projection based data perturbation from different perspectives and for different scenarios.

Firstly, we examine the privacy-preserving properties of random projection based data perturbation when the projection matrix is leaked. For this situation we propose a novel data reconstruction method based on ℓ1 minimization and reconstruct the data using convex optimization. Moreover, we theoretically analyze the necessary conditions for accurate data reconstruction. We then design a data reconstruction algorithm based on the primal-dual interior point method and implement it using Newton's iterative method. We show that, in a malicious model where the attacker knows the random projection matrix, sparse original data can be accurately reconstructed, which leads to breaches of user privacy. Experiments on both synthetic and real-world datasets show that our method can accurately reconstruct sparse original data without knowing any original data sample. These results offer insight into the vulnerabilities of random projection based data perturbation in the malicious model.

Secondly, to address the issue that random projection based data perturbation relies on assumptions about data reconstruction, we propose a novel random projection based perturbation method that satisfies differential privacy by introducing additive random noise. We theoretically prove that the proposed noisy projection perturbation method satisfies the definition of differential privacy, a rigorous privacy definition, while preserving the relative positions of the original data in Euclidean space. We also present a noisy projection perturbation algorithm that can be applied to differentially private data publishing in collaborative data mining. Experiments on both synthetic and real-world datasets show that this perturbation provides stronger privacy protection than traditional random projection based perturbation, with little loss of accuracy for data mining tasks based on nearest neighbor search.

Then, we turn to efficient data perturbation techniques for large-scale, high-dimensional dense data and propose a sparse projection data perturbation method based on universal hash functions. In particular, the sparsity of the projection matrix in the proposed method changes with the projection dimensionality. Moreover, we present a data perturbation algorithm that perturbs the original data using data distortion parameters specified by the user. We theoretically prove that the proposed method maintains data utility and guarantees data privacy. Experiments on both synthetic and real-world datasets show that our method preserves data privacy, meets data utility requirements in several data mining tasks, and significantly reduces computation costs.

Finally, we propose a data perturbation method that supports asynchronous real-time data updates for distributed privacy-preserving data streams. We first develop a novel system model for distributed privacy-preserving data stream publishing. We then design a privacy-preserving data collection mechanism based on the proposed system model and data perturbation method, and present an algorithm that performs asynchronous data stream perturbation in real time. To demonstrate the proposed method, we adapt it to a real-life distributed data stream mining application, namely privacy-preserving trajectory stream mining. We present an algorithm for transforming real-time trajectory data on client devices and an algorithm for mining similar trajectories from the perturbed data on the server. Experiments on both synthetic and real-world datasets show that the privacy-preserving trajectory transformation algorithm is efficient enough to satisfy the requirement of real-time updates, and that the similar-trajectory mining algorithm produces valid mining results with less running time.
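The first contribution rests on a simple observation: if the projection matrix leaks and the original record is sparse, recovering it is a compressed-sensing problem. The sketch below illustrates that idea under stated assumptions. It uses a common form of random projection perturbation, y = Rx/√k with Gaussian R (an assumption, not taken from the thesis), and solves the ℓ1 problem as a linear program with SciPy's HiGHS solver instead of the primal-dual interior point / Newton implementation the abstract describes. The function names `perturb` and `reconstruct_l1` and all sizes are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def perturb(x, R):
    """Assumed perturbation model: publish y = R x / sqrt(k)."""
    k = R.shape[0]
    return R @ x / np.sqrt(k)


def reconstruct_l1(y, R):
    """Basis pursuit: minimize ||x||_1 subject to (R / sqrt(k)) x = y.

    Writing x = p - q with p, q >= 0 turns this into a linear program:
    minimize sum(p) + sum(q) subject to A p - A q = y.
    """
    k, n = R.shape
    A = R / np.sqrt(k)
    c = np.ones(2 * n)              # objective: sum of all p and q entries
    A_eq = np.hstack([A, -A])       # encodes A (p - q) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


rng = np.random.default_rng(0)
n, k, s = 60, 30, 3                 # original dim, projected dim, nonzeros
x = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x[support] = rng.normal(size=s)     # sparse original record
R = rng.normal(size=(k, n))         # the leaked projection matrix
y = perturb(x, R)                   # what the data owner publishes
x_hat = reconstruct_l1(y, R)        # attacker's reconstruction
print("max reconstruction error:", np.max(np.abs(x_hat - x)))
```

With only half as many published values as original attributes, the attacker needs no original data sample, only R and the sparsity of x, which matches the malicious-model setting in the abstract.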
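The second contribution adds random noise after the projection. The sketch below shows the general "project, then add Gaussian noise" pattern and why distance-based mining survives it; it is not the thesis's algorithm. The noise scale `sigma` is left as a free parameter: calibrating it to a concrete differential privacy guarantee requires the sensitivity analysis in the thesis, which is not reproduced here. Function names and sizes are hypothetical.

```python
import numpy as np


def noisy_projection(X, k, sigma, rng):
    """Project each row of X to k dims with one shared Gaussian matrix,
    then add independent N(0, sigma^2) noise to every output entry.

    sigma is NOT calibrated to any (epsilon, delta) here; that mapping
    depends on a sensitivity analysis this sketch does not perform.
    """
    d = X.shape[1]
    R = rng.normal(size=(k, d))     # same R for every record
    Y = X @ R.T / np.sqrt(k)        # E||row diff||^2 is preserved
    return Y + rng.normal(scale=sigma, size=Y.shape)


def estimated_sq_distance(yi, yj, k, sigma):
    """Unbiased estimate of ||xi - xj||^2 from two perturbed records.

    E||yi - yj||^2 = ||xi - xj||^2 + 2 k sigma^2, because each of the k
    coordinates of the noise difference has variance 2 sigma^2.
    """
    return np.sum((yi - yj) ** 2) - 2 * k * sigma ** 2


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 100))
k, sigma = 500, 0.1
Y = noisy_projection(X, k, sigma, rng)

rel_errors = []
for i in range(len(X)):
    for j in range(i + 1, len(X)):
        true_sq = np.sum((X[i] - X[j]) ** 2)
        est_sq = estimated_sq_distance(Y[i], Y[j], k, sigma)
        rel_errors.append(abs(est_sq - true_sq) / true_sq)
rel_err = float(np.mean(rel_errors))
print(f"mean relative error of squared distances: {rel_err:.3f}")
```

The offset 2kσ² is the same in expectation for every pair, so for nearest-neighbour search it is mainly the noise variance, not the bias, that costs accuracy.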
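For the third contribution, a projection matrix built from universal hash functions can be stored implicitly and applied in time proportional to the number of nonzeros. The sketch below uses a count-sketch-style construction with Carter-Wegman hashes h(i) = ((a·i + b) mod p) mod m as a stand-in: each column gets exactly one ±1 entry. This is an assumption for illustration. The thesis's matrix has sparsity that varies with the projection dimensionality and user-chosen distortion parameters, which this fixed one-nonzero-per-column version does not model. All names are hypothetical.

```python
import numpy as np
from scipy import sparse

PRIME = 2_147_483_647  # Mersenne prime 2**31 - 1, larger than any feature index


def make_universal_hash(m, rng):
    """Carter-Wegman hash h(i) = ((a*i + b) mod PRIME) mod m."""
    a = int(rng.integers(1, PRIME))
    b = int(rng.integers(0, PRIME))
    return lambda i: ((a * i + b) % PRIME) % m


def hashed_projection_matrix(d, k, rng):
    """k x d sparse matrix with exactly one +1 or -1 entry per column.

    Column i has its nonzero in row bucket(i); a second hash picks the sign.
    """
    idx = np.arange(d, dtype=np.int64)
    bucket = make_universal_hash(k, rng)(idx)
    sign = 1.0 - 2.0 * make_universal_hash(2, rng)(idx)  # 0/1 -> +1/-1
    return sparse.csr_matrix((sign, (bucket, idx)), shape=(k, d))


def sparse_perturb(X, S):
    """Perturb each row x of X into S @ x using sparse multiplication."""
    return np.asarray((S @ X.T).T)


rng = np.random.default_rng(2)
d, k = 1000, 256
S = hashed_projection_matrix(d, k, rng)
X = rng.normal(size=(5, d))
Y = sparse_perturb(X, S)
print(S.nnz, Y.shape)
```

Because only the hash coefficients a and b need to be shared, data holders can regenerate the same matrix without transmitting it, and applying it costs one addition per nonzero input attribute rather than k multiplications per attribute as with a dense Gaussian matrix.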
Keywords/Search Tags: Data publishing, Privacy preserving, Data perturbation, Random projection, Sparsity