Perturbed Data Publishing With Local Differential Privacy Constraints

Posted on: 2020-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: H M Zhu
Full Text: PDF
GTID: 2428330575471917
Subject: Computer technology
Abstract/Summary:
With the development of Internet technology, convenient online shopping and personalized news recommendation have become an indispensable part of daily life. While users enjoy these services, enterprises and organizations continuously collect, use, and distribute their data. On the one hand, enterprises and organizations can use this data to understand users more fully. On the other hand, the published data may leak users' private information, so it must be privacy-protected before release to avoid revealing sensitive information. Post-random response is one effective method for protecting data privacy. Existing work in this area mainly considers how to design the perturbation matrix and assumes that the dataset attributes are either completely independent or completely correlated. If the attributes are treated as completely independent, the perturbation destroys the relationships between them, which reduces data utility. If they are treated as completely correlated, the contingency table of the data is sparse and the computational complexity is too high. To address this problem, this thesis proposes a perturbed data publishing algorithm under local differential privacy constraints. It mainly studies how to reduce the risk of privacy leakage caused by reconstruction attacks when sensitive attributes depend on some of the quasi-identifier attributes. First, the thesis partitions the quasi-identifier attributes according to their degree of dependence on the sensitive attribute, using mutual information to find the quasi-identifier attributes in the original dataset that depend strongly on the sensitive attribute; this provides a theoretical basis for perturbing the data attributes accurately. Second, for both the correlated and the uncorrelated attributes, the invariant post-random response method is applied to perturb a single data attribute or a combination of data attributes so as to satisfy local ε-differential privacy. The impact of the perturbation on privacy leakage probability and data utility is also analyzed theoretically. Finally, extensive experiments are carried out on the Adult dataset from the UCI Machine Learning Repository; the validity of the proposed algorithm and its ability to process incremental data are verified using data-distribution measures such as KL divergence and using decision tree classification accuracy. Theoretical analysis and experimental verification show that the algorithm achieves a higher level of privacy protection and better data utility than the traditional random perturbation algorithm.
Figure [14], table [7], reference [48]...
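
The three building blocks named in the abstract (mutual information for attribute selection, a perturbation matrix that satisfies local ε-differential privacy, and KL divergence as a utility measure) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the perturbation matrix below is the standard k-ary randomized response design, swapped in as an assumption in place of the invariant post-random response matrix, and the function names mutual_information, krr_matrix, perturb, and kl_divergence are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in nats, of two categorical columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x,y) * log(p(x,y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def krr_matrix(k, epsilon):
    """k-ary randomized response matrix (an assumption, not the thesis's matrix).

    Row i is the output distribution for true category i. The ratio of any
    two entries in a column is at most e^epsilon, which is the local
    epsilon-differential privacy condition.
    """
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # probability of reporting the true category
    q = 1.0 / (e + k - 1)  # probability of reporting each other category
    return [[p if i == j else q for j in range(k)] for i in range(k)]


def perturb(index, matrix, rng):
    """Sample a perturbed category index from the row of the true index."""
    return rng.choices(range(len(matrix)), weights=matrix[index], k=1)[0]


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy usage: perturb a column with k = 3 categories under epsilon = 1.0.
rng = random.Random(0)
matrix = krr_matrix(3, 1.0)
original = [0, 0, 1, 2, 2, 2]
released = [perturb(v, matrix, rng) for v in original]
```

In a pipeline like the one the abstract outlines, mutual_information would rank quasi-identifier attributes by their dependence on the sensitive attribute, and kl_divergence would compare the original attribute distribution with the one estimated from the released data.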
Keywords/Search Tags: local differential privacy, invariant post-random response, data reconstruction, data perturbation, privacy protection