Differential Privacy Data Publishing Of Associated Attributes Based On Micro-aggregation

Posted on: 2021-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: X X Ye
GTID: 2518306308958409
Subject: Computer technology
Abstract/Summary:
The rapid advance of the Internet and smart devices has brought great convenience to daily life through activities such as browsing information online, interactive dating, shopping, and entertainment. These activities generate massive amounts of data that often contain a great deal of personal private information. Releasing such data without processing would leak sensitive information about individuals or organizations, so the data must be processed before release. Differential privacy is one of the effective existing methods for addressing privacy leakage, but it suffers from problems such as low data utility. To improve data utility while protecting individual privacy, anonymization and differential privacy are usually combined. However, most such methods ignore the partial dependence between attributes and process all quasi-identifier attributes. This leads to high time and space complexity and low data utility.

To better balance data utility against privacy strength under differential privacy, this thesis investigates differential privacy protection of correlated attributes using micro-aggregation and self-organizing map (SOM) networks. The result is two models, called DPPCA (Differential Privacy Protection of Correlations Attributes) and SOMDP (Differential Privacy of SOM), respectively.

First, the implementation of the DPPCA model is elaborated. For three types of data sets (numerical, non-numerical, and hybrid), the attribute pairs with the largest dependencies are identified. Micro-aggregation is then performed on these attribute pairs. Each cluster is required to have size k (k > 2) and to contain l (l < k) distinct values of the sensitive attribute. Finally, noise is added to each cluster so that the result satisfies ε-differential privacy.

Second, the implementation of the SOMDP model and its advantages and disadvantages are described. To address the main defects of existing micro-aggregation methods, SOMDP clusters records with a SOM network, which offers self-organized learning and strong noise immunity. Combining this clustering with the differential privacy model can yield higher data utility. The feasibility of the model is also demonstrated theoretically.

Finally, experiments demonstrate the effectiveness and usability of the two models, and their advantages and disadvantages are compared and analyzed. The results show that DPPCA is better suited to practical scenarios and can greatly reduce the noise required to achieve differential privacy. On the Census and Adult data sets, the amount of added noise is reduced by 11%, which makes data release more effective. SOMDP, in contrast, can improve data utility but applies to a narrower range of scenarios. Future research needs to account for the realities of big data and large-scale computing, which pose new challenges for differential privacy research. The thesis includes 13 figures, 5 tables, and 68 references...
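The abstract does not say how attribute dependence is scored, apart from listing mutual information among its keywords. The sketch below shows how the first DPPCA step, finding the most dependent attribute pair, could be done. It assumes a plug-in mutual-information estimate over discrete or already-discretized columns, so numerical attributes would have to be binned first. The function names `mutual_information` and `most_dependent_pair` and the toy columns are hypothetical illustrations, not the thesis's code.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two equal-length discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def most_dependent_pair(table):
    """Return (mi, attr_a, attr_b) for the attribute pair with the largest MI.

    `table` (hypothetical layout) maps attribute names to equal-length value lists.
    """
    best = None
    for a, b in combinations(table, 2):
        score = mutual_information(table[a], table[b])
        if best is None or score > best[0]:
            best = (score, a, b)
    return best


table = {  # toy data, made up for illustration
    "education": ["HS", "HS", "BSc", "BSc", "MSc", "MSc"],
    "education_num": [9, 9, 13, 13, 14, 14],
    "sex": ["M", "F", "M", "F", "M", "F"],
}
```

In the toy table, `education` and `education_num` determine each other. Their mutual information therefore equals H(education) = log2 3 ≈ 1.585 bits, while each has zero mutual information with `sex`. `most_dependent_pair(table)` thus selects `("education", "education_num")`.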
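The abstract requires clusters of size k containing l distinct sensitive values, but it does not name the micro-aggregation algorithm DPPCA uses. The sketch below swaps in a simplified MDAV-style heuristic on a numeric attribute pair. It repeatedly takes the record farthest from the centroid of the remaining records and groups it with its nearest neighbours. The rule for reaching l distinct values is also an assumption. The heuristic skips a neighbour that repeats a sensitive value already in the cluster whenever keeping it would leave too few slots to reach l. If the nearby records cannot supply l distinct values, it falls back to plain nearest neighbours, so `is_valid` should be checked afterwards. All function names are hypothetical. `math.dist` needs Python 3.8+.

```python
import math


def _take_cluster(seed, remaining, points, sensitive, k, l):
    """Group `seed` with its nearest remaining records into a cluster of size k,
    skipping repeated sensitive values while they would block reaching l distinct ones."""
    others = sorted(
        (i for i in remaining if i != seed),
        key=lambda i: math.dist(points[i], points[seed]),
    )
    cluster, values = [seed], {sensitive[seed]}
    for i in others:
        if len(cluster) == k:
            break
        slots_left = k - len(cluster)
        missing = l - len(values)
        if sensitive[i] in values and missing >= slots_left:
            continue  # reserve the remaining slots for unseen sensitive values
        cluster.append(i)
        values.add(sensitive[i])
    for i in others:  # fallback: l distinct values were not available nearby
        if len(cluster) == k:
            break
        if i not in cluster:
            cluster.append(i)
    return cluster


def microaggregate(points, sensitive, k, l):
    """Simplified MDAV-style fixed-size micro-aggregation over record indices.

    Every cluster has exactly k records except the last one, which absorbs the
    leftovers and therefore has between k and 2k - 1 records.
    """
    if len(points) < k:
        raise ValueError("need at least k records")
    remaining = set(range(len(points)))
    clusters = []
    while len(remaining) >= 2 * k:
        center = [sum(c) / len(remaining) for c in zip(*(points[i] for i in remaining))]
        seed = max(remaining, key=lambda i: math.dist(points[i], center))
        cluster = _take_cluster(seed, remaining, points, sensitive, k, l)
        clusters.append(cluster)
        remaining.difference_update(cluster)
    clusters.append(sorted(remaining))
    return clusters


def is_valid(clusters, sensitive, k, l):
    """Check the size-k and l-distinct-sensitive-values requirements."""
    return all(len(c) >= k and len({sensitive[i] for i in c}) >= l for c in clusters)
```

Consider the made-up points `[(25, 40), (26, 38), (27, 45), (60, 20), (62, 25), (65, 30)]` with sensitive values `["flu", "cold", "flu", "cancer", "cancer", "cancer"]`, k = 3 and l = 2. The first cluster grows from (65, 30). It keeps (62, 25) but skips (60, 20), because a third "cancer" would leave the cluster with one diagnosis, and takes (26, 38) instead. The result is `[[5, 4, 1], [0, 2, 3]]`. That jump across the data shows the utility cost the diversity requirement can impose.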
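The abstract says noise is added to each cluster but gives no calibration. One common construction, assumed here and not confirmed by the source, releases each cluster mean once with Laplace noise. The calibration rests on two assumptions:

- The partition into clusters of at least k records stays fixed when one record is replaced. Standard MDAV does not guarantee this, so it is a simplification.
- Each attribute j lies in a known range `bounds[j] = (lo, hi)`.

Under these assumptions, replacing one record moves a single cluster mean by at most Δ = Σ_j (hi_j − lo_j) / k in L1 norm. Laplace noise of scale b = Δ / ε then gives ε-differential privacy. The function names and the bounds used below are hypothetical.

```python
import random


def laplace(scale, rng):
    """One Laplace(0, scale) draw: difference of two iid exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cluster_means(points, clusters, k, epsilon, bounds, rng=None):
    """Replace every record by its cluster's mean plus Laplace noise.

    Assumes every cluster holds at least k records, the partition is fixed, and
    attribute j lies in bounds[j] = (lo, hi). Returns (released records, noise scale).
    """
    rng = rng or random.Random()
    dims = len(bounds)
    # L1 sensitivity of one cluster mean when a single record is replaced
    sensitivity = sum(hi - lo for lo, hi in bounds) / k
    scale = sensitivity / epsilon
    released = [None] * len(points)
    for cluster in clusters:
        mean = [sum(points[i][j] for i in cluster) / len(cluster) for j in range(dims)]
        noisy = tuple(m + laplace(scale, rng) for m in mean)  # one release per cluster
        for i in cluster:
            released[i] = noisy
    return released, scale
```

Take the records `(25, 40), (26, 38), (27, 45), (60, 20), (62, 25), (65, 30)`, clusters `[[5, 4, 3], [0, 1, 2]]`, hypothetical bounds `[(17, 90), (1, 99)]` and ε = 1. With k = 3 the noise scale is 171 / 3 = 57. Perturbing individual records (k = 1) needs scale 171. This k-fold drop illustrates, under the stated assumptions, why aggregating before adding noise can reduce the noise needed.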
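For SOMDP, the abstract states only that a SOM network replaces micro-aggregation as the clustering step. Below is a minimal one-dimensional SOM in plain Python that clusters records by their best-matching unit. The following are all assumptions, not details from the thesis: linear initialisation along the data's bounding-box diagonal, a Gaussian neighbourhood, linearly decaying learning rate and radius, and the names `train_som` and `assign_clusters`. One caveat: SOM clusters have no minimum size. A sensitivity bound that divides by a cluster size k would therefore not carry over to them automatically.

```python
import math
import random


def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a one-dimensional self-organizing map; returns the unit weight vectors."""
    rng = random.Random(seed)
    dims = len(data[0])
    lo = [min(x[j] for x in data) for j in range(dims)]
    hi = [max(x[j] for x in data) for j in range(dims)]
    # Linear initialisation: spread the units evenly along the bounding-box diagonal.
    weights = [
        [lo[j] + (hi[j] - lo[j]) * u / max(n_units - 1, 1) for j in range(dims)]
        for u in range(n_units)
    ]
    radius0 = max(n_units / 2, 1.0)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)  # present records in a fresh random order each epoch
        for idx in order:
            x = data[idx]
            frac = step / total
            lr = lr0 * (1 - frac)  # learning rate decays linearly towards 0
            radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks as well
            bmu = min(range(n_units), key=lambda u: math.dist(weights[u], x))
            for u in range(n_units):
                # Gaussian neighbourhood on the 1-D grid around the best-matching unit
                h = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                for j in range(dims):
                    weights[u][j] += lr * h * (x[j] - weights[u][j])
            step += 1
    return weights


def assign_clusters(data, weights):
    """Label every record with the index of its best-matching unit."""
    return [min(range(len(weights)), key=lambda u: math.dist(weights[u], x)) for x in data]
```

On two well-separated made-up groups, such as `(0, 0.5), (0.5, 0), (1, 1)` and `(9, 9.5), (9.5, 9), (10, 10)`, a two-unit map assigns each group to its own unit. Those clusters could then be perturbed with per-cluster noise, subject to the size caveat above.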
Keywords/Search Tags: ε-differential privacy, mutual information, associated attributes, micro-aggregation, privacy-preserving data publishing, data utility