High-dimensional Data Publishing Algorithms Based on Local Differential Privacy

Posted on: 2020-03-30  Degree: Master  Type: Thesis
Country: China  Candidate: Y B Wang  Full Text: PDF
GTID: 2428330620954306  Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, computer technology has become ever more widely used and is now an integral part of daily life. In the era of big data, massive data sets contain fragmented information from many real-world fields and hold great potential value. As that value is tapped, it brings convenience: preference analysis and targeted recommendation make people's choices simpler. In recent years, however, privacy leaks and big-data price discrimination against existing customers ("big data killing") have occurred repeatedly, sounding an alarm for the technology industry built on big data. To protect user privacy, researchers have proposed the differential privacy protection model. Differential privacy prevents an attacker from accurately identifying which record a victim's information comes from, which makes it an effective privacy protection model.

Against this background, this dissertation implements local differential privacy protection based on the RAPPOR algorithm. While implementing the local differential privacy model, it explores the size distribution of the data attribute domain and how different hash functions should be selected and combined. For data sets with different characteristics, the hash function combination with the fewest collisions is selected. With two hash functions, the combination of mmh3 and FNV gives the fewest collisions for numerical data. During perturbation, the amount of random perturbation is reduced, so that both privacy protection and data availability are preserved.

The growth of massive data is reflected not only in the number of users but also in the increasing number of attributes per user. High-dimensional data brings the curse of dimensionality, which makes the data difficult to restore and recover later. For this reason, while implementing local differential privacy, this dissertation also implements a dimensionality reduction algorithm based on joint probability distribution estimation to reduce the impact of the curse of dimensionality. Compared with four methods in current machine learning (principal component analysis, linear discriminant analysis, factor analysis, and Bayesian networks), the proposed algorithm is more effective. Multiple sets of comparative experiments explore the different dimensionality reduction methods. The availability of the data after local differential privacy processing is then analyzed, together with the advantages and disadvantages of each method when reducing the dimension of data that has been protected with local differential privacy. The experimental results show that the dimensionality reduction method based on probability distribution estimation, combined with differential privacy, achieves privacy protection of the data. The comparison with the other methods verifies the availability of the data and shows that the method used in this dissertation is feasible and effective.
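
To make the RAPPOR-based step concrete, the following is a minimal Python sketch of a client-side report using a Bloom filter with two hash functions. It is an illustration under stated assumptions, not the dissertation's implementation: the function names, the 32-bit filter size, and the parameters `f`, `p`, and `q` are chosen here for illustration, FNV-1a is written out by hand, and a SHA-256-derived hash stands in for mmh3 so that the example needs only the standard library.

```python
import hashlib
import random


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash (standard offset basis and prime)."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def second_hash(data: bytes) -> int:
    """Assumption: stand-in for mmh3, built from SHA-256 to stay stdlib-only."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def bloom_encode(value: str, num_bits: int = 32) -> list[int]:
    """Set one Bloom-filter bit per hash function (two hash functions here)."""
    bits = [0] * num_bits
    data = value.encode("utf-8")
    for hash_fn in (fnv1a_32, second_hash):
        bits[hash_fn(data) % num_bits] = 1
    return bits


def permanent_rr(bits: list[int], f: float, rng: random.Random) -> list[int]:
    """Permanent randomized response: each bit becomes 1 with probability f/2,
    0 with probability f/2, and keeps its true value with probability 1 - f."""
    noisy = []
    for b in bits:
        u = rng.random()
        if u < f / 2:
            noisy.append(1)
        elif u < f:
            noisy.append(0)
        else:
            noisy.append(b)
    return noisy


def instantaneous_rr(bits: list[int], p: float, q: float,
                     rng: random.Random) -> list[int]:
    """Instantaneous randomized response: report 1 with probability q when the
    memoized bit is 1, and with probability p when it is 0."""
    return [1 if rng.random() < (q if b else p) else 0 for b in bits]


def rappor_report(value: str, f: float = 0.5, p: float = 0.5, q: float = 0.75,
                  num_bits: int = 32, seed: int = 0) -> list[int]:
    """Full client-side pipeline: encode, then apply both perturbation steps."""
    rng = random.Random(seed)
    memoized = permanent_rr(bloom_encode(value, num_bits), f, rng)
    return instantaneous_rr(memoized, p, q, rng)


if __name__ == "__main__":
    print(rappor_report("42"))
```

Two values collide in this sketch when `bloom_encode` sets the same bit positions for both, so comparing hash function combinations amounts to counting such overlaps across the attribute domain.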
Keywords/Search Tags:big data, localized differential privacy, privacy protection, joint probability distribution, dimensionality reduction