
Data Collection And Statistics With Local Differential Privacy Protection

Posted on: 2021-03-25  Degree: Master  Type: Thesis
Country: China  Candidate: L Shu  Full Text: PDF
GTID: 2428330605456879  Subject: Computer Science and Technology
Abstract/Summary:
To address the leakage of users' private information, many scholars and institutions have proposed privacy protection technologies. Traditional anonymization-based techniques no longer meet users' needs for protecting their own privacy, and differential privacy is currently recognized as the most rigorous and effective privacy protection technology. To guard against third parties obtaining users' personal information, local differential privacy protects user data on the user side, and it has become a very popular research field.

This thesis describes in detail the privacy protection system, anonymization-based privacy protection, centralized differential privacy, local differential privacy, and the related theoretical concepts. It then studies the process and principles of the RAPPOR algorithm, which is based on local differential privacy, identifies its shortcomings, and proposes an improved algorithm. In RAPPOR, the user terminal encodes the data with a Bloom filter and perturbs the result with randomized response; the data collector then decodes, corrects, and counts the frequencies. Because the frequencies of the attribute values differ widely, the frequency estimates that RAPPOR produces for each attribute value have large errors, and some low-frequency attribute values are lost altogether. In addition, the communication cost of transmitting the converted user data is high, and the regression calculation in the algorithm further increases the error.

To address these shortcomings, this thesis uses the K-means clustering algorithm to group the attribute values and then collects statistics within each group. A lossless data compression method is applied to the 0/1 bit vectors produced on the user side to reduce the communication cost. The algorithm process is also adjusted so that no regression calculation is required in the later stage. Finally, experiments are carried out on the Adult data set from the UCI database. Data availability is compared using KL-divergence and cosine similarity under different numbers of groups and different privacy budgets, and the compression rate is used to show how much the communication cost is reduced. The experimental results show that the improved KC-RAPPOR algorithm achieves higher statistical data availability and lower communication cost. Figure [20], table [7], reference [53]...
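As an illustration of the RAPPOR-style pipeline summarized in the abstract (Bloom filter encoding on the user side, randomized response perturbation, and frequency correction at the collector), the following is a minimal Python sketch. It is not the thesis's KC-RAPPOR algorithm and omits the K-means grouping and lossless compression steps. The function names, the use of SHA-256 as the hash, and the parameters `num_bits`, `num_hashes`, and `f` are assumptions chosen for illustration; the perturbation follows the commonly described one-time randomized response, in which each bit is forced to 1 with probability f/2, forced to 0 with probability f/2, and kept otherwise.

```python
import hashlib
import random
from typing import List


def bloom_encode(value: str, num_bits: int = 32, num_hashes: int = 2) -> List[int]:
    """Map a value to a 0/1 bit vector using num_hashes salted SHA-256 hashes."""
    bits = [0] * num_bits
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
        bits[int.from_bytes(digest[:8], "big") % num_bits] = 1
    return bits


def randomized_response(bits: List[int], f: float, rng: random.Random) -> List[int]:
    """Perturb each bit: 1 w.p. f/2, 0 w.p. f/2, unchanged w.p. 1 - f."""
    noisy = []
    for b in bits:
        u = rng.random()
        if u < f / 2:
            noisy.append(1)
        elif u < f:
            noisy.append(0)
        else:
            noisy.append(b)
    return noisy


def estimate_bit_counts(reports: List[List[int]], f: float) -> List[float]:
    """Unbiased estimate of how many users truly had each bit set.

    For a bit truly set by t of n users, E[observed] = t * (1 - f) + n * f / 2,
    so t is estimated as (observed - n * f / 2) / (1 - f). Requires f < 1.
    """
    n = len(reports)
    num_bits = len(reports[0])
    observed = [sum(r[i] for r in reports) for i in range(num_bits)]
    return [(c - (f / 2) * n) / (1 - f) for c in observed]
```

A collector would call `estimate_bit_counts` on the list of perturbed reports and then map the corrected bit counts back to candidate attribute values; that decoding step is where the original RAPPOR uses regression, which the abstract says the improved algorithm avoids.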
Keywords/Search Tags: Privacy protection, local differential privacy, data lossless compression method, K-means algorithm