| With the continuous development of information technology in the era of big data,the privacy of personal information including sensitive data has also attracted more and more attention.How to conduct data release and data analysis on the premise of protecting personal sensitive information from being leaked is a major challenge currently facing.As a new privacy protection model,local differential privacy has strong privacy protection.Compared with centralized differential privacy,it does not rely on any trusted third party and protects the user’s sensitive information locally.It not only protects against attackers with any background knowledge,but also protects against privacy attacks by any untrusted third parties.Under local differential privacy,we first study the one-dimensional density estimation problem,that is,how to reconstruct the distribution over a numerical domain.Existing solutions do not consider prior knowledge about the distribution of the overall data when performing distribution estimation,resulting in low data utility.We propose an incremental learning method based on prior knowledge to reconstruct the distribution.This method includes multiple rounds of collection.The main idea of this method is,in the first round,we first collect some users’ data under the assumption of no prior knowledge and then use the obtained statistical information to estimate the distribution of the dataset.From the second round,we consider using distribution information obtained in previous rounds as the prior knowledge to estimate the distribution of the dataset further accurately.We design two data mapping algorithms to transform the data to be collected by combining prior knowledge,and design a distribution aggregation algorithm to integrate the results of multiple rounds of collection.Experiments show that compared with existing methods,our scheme can significantly improve the accuracy of distribution estimation under the premise of providing users with privacy protection.In addition,we study the heavy hitter identification problem over a discrete domain,which aims to find more frequent items.We analyze the existing solving algorithms and propose a priority queue method to improve it.The main idea of this method is to gradually identify the frequent data prefixes from the fixed-length data by introducing the data structure of the priority queue,and finally identify the frequent items of the complete data.We verify the effectiveness of the scheme through experiments,and the results show that the improved scheme has higher data utility. |