| With the rapid development of informatization and digitization, human society is constantly generating huge amounts of information. Many companies and institutions collect and publish this information for third-party research. For example, the incidence trend of a certain type of disease in a region can be inferred from the disease information released by the hospitals in that region. However, this released information may pose a risk of privacy leakage. Therefore, protecting user privacy while still making effective use of this information is an important task. Differential privacy is a rigorous, theoretically grounded, and provable privacy-preserving model that protects privacy while taking the utility of the data into account, and it has been widely used in various fields. Because it protects privacy by adding noise to the data, it affects the validity and utility of the data to some extent, so the main optimization goal of current histogram publishing algorithms is to maximize data utility on the premise of protecting data privacy. However, the existing algorithms have large errors, resulting in low data utility. To address this problem, this paper proposes corresponding solutions. The specific research contents are as follows: (1) Aiming at the problem of selecting the center points for histogram clustering and grouping, a cluster center point selection algorithm (CPSA) is proposed. The method first uses the shortest distance between each non-central point and the central points, combined with the exponential mechanism, to calculate the sampling probability of each non-central point, and then draws the next center point by roulette-wheel selection. On the premise of satisfying differential privacy, the distribution of the selected cluster center points in the data is as dispersed as
possible. (2) Aiming at the problem of low utility of published histogram data, a histogram publishing algorithm integrating K-means and the exponential mechanism (IKEM) is proposed, based on actual needs and the above-mentioned cluster center point selection method. The algorithm uses the exponential mechanism combined with roulette-wheel selection to sample the cluster center points, so that the distribution of the cluster center points in the histogram data is as dispersed as possible; the original histogram data is then grouped using the obtained cluster center points; finally, each group is averaged and Laplace noise is added to obtain the differentially private histogram to be published. Simulation experiments on real data show that the algorithm improves data utility while ensuring differential privacy. (3) Aiming at the privacy and utility issues in publishing histograms of dynamic data streams, a histogram publishing algorithm for dynamic data streams (DSHP) is presented. The algorithm first uses the bisection (dichotomy) method to allocate the privacy budget to the data units in the sliding window, and then uses the similarity of the data published at two adjacent moments to decide whether to allocate privacy budget to the data unit at the current moment; finally, the IKEM algorithm is used to divide the data units that were allocated privacy budget into groups, and the differentially private histogram to be published is obtained. Tests on real datasets show that the DSHP algorithm improves data utility on the premise of satisfying w-event privacy. (4) An interactive data query system based on differential privacy is designed. The system is developed in Java and combines existing mainstream frameworks such as Spring Boot and Vue. Using this system, users can query the privacy-protected information they need. System tests show that it is
efficient and reliable. |
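
The pipeline summarized in item (2) (exponential-mechanism center selection with a roulette-wheel draw, grouping by the selected centers, then group averaging with Laplace noise) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function names, the even split of the selection budget across draws, the score sensitivity of 1, the single nearest-center assignment pass (instead of iterated K-means), and the separate `eps_select` / `eps_noise` budgets are all assumptions made for illustration, and the sketch is not accompanied by a privacy proof.

```python
import math
import random


def select_centers(counts, k, epsilon, rng, sensitivity=1.0):
    """Pick up to k bin indices as centers via the exponential mechanism."""
    k = min(k, len(counts))
    centers = [rng.randrange(len(counts))]  # first center: uniform, data-independent
    eps_each = epsilon / max(k - 1, 1)      # assumed: even budget split per draw
    while len(centers) < k:
        candidates = [i for i in range(len(counts)) if i not in centers]
        # Score = shortest distance (in count value) to any chosen center.
        scores = [min(abs(counts[i] - counts[c]) for c in centers)
                  for i in candidates]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(eps_each * (s - top) / (2 * sensitivity))
                   for s in scores]
        # Roulette-wheel draw: selection probability proportional to weight.
        centers.append(rng.choices(candidates, weights=weights, k=1)[0])
    return centers


def group_bins(counts, centers):
    """Assign every bin to the center whose count is nearest (one pass)."""
    groups = {c: [] for c in centers}
    for i, value in enumerate(counts):
        nearest = min(centers, key=lambda c: abs(value - counts[c]))
        groups[nearest].append(i)
    return [g for g in groups.values() if g]  # drop empty groups


def laplace(scale, rng):
    """Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def publish_histogram(counts, k, eps_select, eps_noise, seed=None):
    """Return a noisy histogram in which each group shares one noisy mean."""
    rng = random.Random(seed)
    centers = select_centers(counts, k, eps_select, rng)
    published = [0.0] * len(counts)
    for group in group_bins(counts, centers):
        mean = sum(counts[i] for i in group) / len(group)
        # One record changes one bin by 1, so the group mean moves by 1/len(group).
        noisy = mean + laplace(1 / (len(group) * eps_noise), rng)
        for i in group:
            published[i] = noisy
    return published


if __name__ == "__main__":
    demo = [10, 12, 11, 50, 52, 49, 100, 98]
    print(publish_histogram(demo, k=3, eps_select=0.5, eps_noise=0.5, seed=1))
```

Averaging within a group lowers the noise scale from 1/ε to 1/(|group|·ε), at the cost of approximation error when bins with different counts share a group; spreading the centers out is what keeps that approximation error small.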