| While location-based big data services provide unprecedented convenience to the public, they also cause problems such as data abuse, personal privacy leakage, and infringement of business secrets. Under the differential privacy model, adding noise to the statistical results of big data within the corresponding ranges of a partition structure both reduces potential privacy risks and better preserves the statistical characteristics of the published results. As an important data mining method, clustering is especially well suited to extracting and analyzing the distribution characteristics of location-based big data. This thesis applies the grid clustering method to the statistical publishing of location-based big data and combines it with the differential privacy model to achieve partitioned statistical publishing with privacy protection. For the static publishing scenario of location-based big data, this thesis proposes a grid clustering and differential privacy protection method. First, the 2D space containing the location information is divided into underlying grids of equal size; each grid serves as the unit for computing the statistical information, and together the grids form a grid density matrix. Then, the uniformity of each non-empty grid is judged according to the specific distribution of the data, and the densities of the uniformly distributed grids are graded using the discrete wavelet transform. Finally, neighboring similar grids at the same density level are clustered, and Laplace noise is added to the statistical results according to the differential privacy model to form the published statistics. Experiments and analyses on real-world datasets show that the grid clustering and differential privacy publishing method proposed in this thesis outperforms existing partition publishing methods in range query accuracy and algorithm running efficiency.
To further support reasonable dynamic statistical publishing and improve the availability of the statistical results of location-based big data, a differential privacy publishing method based on adaptive sampling and grid clustering adjustment is proposed. An adaptive sampling mechanism that combines a PID control strategy with the difference in data change is designed to dynamically adjust the data publishing interval. Exploiting the high spatiotemporal correlation of data distributions at adjacent publishing times, a grid clustering adjustment method for the statistical partition structure is designed, which significantly improves the execution efficiency of the statistical publishing algorithm. The traditional budget allocation strategies are improved to yield a sliding window-based differential privacy protection method for big data statistical publishing, which improves the availability of published data while achieving continuous statistical publishing and privacy protection. Experiments and analyses on real-world datasets show that the dynamic statistical publishing privacy protection method proposed in this thesis outperforms existing methods in the rationality of data publishing times, the availability of published data, and algorithm running efficiency. |
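The first and last steps of the static publishing method, building a grid density matrix and adding Laplace noise to the counts, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`grid_density_matrix`, `laplace_noise`, `publish_counts`), the square `cells` by `cells` layout, and the sensitivity of 1 (each record is assumed to fall in exactly one grid) are all assumptions made for illustration. The uniformity judgment, wavelet-based density grading, and clustering steps are omitted.

```python
import random


def grid_density_matrix(points, x_range, y_range, cells):
    """Count the points falling in each cell of an equal-sized cells x cells grid."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    counts = [[0] * cells for _ in range(cells)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the region
        # Points on the upper boundary are clamped into the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * cells), cells - 1)
        row = min(int((y - y_min) / (y_max - y_min) * cells), cells - 1)
        counts[row][col] += 1
    return counts


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_counts(counts, epsilon, rng=None):
    """Add Laplace noise to every count (assumed sensitivity: 1)."""
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # sensitivity / epsilon, with sensitivity assumed to be 1
    return [[c + laplace_noise(scale, rng) for c in row] for row in counts]


points = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9), (1.0, 1.0)]
counts = grid_density_matrix(points, (0.0, 1.0), (0.0, 1.0), cells=2)
noisy = publish_counts(counts, epsilon=1.0, rng=random.Random(0))
```

A smaller `epsilon` gives a larger noise scale, which means stronger privacy and less accurate range queries. Partitioning methods such as the grid clustering described above aim to reduce that accuracy loss by adding noise to merged regions rather than to every underlying grid.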