Research on Privacy-Preserving Location-Based Statistical Data Publishing

Posted on: 2015-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: W B Li
Full Text: PDF
GTID: 2268330431954463
Subject: E-commerce and information technology
Abstract/Summary:
With the development of positioning techniques and smart mobile devices, location-based applications such as real-time navigation, personalized information push, and mobile social networks are widely used in daily life. The statistical data generated by these applications also assist government and public services. For example, user data collected by a navigation system can help the relevant departments plan urban road construction, and analysis of user behavior patterns can reveal how diseases spread and so help prevent and control infectious diseases. Since these data involve users' sensitive information, privacy protection must be addressed in the statistical data publishing process.

Differential privacy has recently become the mainstream standard for privacy-preserving data release, as it provides strong worst-case privacy guarantees. Differential privacy requires that the output of a data analysis mechanism be approximately the same even if any single tuple in the input database is arbitrarily added or removed. Existing location-based statistical data publishing approaches perform a recursive binary partitioning of the data domain, using tree structures and grids based on the hierarchy of geographic locations, and then add independent noise to each block to satisfy differential privacy. In publishing, the fineness of the partition limits the accuracy of the statistics, and the number of partitioned blocks affects the noise magnitude. Together, these two factors determine the utility of the published statistical data. Current methods partition location-based data in a top-down way, which reduces the accuracy of the statistics and is unsuitable for unevenly distributed data. To take advantage of the data distribution under differential privacy, we propose a bottom-up data aggregation approach together with a statistical data publishing scheme that satisfies differential privacy. The contributions of this thesis are as follows:

1. We propose a bottom-up data merging approach that publishes location-based statistical data while satisfying differential privacy. First, we map the data to a two-dimensional geographic space and divide the space into a series of rectangular cells, which are the basic data units for statistical queries. Then we iteratively merge cells that satisfy the combination principles and the connectivity requirement into blocks, so that we obtain fewer blocks and the data distribution within each block is uniform. After adding noise to these blocks, we release the statistical data in block units, achieving accurate location-based statistical data publishing under differential privacy.

2. We propose two cell merging approaches: a global threshold-based method and a local threshold-based method. We introduce the concept of equivalent cells: all cells are divided into equivalence classes, and adjacent equivalent cells are merged. The global threshold-based cell merging method selects multiple thresholds based on the overall distribution of the data, and cells whose values fall between two successive thresholds are deemed equivalent. The local threshold-based cell merging method divides cells into equivalence classes based only on local characteristics of the data, in other words, on their nearby cells.

3. We study the errors in the location-based statistical data publishing process, analyze the relationship between noise error and non-uniformity error, and give the number of blocks after merging that minimizes the upper bound of the total error in the worst case. Experiments on the widely used Landmark and Checkin datasets indicate that the bottom-up data merging method is better suited to non-uniform data and performs better than the top-down approach for small query sizes.
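The bottom-up pipeline summarized above (grid cells, equivalence classes from global thresholds, merging of adjacent equivalent cells, then per-block noise) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not state the actual combination principles or how thresholds are chosen, so the function names (`class_of`, `merge_cells`, `publish`), the 4-neighbor connectivity rule, and the threshold values are all assumptions made for illustration. The noise step uses the standard Laplace mechanism with scale 1/ε for disjoint counts. Note that the sketch derives the partition from the raw counts, and a complete differentially private scheme would also have to account for the privacy cost of that data-dependent step.

```python
import random
from collections import deque


def class_of(count, thresholds):
    """Equivalence class of a cell: index of the interval between
    successive thresholds that contains its count (assumed rule)."""
    return sum(count >= t for t in sorted(thresholds))


def merge_cells(counts, thresholds):
    """Merge 4-connected cells of the same equivalence class into blocks.
    Returns a grid of block ids and the number of blocks."""
    rows, cols = len(counts), len(counts[0])
    cls = [[class_of(v, thresholds) for v in row] for row in counts]
    block = [[-1] * cols for _ in range(rows)]
    n_blocks = 0
    for r in range(rows):
        for c in range(cols):
            if block[r][c] != -1:
                continue  # cell already belongs to a block
            block[r][c] = n_blocks
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over equivalent neighbors
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and block[ny][nx] == -1
                            and cls[ny][nx] == cls[r][c]):
                        block[ny][nx] = n_blocks
                        queue.append((ny, nx))
            n_blocks += 1
    return block, n_blocks


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish(counts, thresholds, epsilon, rng):
    """Add Laplace noise to each block total and spread the noisy total
    uniformly over the block's cells. Only the noise step is private here;
    the partition itself is computed from raw counts (see caveat above)."""
    block, n_blocks = merge_cells(counts, thresholds)
    sums = [0.0] * n_blocks
    sizes = [0] * n_blocks
    for r, row in enumerate(counts):
        for c, v in enumerate(row):
            sums[block[r][c]] += v
            sizes[block[r][c]] += 1
    # Blocks are disjoint, so one tuple changes one block count by 1.
    noisy = [s + laplace(1.0 / epsilon, rng) for s in sums]
    released = [[noisy[b] / sizes[b] for b in row] for row in block]
    return released, block


# Hypothetical 2x3 grid of cell counts with a single global threshold.
grid = [[1, 1, 9], [1, 9, 9]]
estimates, blocks = publish(grid, thresholds=[5], epsilon=1.0,
                            rng=random.Random(0))
```

Fewer, more uniform blocks mean fewer noise draws spread over more cells, which is the trade-off between noise error and non-uniformity error that the third contribution analyzes.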
Keywords/Search Tags: location-based statistical data publishing, differential privacy, data merging