Research On Collecting Multi-Dimensional Data Under Local Differential Privacy

Posted on: 2022-02-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Y Yang
Full Text: PDF
GTID: 1488306350988719
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technologies such as the mobile Internet and the increasing popularity of communication devices such as smartphones, human society has entered the era of big data. Data collection is an important means of obtaining big data. By collecting and analyzing users’ data, data aggregators (service providers) can mine group-level and individual characteristics to improve the user experience and design more appropriate development strategies. However, since users’ data often contain a large amount of sensitive information, collecting them directly may lead to serious privacy leakage. Privacy-preserving data collection technologies provide a feasible solution to privacy leakage during data collection. In recent years, local differential privacy has become the de facto standard for individual privacy protection. However, existing research mainly focuses on one-dimensional data collection under local differential privacy, and research on collecting multi-dimensional data under local differential privacy is still in its infancy. Therefore, this thesis conducts in-depth research on the problem of collecting multi-dimensional data under local differential privacy. In particular, three typical types of multi-dimensional data are considered: preference rankings, individual trajectories, and multi-attribute data. To summarize, this thesis makes the following contributions:

(1) To solve the problem of collecting preference rankings under local differential privacy, a novel approach named SAFARI is proposed. Its main idea is to collect a set of distributions over small domains, carefully chosen based on the riffle independent model, to approximate the overall distribution of users’ rankings, and then to generate a synthetic ranking dataset from the obtained distributions. By working on small domains instead of one large domain, SAFARI can significantly reduce the magnitude of the added noise. In particular, SAFARI designs two transformation rules that instruct users how to transform their data so that it provides information about the distributions over the small domains. Moreover, a new locally differentially private method is proposed for frequency estimation over multiple attributes that have small domains. Extensive experiments on real datasets confirm the effectiveness of SAFARI.

(2) To solve the problem of collecting individual trajectories under local differential privacy, a novel approach referred to as PrivTC is proposed. In PrivTC, a locally differentially private grid construction method is first designed to instruct the aggregator to lay an appropriate grid over the given geospatial domain. Then, a locally differentially private spectral learning method is designed to help the aggregator learn a Hidden Markov Model (HMM) from users’ trajectories discretized by the constructed grid. Finally, the aggregator generates a synthetic trajectory dataset from the learned HMM as a surrogate for the original one. Extensive experiments on real datasets confirm the effectiveness of PrivTC.

(3) To solve the problem of collecting multi-attribute data under local differential privacy, a novel approach called HDG is proposed. In particular, the data obtained by HDG are used to support a typical kind of analysis task over multi-attribute data, namely answering multi-dimensional range queries. The main idea of HDG is to use binning to partition the one-dimensional (1-D) domains of all individual attributes into 1-D grids and the two-dimensional (2-D) domains of all attribute pairs into 2-D grids, and then to combine information from the 1-D and 2-D grids to answer range queries. To make HDG consistently effective, a guideline is provided for properly choosing the granularities of the grids, based on an analysis of how different sources of error are affected by these choices. Extensive experiments conducted on real and synthetic datasets show that HDG gives a significant improvement over existing approaches.
Keywords/Search Tags: data collection, multi-dimensional data, local differential privacy
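
The abstract notes that working on small domains rather than one large domain reduces the magnitude of the added noise. The sketch below illustrates that effect with generalized randomized response (GRR), a standard locally differentially private frequency oracle. It is not the SAFARI, PrivTC, or HDG method from the thesis, and all function names and parameters (`grr_perturb`, `grr_estimate`, `grr_variance`, `domain_size`, `epsilon`) are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain_size, epsilon, rng):
    """User side: keep the true value with probability p, otherwise
    report one of the other domain_size - 1 values uniformly at random."""
    p = math.exp(epsilon) / (math.exp(epsilon) + domain_size - 1)
    if rng.random() < p:
        return value
    other = rng.randrange(domain_size - 1)
    # Skip over the true value so it is never chosen in this branch.
    return other if other < value else other + 1


def grr_estimate(reports, domain_size, epsilon):
    """Aggregator side: unbiased frequency estimates from noisy reports."""
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + domain_size - 1)    # probability of reporting the truth
    q = 1.0 / (e + domain_size - 1)  # probability of each other value
    counts = Counter(reports)
    return [(counts.get(v, 0) / n - q) / (p - q) for v in range(domain_size)]


def grr_variance(domain_size, epsilon, n):
    """Approximate variance of one estimated frequency (for a value whose
    true frequency is small); it grows linearly with the domain size."""
    e = math.exp(epsilon)
    return (e + domain_size - 2) / (n * (e - 1) ** 2)


if __name__ == "__main__":
    rng = random.Random(0)
    domain_size, epsilon, n = 4, 2.0, 20000
    # Hypothetical user data: value 0 is held by about half of the users.
    weights = [0.5, 0.3, 0.15, 0.05]
    data = rng.choices(range(domain_size), weights=weights, k=n)
    reports = [grr_perturb(v, domain_size, epsilon, rng) for v in data]
    print("estimates:", grr_estimate(reports, domain_size, epsilon))
    for d in (4, 64, 1024):
        print("domain", d, "variance", grr_variance(d, epsilon, n))
```

With the privacy budget and number of users held fixed, the variance returned by `grr_variance` increases with `domain_size`, which is one way to see why estimating several small-domain distributions can be less noisy than estimating a single distribution over a large joint domain.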