
Research On Data Publishing Technology Based On Differential Privacy

Posted on: 2020-02-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y W Nie
Full Text: PDF
GTID: 1368330575466580
Subject: Information security
Abstract/Summary:
With the rapid development and popularization of the Internet, cloud computing, and artificial intelligence, data collection and sharing have become increasingly frequent, making privacy issues more severe. Under these circumstances, it is, on the one hand, practically meaningful to design and improve privacy protection strategies based on a formal definition of data privacy. On the other hand, the utility of privatized data cannot be ignored, since the ultimate goal of protecting privacy is to exploit data and benefit from it, whether economically or technically. Therefore, the key to technical research on privacy is to propose rational and feasible data processing schemes that mitigate the conflict between these two aspects and achieve a win-win outcome. This dissertation focuses on data publishing and takes differential privacy, a theoretically rigorous and practically feasible definition, as its basis to study privacy-preserving utility optimization strategies from several aspects.

Over the past decades of research on privacy protection, many theoretical results and practical applications have accumulated; however, with ongoing digitalization, the area still faces many challenges.

Challenge 1. From the perspective of data types, continuous data contain abundant information and play an important role in many areas, such as traffic flow analysis, the study of population migration, and urban environmental monitoring. However, such data streams, aggregated from real-time personal data, can threaten individual privacy if not handled correctly, for example by allowing patterns of people's daily routines to be extracted, and relevant studies are lacking. How to balance personal privacy against the value of continuous information is therefore an open question.

Challenge 2. From the perspective of data providers, in real-world scenarios they may have different attitudes toward privacy, even for the same kind of data. Conservative people generally want a high protection level for safety, whereas adventurous people usually prefer lower levels, sharing high-quality data in exchange for more accurate services. Most current privacy schemes treat the privacy level only as a uniform background parameter and rely on data characteristics, such as distributions and types, to optimize utility, rarely considering personalized protection requirements. Hence, allowing individuals to choose their privacy levels independently while still meeting the demand for optimized data quality is a challenging problem for putting privacy protection into practice.

Challenge 3. From the perspective of data collectors/users, the content of data publishing needs to match their application demands. Relatively direct statistics, such as the mean and median, do not dig deeply enough into the underlying regularities of the data to form effective models or tools for solving problems. In the user-led privacy-preserving model, only a few recent works address data modeling and training, and gaps remain to be filled. In addition, in some data publishing scenarios the subjects are not only individuals; they may include both individual providers and trusted institutions. In other words, building a unified private learning model not only for a single data form but also for multiple forms is a further challenge.

To address the three challenges above, we conduct research from three aspects.

Firstly, we propose a dynamic privacy protection method for continuous data publishing. The method handles range queries over 2-dimensional spatial data. In the spatial dimension, it adaptively optimizes the partition granularity of the space according to the density of each region, minimizing the error of query results. In the temporal dimension, we take a sliding window as the unit of protection and allocate the privacy budget appropriately according to how the data change, ensuring overall data quality over a time sequence. Through these steps, the method reaches a stable balance between privacy and utility.

Secondly, we design a utility-optimized discrete histogram estimation method over multilevel private data. For users, privacy protection in this method is designed with multiple levels: each user can choose the protection strength, which to some extent realizes individual control over personal privacy. For estimation quality, the method optimizes in two ways. On the estimated results, it combines multilevel histograms with optimal weights under privacy constraints; on the data samples, it derives target private versions from other privacy levels, enlarging the sample size of the target levels and improving data efficiency. These two strategies achieve optimization independently, and theoretical analysis and experimental comparison show that our methods perform remarkably well in error reduction.

Thirdly, we give classification training methods for heterogeneous settings. Targeting the widely used Naive Bayes classifier, we design corresponding model training strategies under different data privatization scenarios. In the user-led local private setting, we present a training method for partially private data (data with private features but public labels) and also give a communication-efficient scheme for totally private data (both features and labels are private). Moreover, we define a mixed private setting, in which we realize a unified training strategy over different data forms. Comprehensive theoretical and experimental validation shows that the private classifiers trained by our methods all achieve statistical unbiasedness and effectiveness on classification tasks.

To sum up, this dissertation conducts studies from three aspects to address challenges in private data publishing and provides some new ideas and approaches for this area.
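
The idea of combining histograms collected at several privacy levels can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the abstract names neither the perturbation mechanism nor the form of the optimal weights. The sketch assumes k-ary randomized response as the local mechanism and uses inverse-variance weights, based on a small-frequency variance approximation, as a stand-in for the optimal weights. All function names are hypothetical.

```python
import math
import random


def krr_probs(epsilon, k):
    """Return (p, q): probability of reporting the true value / one given other value."""
    denom = math.exp(epsilon) + k - 1
    return math.exp(epsilon) / denom, 1.0 / denom


def krr_perturb(value, epsilon, k, rng):
    """Perturb one value in range(k) with k-ary randomized response (assumed mechanism)."""
    p, _ = krr_probs(epsilon, k)
    if rng.random() < p:
        return value
    # Otherwise pick uniformly among the k - 1 other values.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def estimate_histogram(reports, epsilon, k):
    """Unbiased frequency estimates from the reports of one privacy level."""
    p, q = krr_probs(epsilon, k)
    n = len(reports)
    counts = [0] * k
    for r in reports:
        counts[r] += 1
    return [(c / n - q) / (p - q) for c in counts]


def approx_variance(n, epsilon, k):
    """Approximate per-bin variance, ignoring its dependence on the true frequency.

    Requires epsilon > 0 so that p != q.
    """
    p, q = krr_probs(epsilon, k)
    return q * (1 - q) / (n * (p - q) ** 2)


def combine_levels(levels, k):
    """Inverse-variance weighted average of per-level estimates.

    levels: list of (reports, epsilon) pairs, one pair per privacy level.
    Returns (combined_estimates, weights).
    """
    inv_vars = [1.0 / approx_variance(len(rep), eps, k) for rep, eps in levels]
    total = sum(inv_vars)
    weights = [iv / total for iv in inv_vars]
    estimates = [estimate_histogram(rep, eps, k) for rep, eps in levels]
    combined = [sum(w * est[v] for w, est in zip(weights, estimates))
                for v in range(k)]
    return combined, weights


# Example: two groups of users choosing different privacy levels (synthetic data).
rng = random.Random(0)
k = 4
true_freq = [0.4, 0.3, 0.2, 0.1]
levels = []
for eps, n in [(0.5, 5000), (2.0, 5000)]:
    data = rng.choices(range(k), weights=true_freq, k=n)
    levels.append(([krr_perturb(v, eps, k, rng) for v in data], eps))
combined, weights = combine_levels(levels, k)
```

Because the weights sum to one and each per-level estimate is unbiased, the combined estimate stays unbiased; the level with the weaker privacy guarantee (larger epsilon) is less noisy and therefore receives the larger weight.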
Keywords/Search Tags: Data Privacy, Data Publishing, Data Stream, Classification, Error Optimization