Privacy Protection And Its Key Technologies In Big Data

Posted on: 2018-11-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X T Wu
GTID: 1318330512990803
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information and network technology, big data has become a hot topic in both academic and industrial research. The rapid generation of data presents both opportunities and challenges for human society. On one hand, it brings economic and social benefits through big data-based processing, analysis, and sharing. On the other hand, privacy in big data is considered one of the greatest problems in many applications. Once malicious parties obtain sensitive information about individuals or organizations, they may use it for fraud, and the victims may run into trouble or suffer economic loss. Due to the special characteristics of big data, such as volume, variety, velocity, and value, as well as the dynamic nature of privacy, it is very hard to protect privacy in the era of big data.

Big data raises several new challenges for privacy preservation. 1) Traditional, passive privacy preservation needs to be adapted to big data. In previous methods, data generators do not actively participate in the process of privacy preservation. If attackers obtain the original data directly from the databases of data collectors, passive privacy preservation is of no use. 2) In the data application phase, multi-source data fusion greatly increases the risk of privacy leakage. Since the data of a person and of the people correlated with him is widely distributed, the relevance between different datasets greatly increases, and so does the privacy risk after multi-source data fusion. 3) There is a shortage of recovery measures for losses caused by privacy leakage. Even if data collectors provide sufficient privacy guarantees, privacy leakage still happens.

In view of these challenges, we propose solutions for privacy preservation in big data. Specifically, the main contributions of our work are as follows.

1) To achieve privacy preservation throughout the life cycle of big data, we propose a privacy preservation framework for big data. Following the life cycle of big data, the framework is divided into three phases: the data generation phase, the data application phase, and the recovery phase for privacy leakage. In the data generation phase, data generators are able to anonymize data before it is submitted to data collectors. Furthermore, we discuss how multiple data generators can protect privacy jointly so as to reduce the cost. In the data application phase, each data collector aims to maximize utility subject to privacy constraints. In the recovery phase for privacy leakage, we use cyber-insurance as a method of privacy risk management to reduce the losses of data generators and collectors.

2) In the data generation phase, data generators are able to anonymize data before it is submitted to data collectors. In location-based services (LBS), a common method is to let a user generate dummy trajectories, which protects the location privacy of many users in a small area. However, because generating dummy trajectories is costly, it is not reasonable for a single user to bear the whole cost. We study the cost-sharing problem of determining which user generates the dummy trajectories and receives payment from the others. We construct an auction-based model in which each LBS user, as a bidder, reports his privacy cost and dummy trajectories. We propose a cost-sharing mechanism that incentivizes users to report their true cost and the effective degree of privacy for all users. We also demonstrate that our mechanism satisfies both incentive compatibility and budget balance.

3) In the data application phase, multi-source data fusion greatly increases the risk of privacy leakage. As one of the major applications of big data, privacy-preserving data publication (PPDP) has become an important research field. One of the fundamental challenges in PPDP is the trade-off between privacy and utility for a single, independent data set. However, recent research has shown that even an advanced privacy mechanism, namely differential privacy, is vulnerable when multiple data sets are correlated. In this case, the trade-off between privacy and utility becomes a game in which the payoff of each player depends on his own and his neighbors' privacy parameters. We first present a definition of correlated differential privacy to evaluate the real privacy level of a single data set under the influence of the other data sets. Then, we construct a game model of multiple players, each of whom publishes a data set sanitized by differential privacy. Next, we analyze the existence and uniqueness of the pure Nash Equilibrium. We use the price of anarchy to evaluate the efficiency of the pure Nash Equilibrium.

4) In big data, even if data collectors provide sufficient privacy guarantees, privacy leakage still happens. To address this problem, we use cyber-insurance to reduce the loss due to privacy leakage. Correlated privacy risk increases the risk borne by insurance companies because it impedes risk pooling. The situation is made worse by the negative impact of cyber-insurance on investment in self-protection, a phenomenon referred to as ex ante moral hazard. In addition, the lack of reliable actuarial data (e.g., estimates of systematic cyber-risks, attack-loss data) for computing the premium and the indemnity may lead users to shirk their responsibilities or to seek extra gains through bad behavior (e.g., faking an incident or exaggerating the loss), which is referred to as ex post moral hazard. We first establish a mathematical model based on traditional insurance theory to address the above problems. Then, we propose an optimal cyber-insurance contract that maximizes the expected utility of users. We also propose personalized cyber-insurance contracts that incentivize users to invest in self-protection, both without moral hazard and under ex ante moral hazard. Finally, we derive a Nash Equilibrium of the game between users and insurance companies under ex post moral hazard and propose strategies to motivate users to adopt positive behavior.
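
As background for contribution 3, the following is a minimal sketch of the standard Laplace mechanism, a textbook way to sanitize a numeric query under epsilon-differential privacy. It does not implement the correlated differential privacy definition or the game model proposed in the dissertation. The function names, the counting-query example, and the parameter values are assumptions made purely for illustration.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon used by the Laplace mechanism."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: random.Random = None) -> float:
    """Return true_value plus Laplace(0, b) noise, with b = sensitivity / epsilon.

    The difference of two independent exponential variables with mean b
    follows a Laplace distribution with location 0 and scale b.
    """
    rng = rng or random.Random()
    b = laplace_scale(sensitivity, epsilon)
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return true_value + noise


if __name__ == "__main__":
    # Hypothetical counting query: adding or removing one record changes
    # the count by at most 1, so the sensitivity is 1.
    rng = random.Random(0)
    true_count = 100
    for eps in (0.1, 1.0):
        noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=eps, rng=rng)
        print(f"epsilon={eps}: noisy count = {noisy:.2f}")
```

A smaller epsilon yields a larger noise scale, which means stronger privacy and lower utility. This is the single-data-set trade-off that the abstract says turns into a game once several correlated data sets are published, where each publisher would choose its own privacy parameter; in this sketch the parameter is simply passed in by the caller.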
Keywords/Search Tags: big data, privacy protection, location-based service, game theory, privacy risk control