
Research On Personal Sensitive Data Privacy Protection

Posted on: 2022-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhang
GTID: 2518306779479024
Subject: Computer Software and Application of Computer
Abstract/Summary:
With the rapid development of big data technology, big data applications have been integrated into all walks of life. The big data industry is rapidly developing into a new generation of information technology and service model: collecting, storing, sharing, and analyzing large volumes of varied personal information in order to discover new knowledge and create new value. However, personal information contains a large amount of sensitive data, such as home address, identity information, religious belief, and health status. Once sensitive data is leaked, it can cause immeasurable losses and may even threaten personal safety. As a result, the problem of privacy disclosure of personal sensitive data is becoming increasingly prominent. Existing data privacy protection schemes cover data collection, data service, and data publishing, and most of them rely on technologies such as k-anonymity, data perturbation, pseudo elements, and data encryption. However, these schemes protect privacy at the expense of data availability and cannot achieve a good balance between privacy and utility. How to enjoy the convenience of big data while protecting data privacy is therefore particularly important. To address this problem, this thesis studies key technologies in big data security and privacy protection from three aspects: data collection, data service, and data publishing. The specific research contents are as follows.

First, to address the difficulty of balancing privacy and utility in existing privacy protection methods for data collection, a novel group perturbation model is proposed. The model introduces an edge server as a group perturbation node that perturbs users' sensitive data before the data is uploaded to the data acquisition server. Each edge server collects data only from its nearby area, so the data set it handles is small. Accordingly, a data perturbation algorithm is proposed that perturbs the data with global noise, which effectively preserves the statistical results of the data while keeping the risk of privacy disclosure low. Theoretical proof and experimental analysis show that the group perturbation mechanism prevents privacy leakage at the data collector while achieving better utility than local perturbation.

Second, to address the difficulty of balancing utility and privacy in data services, a false-query privacy protection mechanism based on the optimal location trajectory is proposed. From the perspective of information theory, the privacy of a trajectory is measured by the mutual information between the real trajectory and the false trajectory, which solves the problem that trajectory privacy is difficult to quantify. On this basis, a method for calculating trajectory mutual information based on a Markov chain is proposed, which simplifies the calculation. The region is then divided using a quadtree, and the trajectory is divided into segments. Under the relevant constraints, the optimal historical trajectory is selected as the false trajectory, so that the generated false trajectory is realistic and reasonable. Experiments show that the proposed method achieves the best trade-off between privacy and utility for location data among the compared methods and effectively reduces the computing cost of the system.

Finally, because existing privacy protection methods for data publishing also have difficulty balancing privacy and utility, a k-anonymity privacy protection mechanism based on optimal clustering is proposed. By establishing the functional relationship between data distance and information loss, the optimization problem of the k-anonymity mechanism is transformed into an optimal clustering problem over the data set. A greedy algorithm and a bisection (dichotomy) method are used to find the optimal clustering that satisfies the k-anonymity constraints, thereby improving the availability of data under the k-anonymity model. Both theoretical and experimental analyses are given. The experimental results show that the proposed method minimizes information loss and is efficient in terms of running time.
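
As an illustration of the mutual-information privacy measure described in the abstract, the following is a minimal sketch, not the thesis's Markov-chain method. It assumes that a real and a false trajectory are given as equal-length sequences of region labels (the labels and the function name `mutual_information` are hypothetical) and estimates I(X;Y) in bits from the position-wise pairs. Under this measure, a lower value means the false trajectory reveals less about the real one.

```python
import math
from collections import Counter


def mutual_information(real, fake):
    """Empirical mutual information I(X;Y) in bits.

    `real` and `fake` are equal-length sequences of region labels
    (an assumption made for this sketch); each position contributes
    one (real, fake) sample to the joint distribution.
    """
    if len(real) != len(fake) or not real:
        raise ValueError("trajectories must be non-empty and of equal length")

    n = len(real)
    joint = Counter(zip(real, fake))   # counts of (x, y) pairs
    count_x = Counter(real)            # marginal counts of x
    count_y = Counter(fake)            # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        ratio = c * n / (count_x[x] * count_y[y])
        mi += p_xy * math.log2(ratio)
    return mi


# A false trajectory identical to the real one leaks everything:
leaky = mutual_information(["A", "B", "A", "B"], ["A", "B", "A", "B"])

# A false trajectory statistically independent of the real one leaks nothing:
private = mutual_information(["A", "A", "B", "B"], ["C", "D", "C", "D"])
```

In this toy setting the identical pair yields one bit of mutual information and the independent pair yields zero, which matches the intuition that a good false trajectory should share little information with the real one.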
Keywords/Search Tags: sensitive information, privacy protection, information loss, group perturbation, mutual information