Research And Implementation Of Differential Privacy Protection System For Multidimensional Data Publishing

Posted on: 2018-08-17 | Degree: Master | Type: Thesis
Country: China | Candidate: X N Wang | Full Text: PDF
GTID: 2428330518996709 | Subject: Computer technology
Abstract/Summary:
Sensitive data sets such as medical records, consumer data, and trajectory data contain a great deal of individual private information. Sharing, exchanging, and analyzing such data can easily leak this information, with significant consequences for the people concerned. Current privacy protection methods for multidimensional data have many problems, chiefly poor data availability and weak privacy protection, and these urgently need to be solved. To address them, this thesis studies privacy protection for multidimensional data publishing based on differential privacy. First, the thesis proposes a differential privacy protection method for general multidimensional data. The k-means algorithm is used in the first partitioning step, with an improved distance calculation that reduces the approximation error, and information entropy is introduced to improve the kd-tree algorithm. Experiments show that, compared with other methods, the average query time and average query error of the scheme are reduced by 18% and 12.4% respectively. Second, the thesis studies differential privacy protection specifically for multidimensional sequential data, obtaining clear improvements in true positive, false positive, false drop, and other indicators. In addition, rigorous mathematical reasoning and proof establish that the improved model strictly satisfies the requirements of differential privacy and achieves the privacy protection level corresponding to a given privacy budget. Experiments show that the average query error of the improved model is reduced by 22.5% for the count query task, and that the true positive improves by 16.9% while the false positive and false drop are reduced by 66.9%. Finally, a system built on Spark, using big data technology, was implemented and has considerable practical value. The system performs differentially private release of multidimensional sequential data and does well on count query and pattern mining tasks, with a relative error between 0.01 and 0.02 and a true positive above 91.3%.
Keywords/Search Tags: Differential Privacy, Laplace Mechanism, K-means, KD-Tree, Noisy Prefix Tree, Spark Framework
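
The Laplace mechanism named in the keywords is the standard way to answer a count query under differential privacy. The sketch below is a minimal illustration under stated assumptions, not the system described in the thesis: the function names `laplace_noise` and `noisy_count`, the toy records, and the chosen epsilon are all hypothetical, and it assumes a count query with sensitivity 1 (adding or removing one record changes the count by at most 1).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer a count query with the Laplace mechanism.

    Assumption: the query has sensitivity 1, so the noise scale is
    1 / epsilon. A smaller epsilon (stricter privacy budget) means
    more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # assumed: one record changes the count by at most 1
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical toy data: (age, has_condition) pairs.
toy_records = [(25, 1), (40, 0), (61, 1), (33, 1)]
answer = noisy_count(toy_records, lambda r: r[1] == 1, epsilon=1.0,
                     rng=random.Random(42))
print(f"noisy count: {answer:.3f}")
```

Each call returns the true count (3 for the toy data) plus random noise, so a single answer can be off by a unit or more at epsilon = 1. Only the average over many independent answers approaches the true count, and every additional answer spends more of the privacy budget.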