Research On Differential Privacy Protection Technology For Data Publication

Posted on: 2019-12-22; Degree: Master; Type: Thesis
Country: China; Candidate: B Xu
GTID: 2428330545450696; Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of big data and data science, applications based on data sharing have spread to almost every social and commercial field. However, the privacy leakage that accompanies data sharing hinders the healthy development of data-sharing services. Data sharing is mainly carried out by large data owners (such as medical research institutes, Facebook, and Twitter) that regularly publish large data sets for third-party research institutions or individuals to analyze, but these data sets often contain users' private information. How to protect users' private information from malicious intrusion is therefore a problem that needs to be solved. This thesis studies privacy protection for data publication in genome-wide association studies (GWAS), and designs and implements a data acquisition and analysis platform based on differential privacy. The main work covers the following two aspects.

(1) A differential privacy model for GWAS data is proposed. The thesis proposes a distance-based differentially private scheme that publishes all meaningful SNPs while protecting the privacy of data providers. The algorithm first calculates the distance scores of all SNPs, then adds Laplace noise to the distance scores, and finally publishes the meaningful SNPs according to the statistical significance threshold. A rigorous theoretical analysis proves that the scheme satisfies differential privacy. Finally, simulation experiments are carried out on real data sets. The results show that, on the same data set, the proposed scheme achieves higher data utility than existing differential privacy schemes.

(2) Based on the above model, a data acquisition and analysis platform based on differential privacy is designed and implemented. The aim is to apply it to the release and analysis of GWAS data in order to support doctors' needs in diagnosing certain genetic diseases. However, personal genotype data are difficult to obtain on demand, so this thesis uses data from the Internet for simulation. The system has three main components: a data acquisition module, a differential privacy processing module, and a data analysis module. The data acquisition module crawls data from the network and preprocesses it, including handling missing values in some fields. The preprocessed data then undergoes differential privacy processing and is finally stored in the database. The data analysis module provides data analysts with a unified query interface so that they can run statistical queries and analyses on the stored data and view the results as charts.
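
The three steps in (1) (score, add Laplace noise, threshold) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not define the distance score or its sensitivity, so the function name `publish_significant_snps`, the example scores, and the `sensitivity` parameter are all hypothetical. The sketch assumes `sensitivity` bounds the L1 sensitivity of the whole score vector, in which case adding independent Laplace noise of scale `sensitivity / epsilon` to every score is epsilon-differentially private, and the thresholding step is post-processing that does not consume further privacy budget.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_significant_snps(scores, threshold, epsilon, sensitivity, seed=None):
    """Return the SNP ids whose noisy score reaches `threshold`.

    scores      -- dict mapping SNP id -> precomputed distance score
                   (how the score is computed is NOT given in the abstract)
    threshold   -- cutoff applied to the noisy scores (assumed form)
    epsilon     -- privacy budget, must be > 0
    sensitivity -- assumed L1 sensitivity of the full score vector, must be > 0
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    # Step 2: perturb every score with independent Laplace noise.
    noisy = {snp: s + laplace_noise(scale, rng) for snp, s in scores.items()}
    # Step 3: release only the SNPs whose noisy score passes the cutoff.
    return sorted(snp for snp, value in noisy.items() if value >= threshold)


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    example_scores = {"rs1": 5.0, "rs2": -3.0, "rs3": 0.5}
    print(publish_significant_snps(example_scores, threshold=0.0,
                                   epsilon=1.0, sensitivity=1.0, seed=0))
```

A smaller `epsilon` gives stronger privacy but larger noise, so SNPs with scores close to the threshold are more likely to be misclassified; this is the privacy and utility trade-off the experiments in the thesis evaluate.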
Keywords/Search Tags: Data Publication, Differential Privacy, GWAS