
Research On The Key Technologies Of Privacy Preserving Data Publishing

Posted on: 2019-02-20 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X N Gao | Full Text: PDF
GTID: 2348330545484491 | Subject: Information and Communication Engineering
Abstract/Summary:
The "Healthy China 2030" outline calls for innovation in the medical service supply model and for faster construction of smart healthcare platforms to meet the needs of data sharing, decision support, scientific research, and strategic planning. This is currently an important national strategic plan in China. The data release problem, and in particular privacy protection when medical data is released, therefore has significant research value. This thesis studies two threats, the linkage attack and the probabilistic attack, and proposes two algorithms in response: the PCA-GRA k-anonymity algorithm and the DPFSOD algorithm, which is based on differential privacy. Both aim to release data that meets the security requirements while preserving as much data utility as possible. The data publishing scenarios for both attacks are studied on two open-source medical datasets, the Heart Disease dataset and the Cardiotocography dataset, from the UCI Machine Learning Repository (University of California, Irvine). The main contents are:

(1) Against linkage attacks, existing global and local algorithms alike split the quasi-identifiers and the sensitive attributes into two independent parts and ignore the relationship between them when designing the privacy-preserving data publishing algorithm. The PCA-GRA k-anonymity algorithm builds on the traditional k-anonymity algorithm and uses the relationship between the quasi-identifiers and the sensitive attributes as the criterion that controls the degree of generalization. It meets the data security requirement and effectively improves the utility of the released data.

(2) To prevent an attacker from carrying out a probabilistic attack in the data publishing scenario, this thesis proposes the DPFSOD algorithm, which is based on the DPLloyd algorithm. While satisfying the requirements of the differential privacy model, the algorithm improves the utility of the released data as much as possible by using feature selection, outlier detection, and similarity measurement between attributes, balancing security against data utility.

In addition, the upper bound of the privacy budget in the DPFSOD algorithm is derived, and the range of noise that is acceptable from a security standpoint is given, which provides a theoretical reference for further applications of differential privacy.
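The two privacy models named in the abstract can be illustrated with a minimal sketch. This is not the PCA-GRA or DPFSOD algorithm, whose details the abstract does not give. It only shows the textbook definitions they build on: a k-anonymity check over quasi-identifiers, and a counting query released with Laplace noise under a privacy budget epsilon. The toy records, attribute names, and function names below are assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical, already-generalized toy records (not taken from the thesis datasets).
records = [
    {"age": "40-49", "sex": "M", "diagnosis": "present"},
    {"age": "40-49", "sex": "M", "diagnosis": "absent"},
    {"age": "50-59", "sex": "F", "diagnosis": "present"},
    {"age": "50-59", "sex": "F", "diagnosis": "present"},
]


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(rows, predicate, epsilon, rng):
    """Release a count (sensitivity 1) with Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    print(is_k_anonymous(records, ["age", "sex"], k=2))
    print(is_k_anonymous(records, ["age", "sex"], k=3))
    print(private_count(records, lambda r: r["diagnosis"] == "present", 1.0, rng))
```

A smaller epsilon means a larger noise scale and therefore stronger privacy but lower utility, which is the trade-off the abstract refers to when it bounds the privacy budget.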
Keywords/Search Tags:Privacy Preserving, K Anonymity, Differential Privacy, Data Utility