Research On Privacy Protection In Medical Data Publishing And Sharing

Posted on: 2021-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: H R Cao
GTID: 2404330611984020
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era and the rapid development of medical informatization, the sharing and publishing of medical data has attracted the attention of governments worldwide and has become a key focus of informatization construction. However, sharing and publishing medical data carries a great risk of privacy disclosure. Protecting patient privacy while still publishing and sharing medical data effectively is therefore an important research topic in the field of medical data applications. To meet the needs of both privacy protection and data availability in medical data publishing and sharing, this thesis carries out the following work.

Firstly, the risk of patient privacy disclosure during medical data publishing and sharing is analyzed in depth. A blockchain-based framework for privacy-preserving data publishing and sharing is constructed, and a blockchain-based process for privacy-protected medical data publishing and sharing is designed, both among medical institutions and among their departments. The characteristics and advantages of the proposed framework are analyzed, and the sharing requirements of tabular medical data and statistical medical data are discussed.

Secondly, for the data sharing part of the proposed framework, a privacy protection method suitable for sharing tabular data is put forward. To address the problem that potential attackers may obtain patients' private information through similarity attacks and background knowledge attacks, the study defines high-sensitivity attribute groups and limits the frequency of high-sensitivity attribute values in equivalence classes. Combining these with a semantic similarity tree, it proposes the (?,?,k)-anonymity model and designs and implements the sharing algorithm. First, the records of the table are partitioned using the semantic similarity tree, so that sensitive attribute values of the same category are grouped together. Next, constraints are applied according to the configured sensitivity ? of the sensitive attribute, so that the data records in each equivalence class satisfy k-anonymity, and the frequency of highly sensitive attribute values in each equivalence class is bounded by the threshold parameter ?. This makes the distribution of sensitive attribute values within each equivalence class uniform and prevents similarity attacks and background knowledge attacks. Finally, the quasi-identifiers in the tabular data are generalized by the k-anonymity algorithm before the data are shared. An experimental comparison between the (?,?,k)-anonymity model and the (?,k)-anonymity method on the Adult dataset verifies that the proposed method has lower information loss and reduces the risk of privacy leakage.

Thirdly, for the data publishing part of the proposed framework, and taking into account the inherent relationships between attributes, a random-forest-based differential privacy method suitable for statistical data publishing (RF-based DP) is proposed. First, each sensitive attribute column in the dataset is identified separately by the random forest algorithm, and the privacy budget is allocated based on the recognition accuracy and the sensitivity of the attributes. A differential privacy noise mechanism is then applied to the statistical data to ensure privacy. An experimental comparison with the classical differential privacy algorithm based on the Laplace mechanism shows that the proposed algorithm reduces the error in the published data.
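
As an illustration only, the two kinds of protection described in the abstract can be sketched in Python. The abstract gives neither the algorithms nor the exact parameter definitions, so this is not the thesis's implementation, and every name below is hypothetical. `check_equivalence_classes` tests whether already-generalized records satisfy k-anonymity plus a frequency cap (`max_freq`) on high-sensitivity values. `allocate_budget` splits a total privacy budget in proportion to caller-supplied weights, which stand in for whatever accuracy-and-sensitivity rule the thesis actually uses. `noisy_count` applies the standard Laplace mechanism to a count query.

```python
import random
from collections import Counter, defaultdict


def check_equivalence_classes(records, quasi_ids, sensitive,
                              high_sensitive, k, max_freq):
    """Return True if every equivalence class (records sharing the same
    quasi-identifier values) has at least k records and no high-sensitivity
    value makes up more than max_freq of its class."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_ids)
        classes[key].append(rec[sensitive])
    for values in classes.values():
        if len(values) < k:
            return False
        counts = Counter(values)
        for value in high_sensitive:
            if counts[value] / len(values) > max_freq:
                return False
    return True


def allocate_budget(total_epsilon, weights):
    """Split total_epsilon across attributes in proportion to weights.
    The weights are hypothetical inputs, not the thesis's allocation rule."""
    total = sum(weights.values())
    return {name: total_epsilon * w / total for name, w in weights.items()}


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: add noise with scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Toy, already-generalized records (made-up values for illustration).
    records = [
        {"age": "30-39", "zip": "100**", "disease": "flu"},
        {"age": "30-39", "zip": "100**", "disease": "HIV"},
        {"age": "30-39", "zip": "100**", "disease": "cold"},
        {"age": "40-49", "zip": "200**", "disease": "flu"},
        {"age": "40-49", "zip": "200**", "disease": "HIV"},
        {"age": "40-49", "zip": "200**", "disease": "HIV"},
    ]
    ok = check_equivalence_classes(
        records, ["age", "zip"], "disease", {"HIV"}, k=3, max_freq=0.5)
    print(ok)  # False: "HIV" is 2 of 3 records in the second class

    budgets = allocate_budget(1.0, {"disease": 3, "age": 1})
    rng = random.Random(0)
    print(noisy_count(120, budgets["disease"], rng))
```

In this sketch a smaller epsilon for an attribute means a larger noise scale and therefore stronger protection at the cost of accuracy, which is the trade-off the budget allocation step has to balance.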
Keywords/Search Tags:medical data, publishing and sharing, blockchain, random forest, differential privacy, anonymity