A Research Of PPDM In The Medical Information System

Posted on: 2017-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Bai
Full Text: PDF
GTID: 2308330485486457
Subject: Computer Science and Technology
Abstract/Summary:
With the increasing use of data mining technologies on medical data, privacy leaks from electronic medical records are causing serious trouble for patients. Privacy-preserving data mining (PPDM) has therefore become a major concern. Association rule hiding (ARH) is a new subfield of PPDM. Unlike traditional techniques, which hide the sensitive information itself, ARH techniques hide the sensitive patterns generated during data mining, and they are attracting growing attention from researchers. This thesis introduces the main ARH algorithms and improves the hiding accuracy and computational efficiency of existing ones.

Heuristic hiding approaches are one class of ARH methods. They compute quickly but cause large side effects on the original dataset. This thesis improves current heuristic hiding approaches by taking the application scenarios of medical information systems into account, and proposes a heuristic bi-clustering (HBC) algorithm. HBC uses bi-clustering to hide sensitive rules with similar patterns at the same time, so that hiding accuracy is maximized and side effects are minimized. HBC also uses formulas to calculate the minimum number of iterations needed to hide an itemset, which removes unnecessary operations and improves computational efficiency. Experiments on real datasets show that HBC performs better in hiding accuracy and in minimizing side effects.

Exact hiding approaches are another class of ARH methods. They offer high hiding accuracy and low side effects, but at high computational complexity. Exact hiding approaches usually convert the hiding problem into a constraint satisfaction problem (CSP) and then solve it, and most of the running time is spent solving the CSP. Based on an analysis of the application scenarios of medical information systems, this thesis proposes a binary implicit enumeration (BIE) algorithm. BIE first decomposes the coefficient matrix of the CSP to reduce the problem size, then sorts the variables and applies binary implicit enumeration, which makes it likely to find the optimal solution in a short time. Experiments on real datasets show that BIE has a good chance of finding the optimal solution faster than the unimproved algorithm.

Finally, this thesis proposes a new architecture for applying ARH in medical information systems, built around three data roles: data provider, data collector, and data user. Experiments with the two algorithms on real medical datasets are also performed.
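The idea of computing in advance how many modifications are needed to hide an itemset can be illustrated with a minimal Python sketch. This is not the HBC algorithm from the thesis (it does no bi-clustering, and the thesis's actual formulas are not given in the abstract). It assumes an itemset counts as frequent when its support count is at least `min_count`, and that hiding is done by deleting one chosen item from supporting transactions. The names `support_count`, `min_modifications`, `hide_itemset`, `min_count`, and `victim` are hypothetical.

```python
def support_count(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def min_modifications(count, min_count):
    """Fewest supporting transactions to alter so that the count drops below min_count.

    Assumption: an itemset is frequent when count >= min_count.
    """
    return max(0, count - min_count + 1)


def hide_itemset(transactions, itemset, min_count, victim):
    """Delete `victim` from just enough supporting transactions to hide `itemset`.

    Returns a new list of sets; the input is left unchanged.
    """
    items = set(itemset)
    needed = min_modifications(support_count(transactions, itemset), min_count)
    sanitized = []
    for t in transactions:
        t = set(t)
        if needed > 0 and items <= t:
            t = t - {victim}  # one deletion lowers the itemset's count by one
            needed -= 1
        sanitized.append(t)
    return sanitized


# Toy records: each transaction is a set of item labels.
records = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"b", "c"},
    {"a", "c"},
]
sanitized = hide_itemset(records, {"a", "b"}, min_count=2, victim="a")
```

Computing the required number of deletions once, instead of re-mining after every single change, is what avoids the unnecessary operations; which transactions and which item to alter is the part a real heuristic would choose carefully to limit side effects.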
Keywords/Search Tags:ARH, medical information system, PPDM