Privacy Preserving Association Rule Mining

Posted on: 2007-04-18 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Chen | Full Text: PDF
GTID: 2178360185495749 | Subject: Computer application technology
Abstract/Summary:
Data mining has emerged as a means of extracting potential patterns or knowledge from large quantities of data and is now widely used in fields such as scientific research, medical research, and business. However, because it efficiently discovers valuable, non-obvious patterns and rules in massive data, data mining is particularly vulnerable to misuse or abuse. A central question in privacy-preserving data mining is how to obtain globally valid mining results without revealing any unnecessary information about the participating sites. Privacy and security have therefore become a focus of data mining research. The paper first introduces and analyzes typical privacy-preserving association rule algorithms along four dimensions: data distribution, data modification, hiding objects, and privacy-preserving technology. It then presents two algorithms for privacy-preserving association rule mining.

(1) The first algorithm treats the original dataset as the object of privacy protection. After summarizing the advantages and disadvantages of Rizvi's MASK, the paper introduces DMASK, a Boolean rule mining algorithm based on multi-factor random perturbation. Compared with MASK, DMASK lets the perturbation factors be set according to the user's privacy requirements, which reduces the possibility of a privacy breach; with suitable factors, both accuracy and privacy preservation are achieved. In addition, set theory is used to optimize the algorithm, control the variation in dataset density, and eliminate the redundant counting overhead caused by the perturbation, which significantly improves execution efficiency. DMASK was run on the IBM synthetic dataset and the BMS-WebView-1 dataset. Its execution time is less than 4 times that of Apriori, while it guarantees more than 70% privacy preservation and more than 90% accuracy.

(2) The second algorithm treats the data owner's sensitive patterns as the object of privacy protection and addresses the vulnerability of the SWA algorithm, presented by Oliveira, to the Forward-Inference Attack. First, a perturbation matrix is established based on the relationship between sensitive and non-sensitive patterns. Then, by setting its entries to appropriate values and multiplying the original transaction dataset by the perturbation matrix, a perturbed dataset that can prevent the Forward-Inference Attack is created. Moreover, different perturbation factors are used to avoid the recovery of sensitive patterns and to reduce the probability of hiding non-sensitive patterns. Finally, by experiment, we...
Keywords/Search Tags: data mining, association rule, privacy preserving, sensitive pattern, random perturbation, support, confidence, perturbation matrix
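
As a rough illustration of the random-perturbation idea behind MASK-style schemes mentioned in the abstract, the sketch below distorts a Boolean transaction table and then estimates the true support of a single item from the distorted data. It is not the thesis's DMASK algorithm, whose details the abstract does not give. The two factors `p` (probability of keeping a 1) and `q` (probability of keeping a 0), and the function names `perturb` and `estimate_support`, are assumptions made for this example only.

```python
import random


def perturb(transactions, p, q, rng):
    """Return a distorted copy of a 0/1 transaction table.

    Each 1 is kept with probability p and each 0 with probability q;
    otherwise the bit is flipped. p and q are illustrative factors,
    not parameters taken from the thesis.
    """
    distorted = []
    for row in transactions:
        new_row = []
        for bit in row:
            keep = p if bit == 1 else q
            new_row.append(bit if rng.random() < keep else 1 - bit)
        distorted.append(new_row)
    return distorted


def estimate_support(distorted, item, p, q):
    """Estimate the true support of one item from distorted data.

    If s is the true fraction of 1s, the expected observed fraction is
        o = s * p + (1 - s) * (1 - q)
    so s = (o - (1 - q)) / (p + q - 1), which requires p + q != 1.
    """
    observed = sum(row[item] for row in distorted) / len(distorted)
    return (observed - (1 - q)) / (p + q - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy single-item dataset in which 30% of transactions contain the item.
    data = [[1] if i < 6000 else [0] for i in range(20000)]
    noisy = perturb(data, 0.9, 0.8, rng)
    print(estimate_support(noisy, 0, 0.9, 0.8))
```

The miner only ever sees `noisy`, yet the support estimate stays close to the true value because the distortion probabilities are known. Moving `p` and `q` closer to 0.5 hides individual entries better but makes the estimate noisier, which is the accuracy versus privacy trade-off the abstract refers to.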