Privacy Preserving Association Rule Mining

Posted on:2007-04-18

Degree:Master

Type:Thesis

Country:China

Candidate:Y Chen

Full Text:PDF

GTID:2178360185495749

Subject:Computer application technology

Abstract/Summary:

Data mining has emerged as a means of extracting potential patterns or knowledge from large quantities of data and is becoming widely used in many fields, such as scientific research, medical research, and business. However, because data mining efficiently discovers valuable, non-obvious patterns and rules from massive data, it is particularly vulnerable to misuse or abuse. Privacy preservation is therefore considered vital: the question is how to obtain globally valid data mining results without revealing any unnecessary information about the participating sites. As a result, privacy and security have become the focus of much data mining research.

The paper first introduces and analyzes typical privacy-preserving association rule algorithms along four dimensions: data distribution, data modification, hiding objects, and privacy-preserving technology. It then presents two algorithms for privacy-preserving association rule mining.

(1) For the case where the privacy to be preserved is that of the original dataset, and after summarizing the advantages and disadvantages of Rizvi's MASK, the paper introduces DMASK, a Boolean rule mining algorithm based on multi-factor random perturbation. Unlike MASK, DMASK can set its perturbation factors according to the user's privacy requirements, which reduces the possibility of a privacy breach. With properly chosen factors, both accuracy and privacy preservation are achieved. In addition, set theory is used to optimize the algorithm, control the variation in dataset density, and eliminate the redundant counting overhead caused by the perturbation, which significantly improves execution efficiency. DMASK was run on the IBM synthetic dataset and the BMS-WebView-1 dataset. Its execution time is less than four times that of Apriori, while it guarantees more than 70% privacy preservation and more than 90% accuracy.

(2) For the case where the privacy to be preserved is the sensitive patterns of the owner of the original dataset, and in response to the vulnerability of Oliveira's SWA algorithm to the Forward-Inference Attack, the paper introduces a new algorithm. First, a perturbation matrix is established based on the relationship between sensitive and non-sensitive patterns. Then, by setting its entries to appropriate values and multiplying the original transaction dataset by the perturbation matrix, a perturbed dataset that can prevent the Forward-Inference Attack is created. Moreover, different perturbation factors are used to prevent the recovery of sensitive patterns and to reduce the probability of hiding non-sensitive patterns. Finally, by experiment, we...
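
The random perturbation idea behind algorithm (1) can be illustrated with a small sketch. The abstract does not give DMASK's actual perturbation scheme, so the code below is only an assumption: it keeps each 1 with probability p_keep_one and each 0 with probability p_keep_zero, then inverts the expected distortion to estimate the original support of a single item. Setting both factors to the same value gives a single-factor, MASK-style perturbation. All function and parameter names are hypothetical and the data is synthetic.

```python
import random


def perturb(transactions, p_keep_one, p_keep_zero, rng):
    """Flip bits at random: a 1 is kept with probability p_keep_one,
    a 0 is kept with probability p_keep_zero (assumed two-factor scheme)."""
    perturbed = []
    for row in transactions:
        new_row = []
        for bit in row:
            p_keep = p_keep_one if bit == 1 else p_keep_zero
            new_row.append(bit if rng.random() < p_keep else 1 - bit)
        perturbed.append(new_row)
    return perturbed


def estimate_support(perturbed, item, p_keep_one, p_keep_zero):
    """Estimate the original support s of one item from perturbed data.

    Expected observed support: s * p_keep_one + (1 - s) * (1 - p_keep_zero).
    Solving that equation for s gives the estimate below.
    """
    observed = sum(row[item] for row in perturbed) / len(perturbed)
    denom = p_keep_one + p_keep_zero - 1
    if denom == 0:
        raise ValueError("factors summing to 1 make the support unrecoverable")
    return (observed - (1 - p_keep_zero)) / denom


# Synthetic Boolean transactions with two items (illustrative data only).
rng = random.Random(0)
data = [[int(rng.random() < 0.3), int(rng.random() < 0.6)] for _ in range(20000)]

noisy = perturb(data, p_keep_one=0.9, p_keep_zero=0.8, rng=rng)
true_support = sum(row[0] for row in data) / len(data)
estimated = estimate_support(noisy, 0, p_keep_one=0.9, p_keep_zero=0.8)
print(f"true support: {true_support:.3f}, estimated: {estimated:.3f}")
```

In this sketch, factors closer to 0.5 hide individual entries better but make the denominator smaller, so the support estimate becomes noisier. This is the accuracy and privacy trade-off that the choice of perturbation factors has to balance.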

Keywords/Search Tags:

data mining, association rule, privacy preserving, sensitive pattern, random perturbation, support, confidence, perturbation matrix

Related items

1	Research Of Privacy Preserving Data Mining Based On Perturbation
2	Research On Multi-parameters Perturbation Privacy Preserving Association Rules Mining Algorithm
3	A Study Of Privacy-Preserving Data Mining Based On Multiplicative Perturbation
4	Research On Algorithm For Privacy Preserving Based On Association Rule Mining
5	Research On Privacy-Preserve Data Mining For Protecting Association Rule And Original Datasets
6	Research And Design On Privacy-Preserving Data Mining Approaches And Algorithms
7	Privacy Preserving In Association Rule Mining
8	Research On Privacy Preserving Classification Data Mining
9	Research On Data Mining Privacy Preserving Method Based On Random Perturbation
10	Research And Application On Association Rule Mining