Research On Multi-parameters Perturbation Privacy Preserving Association Rules Mining Algorithm

Posted on: 2011-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: W Li
GTID: 2178330332960023
Subject: Computer software and theory
Abstract/Summary:
As information technology, network technology, data storage technology and high-performance processor technology advance, the volume of data grows quickly, and together these trends have driven the emergence and rapid development of data mining. While data mining helps governments, enterprises and individuals obtain knowledge and rules, and brings them benefits, it inevitably touches on people's privacy. Meanwhile, as society progresses, people pay more and more attention to privacy, which brings new difficulties to data mining. To bridge the gap between data privacy and data mining, privacy preserving data mining emerged and has been developing quickly.

First, this thesis describes the basic theory of data mining and association rules mining, and the main techniques of privacy preserving association rules mining. Then the MASK algorithm, one of the classic privacy preserving association rules mining algorithms, is introduced briefly, and multi-parameters perturbation algorithms are studied in detail.

Compared with the MASK algorithm, multi-parameters perturbation algorithms improve both the degree of privacy preservation and the accuracy of data mining. However, the time efficiency of restoring the frequent itemsets in multi-parameters perturbation algorithms is still low, and the problem becomes more serious as the itemsets grow larger. To address this issue, the multi-parameters randomized perturbation algorithm is analyzed in detail, and two methods are proposed to improve its time efficiency, based on the characteristics of the model used to restore frequent itemsets. The first method improves time efficiency by merging the inversion of the transformation matrix from two steps into one. The second method improves time efficiency further by computing only the elements of the first row of the inverse of the transformation matrix, whereas the first method needs to compute all elements of the inverse.

Finally, both theoretical analysis and experimental results indicate that the first improved algorithm is more efficient than the original algorithm, and that the second improved algorithm is more efficient than the first. In addition, the second improved algorithm is more space-efficient than the original algorithm. Because the models for restoring frequent itemsets are the same across a variety of multi-parameters perturbation algorithms, the improvements can also be applied to other multi-parameters perturbation algorithms.
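The idea of needing only the first row of the inverse can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not give the transformation matrix, so the sketch assumes each item is perturbed independently, with probability p of keeping a 1 and probability q of keeping a 0, which makes the transformation matrix a Kronecker product of 2x2 per-item matrices. Under that assumption the first row of the inverse is the Kronecker product of the first rows of the 2x2 inverses, and it alone recovers the count of the all-ones combination, that is, the support count of the itemset. All function names and the state ordering (all-ones first) are hypothetical.

```python
from itertools import product


def first_row_of_inverse(params):
    """First row of the inverse of the (assumed) Kronecker-product matrix.

    params: one (p, q) pair per item, where p = P(1 stays 1) and
    q = P(0 stays 0). States are ordered with the all-ones state first.
    """
    row = [1.0]
    for p, q in params:
        det = p + q - 1.0  # determinant of [[p, 1-q], [1-p, q]]
        if abs(det) < 1e-12:
            raise ValueError("p + q must differ from 1 for the matrix to be invertible")
        # First row of the 2x2 inverse: [q, -(1 - q)] / det
        item_row = (q / det, (q - 1.0) / det)
        # Kronecker product of the row built so far with this item's row
        row = [r * x for r in row for x in item_row]
    return row


def estimate_support_count(observed_counts, params):
    """Estimate the true count of the all-ones state from perturbed counts."""
    row = first_row_of_inverse(params)
    return sum(r * c for r, c in zip(row, observed_counts))


def expected_observed_counts(true_counts, params):
    """Forward model (for checking only): expected counts after perturbation."""
    states = list(product((1, 0), repeat=len(params)))  # all-ones state first
    observed = []
    for obs in states:
        total = 0.0
        for true, count in zip(states, true_counts):
            prob = 1.0
            for o, t, (p, q) in zip(obs, true, params):
                keep = p if t == 1 else q
                prob *= keep if o == t else 1.0 - keep
            total += prob * count
        observed.append(total)
    return observed


if __name__ == "__main__":
    params = [(0.9, 0.8), (0.7, 0.95)]       # illustrative (p, q) per item
    true_counts = [40.0, 10.0, 20.0, 30.0]   # counts for states 11, 10, 01, 00
    observed = expected_observed_counts(true_counts, params)
    print(estimate_support_count(observed, params))
```

In this sketch the row is built with one multiplication per entry and the full 2^k x 2^k inverse is never stored, which is in the spirit of the time and space savings the abstract attributes to the second method.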
Keywords/Search Tags: Data mining, Association rule, Privacy preservation, Multi-parameters perturbation