
Research Of Mining Algorithm Of K-Anonymous Data Sets Based On Generalization Tree

Posted on: 2014-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Liu
GTID: 2248330395980920
Subject: Computer software and theory
Abstract/Summary:
At present, most data are released in k-anonymous form, and the k-anonymity privacy model is increasingly used in various fields. K-anonymous data is a special kind of uncertain data: each tuple is equally likely to have been generalized from any of its possible-world instances, and the quasi-identifiers of the k tuples in a group are identical. This makes it difficult for an external attacker to attack the data through a linking attack, so the k-anonymity model can protect users' privacy. Because of this same property, however, the availability of such data is greatly reduced; even an optimal k-anonymization algorithm cannot produce data of fully satisfactory accuracy. Therefore, we should not only work on improving k-anonymization algorithms but also find methods for mining such data, in order to improve the availability of k-anonymous data.
Data provenance describes the process by which data are generated and changed. It is applied in data mining, data verification, data recovery, citation, and other areas. K-anonymous data is derived from determinate values according to the relevant generalization tree. Consequently, the provenance of k-anonymous data comprises the generalization tree and the derivation rules: it specifies how the static data source (i.e., the original table) evolves through a specific generalization tree into a k-anonymous table. Based on an analysis of how k-anonymous data is generalized, each k-anonymous table is treated as derived from the original data table under a given generalization tree, and a formal definition of generalization is presented. On this basis, a construction algorithm for the generalization tree is proposed from the perspective of the data recipient, making data mining and analysis more convenient and effective for the recipient.
Association rule mining is a basic and important data mining method that aims to find interesting relationships among itemsets in large amounts of data. Many researchers have produced results on association rule mining over uncertain data, and some of these algorithms are excellent. However, they assume that the possible-world instances restored from a tuple have unequal probabilities, so they do not apply to k-anonymous data, a special kind of uncertain data. To solve this problem, the provenance of k-anonymous data is applied to mining, and a mining algorithm for k-anonymous data is presented: an association rule mining algorithm based on the generalization tree. It includes an expected-support algorithm and a confidence algorithm for k-itemsets. The former is used to find frequent itemsets, while the latter is used to produce strong association rules.
Compared with traditional association rule mining algorithms for certain or uncertain data, the newly presented algorithm greatly improves time complexity and mining efficiency when dealing with k-anonymous data. The experimental results show that the algorithm proposed in this paper is an effective method for managing k-anonymous data sets.
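The abstract does not give the expected-support and confidence computations in detail. The following is only a minimal sketch of how they might look under the equal-probability assumption stated above, not the thesis's actual algorithm. The tree `AGE_TREE`, the table `TABLE`, the function names, and the assumption that attributes are independent are all hypothetical and chosen for illustration. A published generalized value is assumed to stand for each leaf beneath it in the generalization tree with probability 1/(number of leaves).

```python
from fractions import Fraction

# Hypothetical generalization tree for one quasi-identifier (child -> parent).
AGE_TREE = {
    "20-29": "20-39", "30-39": "20-39",
    "40-49": "40-59", "50-59": "40-59",
    "20-39": "*", "40-59": "*",
}

def leaves_under(tree, node):
    """Return the set of leaf values that `node` may have been generalized from."""
    children = [child for child, parent in tree.items() if parent == node]
    if not children:
        return {node}  # a leaf (or an ungeneralized value) stands for itself
    result = set()
    for child in children:
        result |= leaves_under(tree, child)
    return result

def item_probability(trees, attribute, published, target):
    """P(original value == target | published value), all leaves equally likely."""
    leaves = leaves_under(trees.get(attribute, {}), published)
    return Fraction(1, len(leaves)) if target in leaves else Fraction(0)

def expected_support(table, trees, itemset):
    """Sum over rows of the probability that the row contains every item.

    Assumes (for this sketch only) that attributes are independent.
    """
    total = Fraction(0)
    for row in table:
        p = Fraction(1)
        for attribute, target in itemset.items():
            p *= item_probability(trees, attribute, row[attribute], target)
        total += p
    return total

def expected_confidence(table, trees, antecedent, consequent):
    """Expected support of the whole rule divided by that of the antecedent."""
    denominator = expected_support(table, trees, antecedent)
    if denominator == 0:
        return Fraction(0)
    return expected_support(table, trees, {**antecedent, **consequent}) / denominator

# Hypothetical 2-anonymous table: "age" is generalized, "disease" is not.
TABLE = [
    {"age": "20-39", "disease": "flu"},
    {"age": "20-39", "disease": "flu"},
    {"age": "40-59", "disease": "asthma"},
    {"age": "40-59", "disease": "flu"},
]
TREES = {"age": AGE_TREE}

print(expected_support(TABLE, TREES, {"age": "20-29"}))                        # 1
print(expected_confidence(TABLE, TREES, {"age": "40-49"}, {"disease": "flu"}))  # 1/2
```

In this sketch, an itemset would be counted as frequent when its expected support reaches a chosen minimum, and a rule as strong when its expected confidence reaches a chosen minimum; both thresholds would be user-supplied parameters.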
Keywords/Search Tags: k-anonymous data, uncertain data, data provenance, generalization tree, association rule