
Research Of Mining Algorithm On K-Anonymous Data Sets

Posted on: 2015-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: A D Chen
GTID: 2268330425482051
Subject: Computer software and theory
Abstract/Summary:
With the arrival of the big-data era, large amounts of data are being released over the Internet. Information ranging from enterprise business decisions to personal consumption habits is stored as data in various forms. Besides carrying considerable political and economic interests, big data also contain a wide variety of private information. As people's awareness of privacy protection has grown and k-anonymity has matured into an increasingly complete privacy protection model, it has gradually become a trusted method that individuals and institutions apply before publishing data. K-anonymous data are a special kind of uncertain data: the generalized value of each quasi-identifier attribute of a record can take each of the specific values it covers with equal probability, and each tuple shares the same generalized quasi-identifier values with at least k-1 other tuples. K-anonymous data achieve k-anonymity through a generalization tree and therefore follow a uniform distribution. This characteristic hinders accurate querying, and existing data mining algorithms cannot be applied effectively to k-anonymous data. How to mine k-anonymous data and improve its availability is therefore an urgent problem.

Association rule mining is the most basic data mining method; it discovers relationships between items or attributes in large amounts of data. Many researchers have studied uncertain data and proposed a number of effective uncertain data mining algorithms. However, almost all of these algorithms assume that each uncertain tuple is reduced to a precise tuple with a different probability, whereas k-anonymous data are uniformly distributed. Applying existing mining algorithms to k-anonymous data therefore yields low efficiency or poor mining results.
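To make the uniform-distribution property concrete, here is a minimal Python sketch; it is an illustration, not the thesis's method. It assumes a hypothetical toy table in which each record's quasi-identifiers are an age generalized to a closed integer interval and a masked ZIP code, and the names `is_k_anonymous`, `prob_of_value` and `expected_support` are illustrative only. It checks that every combination of generalized quasi-identifier values occurs at least k times, and it computes the expected support of one specific age by treating every value covered by a generalized interval as equally likely.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical generalized record: (age interval (lo, hi) inclusive, masked ZIP code).
Record = tuple[tuple[int, int], str]


def is_k_anonymous(records: list[Record], k: int) -> bool:
    """True if every combination of generalized quasi-identifier values occurs >= k times."""
    return all(count >= k for count in Counter(records).values())


def prob_of_value(interval: tuple[int, int], value: int) -> Fraction:
    """Uniform assumption: each integer in [lo, hi] is equally likely to be the true value."""
    lo, hi = interval
    return Fraction(1, hi - lo + 1) if lo <= value <= hi else Fraction(0)


def expected_support(records: list[Record], value: int) -> Fraction:
    """Expected fraction of records whose hidden true age equals `value`."""
    total = sum(prob_of_value(age, value) for age, _zip in records)
    return total / len(records)


records = [((20, 29), "476**")] * 3 + [((30, 39), "479**")] * 3
print(is_k_anonymous(records, 3))     # True
print(is_k_anonymous(records, 4))     # False
print(expected_support(records, 25))  # 1/20
```

Exact fractions are used instead of floats so that the uniform probabilities (here 1/10 per age inside a ten-year interval) stay exact when they are summed.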
To solve this problem, we propose a new association rule mining algorithm for k-anonymous datasets that combines the advantages of clustering and tree structures. It consists of an extended hierarchical clustering algorithm, a k-frequent tree construction algorithm, and an association rule generation algorithm, which are used respectively for pre-processing k-anonymous data, mining frequent itemsets, and generating strong association rules.

Data querying is another effective way to improve data availability. Because of the special characteristics of k-anonymous data, the left-hand side (antecedent) of a mined association rule is still a generalized value. Existing query algorithms cannot handle such variable-granularity queries: they can return only the original generalized values and their confidences, not the corresponding association rules at other granularities. Exploiting the uniform distribution of k-anonymous datasets and the strength of the R-tree in spatial queries, we apply a granularity transformation method to querying and propose a new variable-granularity query algorithm for association rules based on four key granularity conversion points. The query process is transparent to users and can effectively meet the differing requirements of different users.

In our experiments, we compare the new algorithms with several traditional data mining algorithms. The results show considerable improvements in time complexity and mining effectiveness when processing and mining k-anonymous datasets, as well as improved stability and efficiency of the mining process. The experimental results also show that the proposed algorithms can effectively mine k-anonymous datasets and achieve the goal of making the query process transparent to users.
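The variable-granularity query idea can be illustrated with a minimal Python sketch under stated assumptions; it does not reproduce the thesis's four conversion points or its R-tree index. It assumes a hypothetical rule whose antecedent is an age interval produced by generalization, that true values are uniformly distributed within that interval, and that the consequent is independent of where in the interval the true value lies. Under those assumptions, when a user queries a narrower range, the rule's support scales by the fraction of the interval that the range covers, while the confidence is unchanged because the antecedent support and the rule support scale by the same factor. `Rule` and `refine` are hypothetical names, and a linear scan stands in for the R-tree lookup.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule 'age in [lo, hi] -> consequent' mined from generalized data."""
    lo: int
    hi: int
    consequent: str
    support: Fraction     # fraction of records satisfying antecedent and consequent
    confidence: Fraction  # support(rule) / support(antecedent)


def refine(rules: list[Rule], a: int, b: int) -> list[Rule]:
    """Re-express rules at the finer granularity of the query range [a, b]."""
    out = []
    for r in rules:  # linear scan standing in for an R-tree range query
        lo, hi = max(r.lo, a), min(r.hi, b)
        if lo > hi:
            continue  # the rule's interval does not overlap the query range
        # Uniform assumption: support scales with the covered share of the interval.
        frac = Fraction(hi - lo + 1, r.hi - r.lo + 1)
        # Antecedent and rule support scale by the same factor, so confidence is kept.
        out.append(Rule(lo, hi, r.consequent, r.support * frac, r.confidence))
    return out


rules = [Rule(20, 29, "buys=yes", Fraction(3, 10), Fraction(3, 4))]
for r in refine(rules, 23, 24):
    print(r.lo, r.hi, r.consequent, r.support, r.confidence)
# prints: 23 24 buys=yes 3/50 3/4
```

For example, querying ages [23, 24] against a rule on [20, 29] with support 3/10 keeps 2 of the 10 ages, so the refined support is 3/10 × 1/5 = 3/50 while the confidence stays 3/4.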
Keywords/Search Tags: k-anonymous data, uncertain data, association rule, granularity transform query, data mining