Research On Privacy Preserving Clustering Mining Algorithm Based On Sampling Technique

Posted on:2008-12-25

Degree:Master

Type:Thesis

Country:China

Candidate:F L Liu

Full Text:PDF

GTID:2178360245478341

Subject:Management Science and Engineering

Abstract/Summary:

As data analysis and processing techniques have advanced, releasing or sharing data in order to mine useful decision information and knowledge inevitably exposes the privacy of individuals and companies. This problem gave rise to the research field of privacy preserving data mining, which has become a focus of researchers in China and abroad over the past three years. Clustering is one of the important data mining methods for analyzing management problems such as market segmentation, customer classification, and manufacturing system module design. Obtaining these results involves a large amount of detailed, sensitive information, and the latent models and patterns in a database may themselves threaten privacy and information security. Therefore, with the rapid growth of customers' personalized demands, privacy preserving clustering has become a key problem in privacy preserving data mining that urgently needs to be solved.
At present, research on privacy preserving clustering algorithms has only just begun, and the privacy preserving techniques adopted are very simple. Furthermore, the efficiency and the effectiveness of existing privacy preserving clustering algorithms conflict with each other. Against this background, this thesis presents a sampling-based privacy preserving clustering algorithm that, while ensuring data privacy and the accuracy of the clustering results, can also process databases with a large quantity of data.
The contributions of this thesis are as follows. First, drawing on the theory of density-based and model-based clustering algorithms, which can construct a clustering distribution function, three distribution functions are constructed: the uniform sampling model, the Gaussian sampling model, and the Gaussian mixture sampling model. Second, the thesis proves the equivalence between the additive fuzzy system and the Gaussian mixture model, and shows that the optimal parameters of the clustering distribution function can be estimated from fuzzy c-means clustering results. Third, by applying random sampling, the algorithm produces new data that retain the clustering characteristics of the original data while protecting privacy, and the algorithm's procedure is described in detail. Finally, simulation experiments test the validity of the new algorithm and identify the advantages and suitable application settings of each algorithm.
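The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea it describes: run fuzzy c-means, turn the memberships into Gaussian mixture parameters, and release randomly sampled synthetic points instead of the original records. The function names (`fuzzy_c_means`, `mixture_from_memberships`, `sample_synthetic`), the fuzzifier `m=2.0`, and the membership-weighted estimates of the mixture weights and covariances are all assumptions made for illustration, not the thesis's actual estimators.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means; returns cluster centers and the membership matrix U."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise the centers from c distinct data points.
    centers = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to every center (epsilon avoids division by zero).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)  # each row sums to 1
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U


def mixture_from_memberships(X, centers, U, m=2.0):
    """Hypothetical estimator: membership-weighted mixture weights and covariances."""
    W = U ** m
    weights = W.sum(axis=0) / W.sum()
    covs = []
    for k in range(centers.shape[0]):
        diff = X - centers[k]
        cov = (W[:, k, None] * diff).T @ diff / W[:, k].sum()
        covs.append(cov + 1e-6 * np.eye(X.shape[1]))  # small ridge for stability
    return weights, np.array(covs)


def sample_synthetic(weights, centers, covs, n, seed=1):
    """Draw n synthetic points from the fitted Gaussian mixture."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(weights), size=n, p=weights)
    out = np.empty((n, centers.shape[1]))
    for k in range(len(weights)):
        idx = labels == k
        out[idx] = rng.multivariate_normal(centers[k], covs[k], size=int(idx.sum()))
    return out


# Toy data: two well-separated 2-D clusters standing in for sensitive records.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (150, 2)), rng.normal(6.0, 1.0, (150, 2))])

centers, U = fuzzy_c_means(X, c=2)
weights, covs = mixture_from_memberships(X, centers, U)
synthetic = sample_synthetic(weights, centers, covs, n=len(X))
```

In this sketch only `synthetic` would be released. It follows the cluster structure of `X` but contains none of the original rows. How much privacy this provides in practice depends on the data and on the attack model, which the sketch does not evaluate.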

Keywords/Search Tags:

Data Mining, Privacy Preserving, Clustering Analysis, Sampling

Related items

1	Research On Privacy Preserving Methods For Data Mining
2	Research On Vertically Partitioned Data Oriented Privacy Preserving Data Mining Algorithm
3	Research On Privacy Preserving Clustering Mining Method
4	The Research Of The Grid Based Privacy Preserving Clustering Algorithm
5	Research And Implement Of Privacy-preserving Scheme Based On Data Mining
6	Data Privacy Preserving Approach Research For Clustering Analysis
7	Research On Key Technologies Of Privacy Preserving Data Mining Based On Local Differential Privacy
8	Research On K-medoids Clustering Algorithm Under Privacy Protection Model
9	Research And Design On Privacy-Preserving Data Mining Approaches And Algorithms
10	Research Of Privacy-Preserving Clustering Algorithm Over Distributed Data