Research On Privacy Preserving Clustering Mining Algorithm Based On Sampling Technique

Posted on: 2008-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: F L Liu
GTID: 2178360245478341
Subject: Management Science and Engineering
Abstract/Summary:
With the development of data analysis and processing techniques, releasing or sharing data in order to mine useful decision information and knowledge inevitably exposes individuals and companies to privacy leaks. This problem gave rise to the research field of privacy preserving data mining, which has become a focus of researchers at home and abroad over the past three years. Clustering is one of the important data mining methods for analyzing management problems such as market segmentation, customer classification and manufacturing system module design. Obtaining these results involves a large amount of detailed, sensitive information. At the same time, the latent models and patterns in a database may themselves threaten privacy and information security. Therefore, with the rapid growth of customers' demand for personalization, privacy preserving clustering has become a key issue in privacy preserving data mining that urgently needs to be solved.
At present, research on privacy preserving clustering algorithms has only just begun, and the privacy preserving techniques adopted are very simple. Furthermore, the efficiency and the effectiveness of existing privacy preserving clustering algorithms conflict with each other. Against this background, this thesis presents a sampling-based privacy preserving clustering algorithm which, while ensuring data privacy and the accuracy of the clustering results, can also process databases containing a large quantity of data. The contributions of this thesis are as follows. Drawing on the theory of density-based clustering algorithms and of model-based clustering algorithms, which can construct a clustering distribution function, three distribution functions are constructed: the uniform sampling model, the Gaussian sampling model and the Gaussian mixture sampling model.
The thesis also proves the equivalence between the additive fuzzy system and the Gaussian mixture model, and shows that the optimal parameters of the clustering distribution function can be estimated from fuzzy c-means clustering results. Random sampling is then applied to generate new data that retain the clustering characteristics of the original data while protecting privacy. A detailed description of the algorithm's process is given. Finally, the validity of the new algorithm is tested by simulation experiments, and the advantages and suitable application settings of each algorithm are given.
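The idea of estimating a Gaussian mixture from fuzzy c-means results and then releasing sampled data can be illustrated with a minimal sketch. The abstract does not give the thesis's actual estimation formulas, so everything below is an assumption made for illustration: the function names (`fuzzy_c_means`, `mixture_from_memberships`, `sample_mixture`), the fuzzifier `m = 2`, the membership-weighted covariance estimate, and the toy two-cluster data. It makes no claim about the strength of the privacy protection.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means. Returns (centers, U) with U of shape (n, c)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=X.shape[0])  # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distances from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    W = U ** m
    centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U

def mixture_from_memberships(X, U, m=2.0, reg=1e-6):
    """Assumed estimator: mixture parameters from fuzzy memberships."""
    W = U ** m
    mass = W.sum(axis=0)
    weights = U.sum(axis=0) / U.sum()  # mixing proportions, sum to 1
    means = (W.T @ X) / mass[:, None]
    covs = []
    for k in range(U.shape[1]):
        diff = X - means[k]
        cov = (W[:, k, None] * diff).T @ diff / mass[k]
        covs.append(cov + reg * np.eye(X.shape[1]))  # keep it positive definite
    return weights, means, np.array(covs)

def sample_mixture(weights, means, covs, n, seed=0):
    """Draw n synthetic records from the estimated Gaussian mixture."""
    rng = np.random.default_rng(seed)
    comp = rng.choice(len(weights), size=n, p=weights)
    out = np.empty((n, means.shape[1]))
    for k in range(len(weights)):
        idx = np.flatnonzero(comp == k)
        if idx.size:
            out[idx] = rng.multivariate_normal(means[k], covs[k], size=idx.size)
    return out

# Toy data (hypothetical): two well-separated 2-D clusters.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([0, 0], 0.5, size=(200, 2)),
               rng.normal([5, 5], 0.5, size=(200, 2))])

centers, U = fuzzy_c_means(X, c=2)
weights, means, covs = mixture_from_memberships(X, U)
X_synth = sample_mixture(weights, means, covs, n=len(X))  # released instead of X
```

In this sketch only `X_synth` would be shared: it follows the estimated cluster structure but contains none of the original records. A uniform or single-Gaussian sampling model would replace `sample_mixture` with the corresponding sampler.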
Keywords/Search Tags: Data Mining, Privacy Preserving, Clustering Analysis, Sampling