Research On Privacy Preserving Methods For Data Mining

Posted on:2009-07-11

Degree:Doctor

Type:Dissertation

Country:China

Candidate:F Li

Full Text:PDF

GTID:1118360275454615

Subject:Communication and Information System

Abstract/Summary:

Research on data mining has greatly advanced automatic data analysis and prediction. Current data mining techniques, including exploratory analysis, descriptive and predictive modeling, pattern and rule discovery, and content analysis, have been applied in many government services, business, and scientific research activities. Data mining requires access to the original data, yet the data holder generally regards that data as private, and direct access to it violates privacy. With the emergence of policies and laws on information privacy protection, privacy has become a major obstacle to deploying data mining.

Privacy preserving data mining methods aim to carry out data mining effectively by technical means, maintaining precision and accuracy without leaking private data. Much research is under way on privacy preserving versions of popular data mining methods such as classification, clustering, and association rule mining. However, the effectiveness of the privacy protection and its coupling with the environment remain fundamental issues to be resolved. This thesis addresses these issues by studying the security evaluation of privacy preserving methods and their coupling with the environment.

First, this thesis surveys the current development of privacy preserving data mining methods in terms of data distribution, data mining type, and protection technology, and gives a comparison and summary based on that survey.

Data perturbation is a main approach to privacy preserving data mining in a centralized environment, and additive randomization is a representative technique. This thesis defines a matrix model of the additive randomization method. Using singular value decomposition (SVD), it shows that the current method is vulnerable to an SVD-based attack, which also invalidates the original privacy evaluation method.
This thesis proposes a new privacy evaluation model to address this problem and, based on that model, an improved randomization method that uses threshold function projection. SVD attack experiments show that the improved method is robust. Because randomization is a general data perturbation method, this work on evaluation and improvement applies broadly.

Distributed environments are a rapidly growing area of data mining applications. However, traditional centralized privacy preserving data mining methods cannot be applied directly to a distributed environment, which involves more complexity and more security issues. This thesis gives a privacy-level classification for distributed data mining and defines the corresponding privacy constraints. Under these constraints it proposes a new multiparty secure statistical method and a k-anonymous exchange protocol. Building on the method and the protocol, it gives a distributed, data perturbation based privacy preserving method that applies the centralized additive randomization technique directly to the distributed environment. Several collusion and malicious models are defined to validate the security and privacy of the method. Experiments and analysis show that the method is highly robust and preserves privacy in the semi-honest setting. Because this work applies traditional centralized randomization and reconstruction techniques directly to the distributed environment, it serves as a general framework for data perturbation in distributed settings.

Computation in Euclidean space is one of the fundamental operations of data mining. This thesis studies the security of distributed privacy preserving data mining methods based on Euclidean space and shows that existing methods are vulnerable to collusion attacks. It then proposes three-party and multiparty secure distance comparison protocols in Euclidean space, built on homomorphic cryptography.
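
As an illustration of how homomorphic cryptography lets parties compute on Euclidean distances without exchanging their vectors, the sketch below uses a toy Paillier cryptosystem to evaluate a squared distance between two parties. It is a generic textbook construction, not the thesis's three-party or multiparty protocol: the names (keygen, encrypt, decrypt, alice_vec, bob_vec) and the tiny primes are assumptions for illustration only. It also reveals the distance itself to the key holder, whereas a comparison protocol would reveal only which distance is smaller.

```python
import math
import secrets

# Toy primes (the Mersenne primes 2^31 - 1 and 2^61 - 1); far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1


def keygen(p, q):
    """Return the public modulus n and the private pair (lam, mu), with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Paillier encryption of m (taken mod n) with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def add(n, c1, c2):
    """Enc(a) * Enc(b) = Enc(a + b)."""
    return c1 * c2 % (n * n)


def scale(n, c, k):
    """Enc(a) ^ k = Enc(k * a); a negative k is reduced mod n."""
    return pow(c, k % n, n * n)


# Hypothetical private inputs held by two parties.
alice_vec = [3, -1, 4, 1]
bob_vec = [1, 5, -2, 6]

# Alice generates the keys and sends Enc(a_i) and Enc(sum of a_i^2) to Bob.
n, private_key = keygen(P, Q)
enc_a = [encrypt(n, a) for a in alice_vec]
enc_a_sq = encrypt(n, sum(a * a for a in alice_vec))

# Bob combines the ciphertexts with his own values:
# Enc(sum a_i^2 - 2 * sum a_i * b_i + sum b_i^2) = Enc(||a - b||^2).
enc_dist = add(n, enc_a_sq, encrypt(n, sum(b * b for b in bob_vec)))
for c, b in zip(enc_a, bob_vec):
    enc_dist = add(n, enc_dist, scale(n, c, -2 * b))

# Alice decrypts the result; she never sees bob_vec and Bob never sees alice_vec.
dist_sq = decrypt(n, private_key, enc_dist)
print("squared distance:", dist_sq)
```

A practical deployment would use a modulus of at least 2048 bits from a vetted library and would blind or compare the encrypted distances instead of decrypting them directly.
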
The multiparty secure distance comparison protocol is then optimized through preprocessing, parallel computation, and protocol merging. Finally, the protocol is used to build a fully distributed privacy preserving k-means clustering method. The method is proved secure in the semi-honest setting, and the optimizations are shown to be effective. In addition, the secure distance comparison protocol can support other data mining methods based on Euclidean space, such as k-nearest-neighbor and k-means, which makes it generally applicable to classification, clustering, and web mining.

Finally, the thesis concludes and outlines directions for further study: signal processing based randomization, a universal evaluation method for randomization methods, a general method for distributed anonymous data computation, a universal security evaluation method for collusion attacks in the semi-honest setting, optimization of iterative cryptography-based computation, and so on.
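
The abstract does not describe the multiparty secure statistical method in detail. As a hedged illustration of what such a statistic can look like in the semi-honest setting, the sketch below computes a sum over several parties with additive secret sharing. This is a standard construction and is not claimed to be the thesis's method; MODULUS, share, and secure_sum are names assumed for the example.

```python
import secrets

# All arithmetic is done modulo a public prime; the true sum must be smaller than it.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split a value into n_parties random shares that add up to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate the protocol: each party shares its value, then shares are summed."""
    n_parties = len(private_values)
    # Row i holds the shares created by party i; party j receives column j.
    sent = [share(v, n_parties) for v in private_values]
    # Each party publishes only the sum of the shares it received.
    partial_sums = [
        sum(sent[i][j] for i in range(n_parties)) % MODULUS
        for j in range(n_parties)
    ]
    # Adding the published partial sums yields the total and nothing else.
    return sum(partial_sums) % MODULUS


# Hypothetical local counts held by four sites.
total = secure_sum([12, 7, 30, 5])
print("joint total:", total)
```

Each share on its own is uniformly random, so a single received share reveals nothing about the sender's value. The sketch assumes that every party follows the protocol, which matches the semi-honest assumption but not the malicious models mentioned in the abstract.
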

Keywords/Search Tags:

Data Mining, Privacy Preserving, Distributed Modeling, Privacy Preserving Clustering, Privacy Evaluation, Secure Distance Comparison

Related items

1	Research On The Technologies Of Information Mining And Privacy Preserving In Distributed Environment
2	Research Of Privacy-Preserving Clustering Algorithm Over Distributed Data
3	Research On Distributed Data Mining Based On Privacy Preserving
4	Research On Key Technologies Of Privacy Preserving Data Mining Based On Local Differential Privacy
5	Research On Vertically Partitioned Data Oriented Privacy Preserving Data Mining Algorithm
6	Research On Some Key Technologies Of Privacy Preserving Data Mining
7	Privacy-preserving Research Of The Distributed Clustering Algorithm Based On Density
8	Study On Privacy Preserving Classification Data Mining
9	Analyzing And Researching Based On Privacy-Preserving Clustering
10	The Research On Privacy Preserving Data Mining