Privacy-preserving collaborative data mining

Posted on:2007-09-04

Degree:Ph.D

Type:Dissertation

University:University of Ottawa (Canada)

Candidate:Zhan, Zhijun Justin

GTID:1458390005482954

Subject:Computer Science

Abstract/Summary:

Data mining is the process of extracting useful knowledge from large amounts of data. Conducting data mining usually requires collecting data, but the data are sometimes distributed among several parties, and privacy concerns may prevent those parties from directly sharing the data or certain types of information about the data. How multiple parties can collaboratively conduct data mining without breaching data privacy is therefore a major challenge.

Theoretical results from secure multi-party computation show that secure protocols exist for any multi-party computation, provided that a majority of the parties are honest. However, these general methods are far from efficient or practical when the function is complex and the inputs consist of large data sets. Efficiently tackling this problem, formulated here as Privacy-Preserving Collaborative Data Mining, therefore requires privacy-preserving solutions designed for adequate efficiency.

The goal of this dissertation is to provide efficient solutions for extracting knowledge among multiple parties involved in a data mining task without disclosing data between the parties. Two distributed data models are considered: vertical collaboration, in which different parties collect different features of the same set of records, and horizontal collaboration, in which different parties gather different sets of records that all share the same features. We develop privacy-preserving protocols that let multiple parties conduct the required computations. Specifically, we provide privacy-preserving solutions for several common data mining algorithms: association rule mining, sequential pattern mining, naive Bayesian classification, decision tree classification, k-nearest neighbor classification, support vector machine classification, and k-medoids clustering.
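The two collaboration models can be made concrete with a small sketch. It assumes a toy table of binary items (hypothetical item names `A` to `D`) and hypothetical helper functions, and it computes the support count of an itemset in the clear: it shows only what a privacy-preserving protocol would have to compute, not how the dissertation's protocols protect it. Under vertical collaboration, the support of a two-item set whose items belong to different parties is the scalar product of the parties' 0/1 columns; under horizontal collaboration, it is the sum of the parties' local counts.

```python
# Toy transaction table: each row is a record, each key a binary item.
# Item names A-D are hypothetical and chosen only for illustration.
records = [
    {"A": 1, "B": 1, "C": 0, "D": 1},
    {"A": 1, "B": 0, "C": 1, "D": 1},
    {"A": 0, "B": 1, "C": 1, "D": 0},
    {"A": 1, "B": 1, "C": 1, "D": 1},
]


def vertical_split(rows, features_a, features_b):
    """Vertical collaboration: both parties hold every record,
    but each party holds only its own subset of features."""
    part_a = [{f: r[f] for f in features_a} for r in rows]
    part_b = [{f: r[f] for f in features_b} for r in rows]
    return part_a, part_b


def horizontal_split(rows, cut):
    """Horizontal collaboration: each party holds a disjoint subset
    of the records, and every record has all features."""
    return rows[:cut], rows[cut:]


def support_vertical(part_a, part_b, item_a, item_b):
    """Support of {item_a, item_b} when each item is held by a different
    party: the scalar product of the two parties' 0/1 columns."""
    x = [r[item_a] for r in part_a]
    y = [r[item_b] for r in part_b]
    return sum(xi * yi for xi, yi in zip(x, y))


def support_horizontal(parts, itemset):
    """Support when records are split by row: the sum of local counts."""
    return sum(
        sum(all(r[i] for i in itemset) for r in part) for part in parts
    )


a, b = vertical_split(records, ["A", "B"], ["C", "D"])
print(support_vertical(a, b, "A", "D"))                               # 3
print(support_horizontal(horizontal_split(records, 2), ["A", "D"]))  # 3
```

Both splits yield the same support for {A, D}; what changes is which intermediate values each party would have to reveal if the computation were done naively.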
Our aim is to obtain accurate data mining results while minimizing the disclosure of private data. The solutions are distributed: no central, trusted party has access to all the data. Instead, we define protocols that use homomorphic encryption and digital envelope techniques to exchange data while keeping it private.
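To illustrate how homomorphic encryption lets parties combine private values, the following sketch implements a toy Paillier cryptosystem (an additively homomorphic scheme chosen here as an assumption; the abstract does not name a specific scheme) and uses it for a two-party scalar product, the quantity that arises in the vertical case. Party A encrypts its vector; party B, holding only ciphertexts, raises each ciphertext to its own entry and multiplies the results, producing an encryption of the sum of products that only A can decrypt. Both parties are simulated in one process, the primes are far too small to be secure, all function names are hypothetical, and the digital envelope technique is not shown.

```python
import math
import secrets

# Toy Paillier parameters: far too small to be secure, illustration only.
P, Q = 1009, 1013


def keygen(p=P, q=Q):
    """Return (public_key, private_key) for a toy Paillier instance."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # With g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^-1 mod n.
    mu = pow(lam, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """E(m) = g^m * r^n mod n^2 for a random r coprime to n."""
    n, g = pub
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def secure_scalar_product(x, y):
    """Simulate two semi-honest parties: A holds x and the private key,
    B holds y. Entries are non-negative integers and x.y must be < n."""
    pub, priv = keygen()                      # party A
    n2 = pub[0] ** 2
    enc_x = [encrypt(pub, xi) for xi in x]    # A -> B: ciphertexts only
    acc = encrypt(pub, 0)                     # party B starts from E(0)
    for c, yi in zip(enc_x, y):
        # E(a) * E(b) = E(a + b) and E(a)^k = E(k * a)
        acc = (acc * pow(c, yi, n2)) % n2
    return decrypt(pub, priv, acc)            # B -> A: one ciphertext


# 0/1 columns for items A and D from a toy four-record table.
print(secure_scalar_product([1, 1, 0, 1], [1, 1, 0, 1]))  # 3
```

In this basic form party A learns the scalar product itself. If neither party should learn it, B could multiply in an encryption of a random value R before returning the ciphertext, leaving A with x.y + R and B with -R (mod n) as additive shares; whether this matches the dissertation's use of digital envelopes is not stated in the abstract.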
