
Privacy-preserving collaborative data mining

Posted on: 2007-09-04
Degree: Ph.D.
Type: Dissertation
University: University of Ottawa (Canada)
Candidate: Zhan, Zhijun Justin
Full Text: PDF
GTID: 1458390005482954
Subject: Computer Science
Abstract/Summary:
Data mining is the process of extracting useful knowledge from large amounts of data. Conducting data mining often requires collecting data, but the data are sometimes distributed among several parties. Privacy concerns may prevent the parties from directly sharing the data, or even certain types of information about the data. How multiple parties can collaboratively conduct data mining without breaching data privacy presents a grand challenge. Theoretical results from the area of secure multi-party computation show that secure protocols can be constructed for any multi-party computation with an honest majority. However, these general methods are far from efficient or practical for computing complex functions on inputs consisting of large sets of data. Therefore, to tackle this problem, formulated here as Privacy-Preserving Collaborative Data Mining, we need to develop privacy-preserving solutions with adequate efficiency.
The goal of this dissertation is to provide efficient solutions to the problem of knowledge extraction among multiple parties involved in a data mining task, without disclosing the data between the parties. Two distributed data models are considered: vertical collaboration, where different parties collect different features of the same set of data, and horizontal collaboration, where different parties gather different sets of data that all share the same features. We develop privacy-preserving protocols for multiple parties to conduct the desired computations. Specifically, we provide solutions for some common data mining algorithms, including privacy-preserving association rule mining, sequential pattern mining, naive Bayesian classification, decision tree classification, k-nearest neighbor classification, support vector machine classification, and k-medoids clustering.
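The difference between the two data models can be made concrete with a small sketch. The records, field names, and helper functions below are hypothetical illustrations and are not taken from the dissertation; the sketch only shows which part of a table each party would hold under each model.

```python
# Toy table (hypothetical values): each dict is one record, each key one feature.
records = [
    {"id": 1, "age": 34, "income": 52000, "diagnosis": "A"},
    {"id": 2, "age": 51, "income": 61000, "diagnosis": "B"},
    {"id": 3, "age": 29, "income": 48000, "diagnosis": "A"},
    {"id": 4, "age": 45, "income": 75000, "diagnosis": "C"},
]


def horizontal_split(rows, n_parties):
    """Horizontal collaboration: each party holds different records,
    but every record carries the same full set of features."""
    return [rows[i::n_parties] for i in range(n_parties)]


def vertical_split(rows, feature_groups, key="id"):
    """Vertical collaboration: each party holds every record,
    but only its own group of features (plus a shared record key)."""
    return [
        [{key: r[key], **{f: r[f] for f in group}} for r in rows]
        for group in feature_groups
    ]


horizontal = horizontal_split(records, 2)
vertical = vertical_split(records, [["age", "income"], ["diagnosis"]])

for name, parties in (("horizontal", horizontal), ("vertical", vertical)):
    for i, part in enumerate(parties):
        print(name, "party", i, part)
```

In the horizontal case each party could mine its own records locally and the difficulty lies in combining the local results; in the vertical case no single party even sees a complete record, so the parties must cooperate on every computation that spans features.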
Our goal is to provide efficient solutions that obtain accurate data mining results while minimizing private data disclosure. The solutions are distributed, i.e., there is no central, trusted party with access to all the data. Instead, we define protocols that use homomorphic encryption and digital envelope techniques to exchange the data while keeping it private.
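As an illustration of how homomorphic encryption lets parties combine values without revealing them, the following is a minimal sketch of an additively homomorphic scheme (Paillier) used to sum private counts. This is an assumption for illustration only: the abstract does not say which homomorphic scheme or protocol the dissertation uses, the function names are hypothetical, and the tiny primes make this toy completely insecure.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy Paillier key generation. The small default primes are an
    illustrative assumption and offer no real security."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n, with g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = priv
    n2 = n * n
    l_value = (pow(c, lam, n2) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


# Hypothetical scenario: three parties each hold a private local count.
n, priv = keygen()
local_counts = [12, 7, 30]
ciphertexts = [encrypt(n, count) for count in local_counts]

# Anyone holding only the public key can combine the ciphertexts.
total_ct = ciphertexts[0]
for ct in ciphertexts[1:]:
    total_ct = add_encrypted(n, total_ct, ct)

# Only the private-key holder learns the result, and only the total.
total = decrypt(n, priv, total_ct)
print(total)
```

The design point is that the combining step works on ciphertexts alone, so the individual counts never appear in the clear. A real protocol also has to settle who holds the private key and how to prevent that party from decrypting individual contributions, which this sketch does not address.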