
Research On Privacy Preserving Data Mining And Knowledge Discovery

Posted on: 2008-03-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q H Huang
GTID: 1118360218960570
Subject: Control theory and control engineering
Abstract/Summary:
The goal of data mining and knowledge discovery is to extract important knowledge and rules from large amounts of data. With advances in computer technology and data collection methods, the number of data sources has grown, and protecting the privacy of data holders and of private knowledge has become an urgent research topic. This thesis proposes a set of novel privacy-preserving data mining and knowledge discovery techniques.

Association rule mining and sequential pattern mining are two key methods for mining large amounts of data. Privacy, legal and commercial concerns increasingly restrict centralized access to such data, and existing general methods are too inefficient to be practical in application. This thesis therefore introduces homomorphic encryption into database mining and proposes novel methods for privacy-preserving association rule mining and sequential pattern mining, based on two-party and multi-party computation, respectively.

Data perturbation alters the original data to protect its security. This thesis proposes a privacy-preserving method that works solely by adding fake data. It analyzes the probabilistic effect of the perturbation on support counting, reconstructs the original (pre-perturbation) support from the perturbed data, and uses experiments to choose the perturbation parameters.

Collaborative filtering collects people's ratings of common items of interest in order to predict ratings of unknown items. This thesis proposes multi-agent collaborative filtering: it makes full use of the knowledge collected by the system and applies C-Means clustering to obtain agents that take the place of neighbours. As a further step, privacy-preserving collaborative filtering protocols are designed to protect the security of the computation process.

Recent advances in hardware and software have enabled the capture of many kinds of measurements in a wide range of fields. Such measurements are typical of a new kind of data, data streams, produced for example by sensor networks, web logs, and computer network traffic. Research on data stream mining has attracted wide attention because of the importance of its applications and the growing volume of streaming information; applications range from critical scientific and astronomical tasks to important business and financial ones. Data streams are one of the hot topics in the database community, and related mining research has concentrated on classification and frequent pattern mining. This thesis proposes an algorithm for mining approximate sequential patterns in data streams, based on a sequence bitmap representation and the tilted-time window technique: an LSP-tree structure summarizes the online stream data, and its sequential patterns are mined with a bitmap algorithm. Experimental results show that the approach mines sequential patterns at low cost. As a further step, the multi-party secure computation method is applied to design a privacy-preserving algorithm for mining sequential patterns in data streams.

The original contributions of this thesis are listed below.
1. Novel privacy-preserving algorithms based on homomorphic encryption are proposed for database mining. Combined with existing database mining techniques, they yield privacy-preserving association rule mining and sequential pattern mining algorithms, with both the two-party and the multi-party computation settings carefully considered and implemented.
2. Using a new perturbation method, a privacy-preserving sequential pattern mining algorithm is designed. The support of the original sequential patterns can be estimated from the altered database, so the privacy of the original data is protected.
3. A novel multi-agent technique is proposed to generate more accurate recommendations for customers. Recommendation agents are generated by C-Means clustering, which reduces the drawbacks of neighbour-based recommendation. On this basis, a privacy-preserving collaborative filtering algorithm is put forward.
4. The problem of mining sequential patterns in data streams is studied. A memory-based online synopsis structure, the LSP-tree, is designed, and a stream sequential pattern mining algorithm is proposed that applies the bitmap method and the tilted-time window technique. Further, this thesis is the first to pose the problem of securely mining sequential patterns in data streams, and it proposes related algorithms.
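To make the homomorphic-encryption idea concrete, the following sketch uses the Paillier cryptosystem, an additively homomorphic scheme; the abstract does not say which scheme the thesis uses, so this choice is an assumption. It shows two building blocks: a multi-party secure sum of local support counts (horizontally partitioned data) and a two-party scalar product (vertically partitioned data), where the dot product of two parties' 0/1 item columns is the joint support count. The primes are tiny and hard-coded, so it is insecure and illustrative only; the function names are hypothetical, and a real protocol would also have to control who holds the decryption key.

```python
import math
import random

# Toy Paillier key pair from tiny hard-coded primes: NOT secure, illustration only.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lambda = lcm(p-1, q-1)
MU = pow(LAM, -1, N)  # with generator g = N + 1, mu = lambda^-1 mod N (Python 3.8+)

def encrypt(m, rng):
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:  # r must be a unit modulo N
        r = rng.randrange(1, N)
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def add(c1, c2):
    """E(m1) * E(m2) mod N^2 decrypts to m1 + m2."""
    return c1 * c2 % N2

def scalar_mul(c, k):
    """E(m)^k mod N^2 decrypts to k * m."""
    return pow(c, k, N2)

def secure_sum(local_counts, rng):
    """Multi-party (hypothetical helper): each site encrypts its local support
    count, an aggregator multiplies the ciphertexts, the key holder decrypts."""
    agg = encrypt(0, rng)
    for count in local_counts:
        agg = add(agg, encrypt(count, rng))
    return decrypt(agg)

def secure_dot(alice_bits, bob_bits, rng):
    """Two-party (hypothetical helper): Alice sends E(a_i); Bob computes
    prod E(a_i)^b_i = E(sum a_i * b_i) without seeing a_i; Alice decrypts."""
    enc = [encrypt(a, rng) for a in alice_bits]      # Alice's side
    acc = encrypt(0, rng)                            # Bob's side
    for c, b in zip(enc, bob_bits):
        acc = add(acc, scalar_mul(c, b))
    return decrypt(acc)                              # back at Alice

rng = random.Random(7)
print(secure_sum([120, 45, 300], rng))                     # 465
print(secure_dot([1, 0, 1, 1, 0], [1, 1, 1, 0, 0], rng))   # 2
```

Only the aggregate count is ever decrypted; comparing it with the minimum support threshold then decides whether the itemset is globally frequent.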
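The perturbation-by-fake-data idea (mix fake records into the data, then correct the support counts) can be illustrated with a small probabilistic model. This is a minimal sketch under assumptions not taken from the thesis: it uses itemsets rather than sequences to stay short, each fake transaction includes every item independently with probability `p`, and the helper names and parameters (`add_fake_transactions`, `reconstruct_count`, `n_fake`, `p`) are hypothetical. Under that model a fake transaction contains an itemset X with probability p^|X|, so the expected number of fake matches, n_fake · p^|X|, can be subtracted from the observed count.

```python
import random

def add_fake_transactions(real_db, items, n_fake, p, rng):
    """Mix n_fake random transactions (each item included with prob. p) into the real ones."""
    fake = [{i for i in items if rng.random() < p} for _ in range(n_fake)]
    mixed = real_db + fake   # new list; real_db itself is left unchanged
    rng.shuffle(mixed)       # real and fake records become indistinguishable by position
    return mixed

def count(db, itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)

def reconstruct_count(perturbed_db, itemset, n_fake, p):
    """Estimate the real count by removing the expected fake contribution."""
    return count(perturbed_db, itemset) - n_fake * p ** len(itemset)

rng = random.Random(0)
items = list(range(20))
real = [{i for i in items if rng.random() < 0.3} for _ in range(2000)]
target = {1, 2}
mixed = add_fake_transactions(real, items, n_fake=10000, p=0.1, rng=rng)
true_count = count(real, target)
est = reconstruct_count(mixed, target, n_fake=10000, p=0.1)
print(true_count, round(est), count(mixed, target))
```

Dividing the reconstructed count by the number of real transactions gives the support estimate. More fake records hide the real ones better but also increase the variance of the estimate, which is one reason perturbation parameters have to be tuned experimentally.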
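The agent-based collaborative filtering idea can be sketched as follows, under several assumptions the abstract does not fix: "C-Means" is taken to mean fuzzy C-means, missing ratings are filled with the user's mean before clustering, each cluster ("agent") rates an item by the membership-weighted average of the users who actually rated it, and a user's prediction is the membership-weighted average of the agents' ratings. The agents thus play the role that nearest neighbours play in classic user-based filtering. All function names are hypothetical.

```python
import math

def fill_missing(ratings):
    """Replace each missing rating (None) by that user's mean rating (assumption)."""
    filled = []
    for row in ratings:
        known = [r for r in row if r is not None]
        mean = sum(known) / len(known)
        filled.append([mean if r is None else float(r) for r in row])
    return filled

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def memberships(x, centers, m):
    """Fuzzy C-means membership of point x in each cluster (fuzzifier m > 1)."""
    d = [dist(x, v) for v in centers]
    if min(d) == 0.0:  # x sits exactly on a center: full membership there
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** e for dj in d) for dk in d]

def fuzzy_c_means(X, c, m=2.0, iters=100):
    """Return the membership matrix U (users x clusters)."""
    # Farthest-first initialisation keeps the sketch deterministic.
    centers = [list(X[0])]
    while len(centers) < c:
        far = max(X, key=lambda x: min(dist(x, v) for v in centers))
        centers.append(list(far))
    for _ in range(iters):
        U = [memberships(x, centers, m) for x in X]
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(len(X))]
            centers.append([sum(w[i] * X[i][j] for i in range(len(X))) / sum(w)
                            for j in range(len(X[0]))])
    return [memberships(x, centers, m) for x in X]

def agent_ratings(ratings, U, m=2.0):
    """Agent k's rating of item j: membership-weighted mean over users who rated j."""
    agents = []
    for k in range(len(U[0])):
        row = []
        for j in range(len(ratings[0])):
            pairs = [(U[i][k] ** m, r[j]) for i, r in enumerate(ratings) if r[j] is not None]
            wsum = sum(w for w, _ in pairs)
            row.append(sum(w * v for w, v in pairs) / wsum if wsum > 0 else None)
        agents.append(row)
    return agents

def predict(user, item, U, agents):
    """Membership-weighted average of the agents' ratings (agents replace neighbours)."""
    num = den = 0.0
    for k, agent in enumerate(agents):
        if agent[item] is not None:
            num += U[user][k] * agent[item]
            den += U[user][k]
    return num / den if den > 0 else None

ratings = [
    [5, 5, 1, 1],
    [5, 4, 1, None],
    [4, 5, 2, 1],
    [1, 1, 5, 5],
    [None, 2, 5, 4],
    [2, 1, 4, 5],
]
U = fuzzy_c_means(fill_missing(ratings), c=2)
agents = agent_ratings(ratings, U)
print(round(predict(1, 3, U, agents), 2))  # low: user 1 resembles users who rate item 3 low
```

Because predictions consult a handful of agents instead of searching for neighbours per query, the per-prediction cost depends on the number of clusters rather than on the number of users.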
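The abstract does not specify its sequence bitmap representation; a common form is the vertical bitmap used by the SPAM algorithm, where an S-step transformation turns sequential support counting into bitwise AND operations. The sketch below assumes that representation and mines one small static batch; it does not reproduce the thesis's LSP-tree, and the function names are hypothetical. Each sequence is a list of itemsets, and bit t of a sequence's bitmap marks that the pattern's last itemset can be matched ending at itemset t.

```python
def build_bitmaps(db):
    """bitmaps[item][s] has bit t set iff item occurs in itemset t of sequence s."""
    bitmaps = {}
    for s, seq in enumerate(db):
        for t, itemset in enumerate(seq):
            for item in itemset:
                bitmaps.setdefault(item, [0] * len(db))[s] |= 1 << t
    return bitmaps

def support(bm):
    """Number of sequences whose bitmap is non-empty."""
    return sum(1 for b in bm if b)

def s_step(bm):
    """Per sequence, set every bit strictly after the first set bit."""
    out = []
    for b in bm:
        if b == 0:
            out.append(0)
        else:
            low = b & -b                   # lowest set bit
            out.append(~((low << 1) - 1))  # all higher bits (AND-ed later with a finite bitmap)
    return out

def mine(db, min_sup):
    """Depth-first search over sequence (S) and itemset (I) extensions."""
    bitmaps = build_bitmaps(db)
    items = sorted(i for i in bitmaps if support(bitmaps[i]) >= min_sup)
    results = {}

    def dfs(pattern, bm):
        results[pattern] = support(bm)
        shifted = s_step(bm)
        for i in items:  # S-extension: append a new itemset {i}
            new = [a & b for a, b in zip(shifted, bitmaps[i])]
            if support(new) >= min_sup:
                dfs(pattern + ((i,),), new)
        last = pattern[-1]
        for i in items:  # I-extension: add a larger item to the last itemset
            if i > last[-1]:
                new = [a & b for a, b in zip(bm, bitmaps[i])]
                if support(new) >= min_sup:
                    dfs(pattern[:-1] + (last + (i,),), new)

    for i in items:
        dfs(((i,),), bitmaps[i])
    return results

db = [
    [{'a'}, {'b'}, {'c'}],
    [{'a', 'b'}, {'c'}],
    [{'a'}, {'c'}],
    [{'b'}, {'a'}],
]
for pattern, sup in sorted(mine(db, 2).items()):
    print(pattern, sup)
```

The bitmaps make every extension a pass of ANDs and shifts over machine words, which is why bitmap methods suit memory-resident synopses of recent stream data.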
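The tilted-time window technique keeps recent history at fine granularity and older history at coarser granularity, so per-pattern counts need only logarithmic space. The sketch below assumes the logarithmic variant popularized by FP-stream, in which level k stores frames spanning 2^k batches and the two oldest frames of a level are merged once it holds three. The thesis may use a different layout, and the class and method names are hypothetical. Presumably each pattern kept in a synopsis such as the LSP-tree would carry one such window.

```python
from collections import deque

class TiltedTimeWindow:
    """Logarithmic tilted-time window for one pattern's counts (sketch).

    Level k holds frames of (count, span) with span == 2**k batches; the newest
    frame sits at the left end of each deque, and every frame in level k is
    newer than every frame in level k + 1. A level keeps at most two frames.
    """

    def __init__(self):
        self.levels = []

    def add_batch(self, count):
        carry, k = (count, 1), 0
        while carry is not None:
            if k == len(self.levels):
                self.levels.append(deque())
            level = self.levels[k]
            level.appendleft(carry)
            carry = None
            if len(level) > 2:
                c1, s1 = level.pop()  # oldest frame
                c2, s2 = level.pop()  # second oldest frame
                carry = (c1 + c2, s1 + s2)  # becomes the newest frame one level up
            k += 1

    def window_sum(self, t):
        """Approximate count over the most recent t batches (covers at least t)."""
        total, span = 0, 0
        for level in self.levels:
            for count, s in level:  # newest to oldest
                if span >= t:
                    return total
                total += count
                span += s
        return total

    def num_frames(self):
        return sum(len(level) for level in self.levels)

tw = TiltedTimeWindow()
for _ in range(100):
    tw.add_batch(1)  # the pattern occurs once in every batch
print(tw.num_frames(), tw.window_sum(1), tw.window_sum(10**9))
```

Merging only adds counts, so the total over all frames stays exact while older periods are reported at coarser resolution; this is what makes the mined stream patterns approximate.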
Keywords/Search Tags: Privacy Preserving, Data Mining and Knowledge Discovery, Collaborative Filtering, Database, Data Stream