
Privacy-Preserving Distributed Data Mining System

Posted on: 2005-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: X C Shen
Full Text: PDF
GTID: 2168360122981239
Subject: Computer application technology
Abstract/Summary:
With the arrival of the information era and the rapid development of computer network technology, how to mine knowledge efficiently from data in a distributed environment has become a new topic in information science research. Association rule mining is an important data mining task. At present, its main challenges are efficiency and memory capacity, and developing distributed mining algorithms is a promising way to address them. This thesis therefore focuses on the distributed mining of association rules. The work is motivated by three observations: the data can be too large to be loaded into memory at once; the data can be confidential, so that customers are willing to provide only the results of analyzing the data, not the data themselves; and the data can be distributed.

Research on distributed data mining is still at an early stage, and many problems remain to be solved. Among them, the system architecture and the algorithms of distributed data mining are the most important. This thesis explores both directions. First, a distributed data mining system is proposed that mines knowledge from large amounts of distributed data sets. Because the system transfers only the intermediate results of local data mining, it greatly decreases network traffic and enhances the security and privacy of the data. The system uses CORBA as its distributed software engine, so it does not depend on any particular programming language or computing platform.

Then, some new ideas and implementation techniques for distributed data mining algorithms are proposed based on this prototype system. We mainly discuss association rule mining and adapt the conventional algorithm to distributed/parallel data mining in two different ways. One is from rules to rules: association rules are first mined at the local sites, and global association rules are then generated from these local rules. The other is from data to rules: the local sites exchange their intermediate data results, and global association rules are then generated from these results. We propose a new algorithm that uses the latter method. With the new algorithm, frequent itemsets that meet the minimum support level can be discovered without revealing the customers' information. Finally, we draw some conclusions and outline directions for future work.
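
To make the "from data to rules" idea concrete, the following is a minimal Python sketch, not the algorithm proposed in the thesis. It assumes an Apriori-style level-wise search in which each site reports only its item vocabulary, its number of transactions, and per-candidate support counts, so raw transactions never leave a site. The function names (`local_items`, `local_counts`, `global_frequent_itemsets`) and the in-process "sites" are hypothetical; a real deployment would exchange these values over the network (the thesis uses CORBA), and a privacy-preserving variant would likely also hide each site's individual counts, which this sketch does not do.

```python
from collections import Counter
from itertools import combinations


def local_items(transactions):
    """Runs at one site: report which items occur locally (no transactions)."""
    return {item for t in transactions for item in t}


def local_counts(transactions, candidates):
    """Runs at one site: count local transactions containing each candidate."""
    counts = Counter()
    for t in transactions:
        items = set(t)
        for cand in candidates:
            if cand <= items:
                counts[cand] += 1
    return counts


def global_frequent_itemsets(sites, min_support):
    """Coordinator: merge per-site counts level by level (Apriori-style).

    `sites` stands in for remote sites; only local_items(), local_counts()
    and the site sizes are used, never the transactions themselves.
    Returns a dict mapping each frequent itemset to its global support.
    """
    total = sum(len(site) for site in sites)  # each site reports its size
    candidates = {frozenset([i]) for site in sites for i in local_items(site)}
    frequent = {}
    k = 1
    while candidates:
        merged = Counter()
        for site in sites:
            merged.update(local_counts(site, candidates))  # sum local counts
        level = {c: n / total for c, n in merged.items()
                 if n / total >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemset candidates ...
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # ... and keep only those whose (k-1)-subsets are all frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
    return frequent
```

For example, calling `global_frequent_itemsets` on two sites holding two transactions each with `min_support=0.5` keeps an itemset only if it appears in at least two of the four transactions overall, regardless of how those occurrences are split between the sites.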
Keywords/Search Tags: data mining, association rules, distributed, privacy-preserving