
The Study And Implementation Of Several Distributed Algorithms For Mining Association Rules

Posted on: 2010-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: D J Ni
GTID: 2178360275999083
Subject: Computer application technology

Abstract/Summary:
With the arrival of the information era and the rapid development of computer network technology, distributed environments have become increasingly common. Traditional centralized data mining techniques cannot complete distributed mining tasks, so how to mine useful knowledge efficiently from data in a distributed environment has become a hot topic in artificial intelligence. Association rule mining is an important data mining task. At present, its main challenges are efficiency, memory consumption, and rule redundancy, and developing distributed mining algorithms is a promising way to address them. This thesis therefore focuses on distributed association rule mining and proposes several efficient distributed algorithms. The main contributions are as follows.

First, based on the characteristics of association rule mining, two categories of distributed mining mechanism are presented: synchronous mining and asynchronous mining.

Second, after an analysis of current distributed topologies, two algorithms are proposed: NDMA, an efficient synchronous distributed association rule mining algorithm for a net-distributed topology, and SMDA, an efficient asynchronous distributed association rule mining algorithm for a star-distributed topology. The algorithms use a number of techniques to improve speed and memory usage: hashing to assign polling sites, building a prefix tree on each site, globally pruning candidate sets, sampling during local mining, and applying inductive learning and derivation techniques when integrating patterns. These techniques not only produce smaller candidate sets but also reduce the communication cost to O(n). Experimental results show that MDMA improves on FDM by 60%.
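The idea of using hashing to assign polling sites can be illustrated with a small sketch. It is not taken from the thesis: the function names `polling_site` and `group_by_polling_site`, the choice of SHA-1, and the sample itemsets are all assumptions made for illustration. The point is only that every site can compute, without any communication, which single site is responsible for collecting the global support of a given candidate itemset.

```python
import hashlib
from collections import defaultdict


def polling_site(itemset, n_sites):
    """Map an itemset to a site index in [0, n_sites) (hypothetical helper)."""
    # Sort the items so the same itemset hashes identically on every site,
    # regardless of the order in which its items are stored.
    key = ",".join(sorted(map(str, itemset))).encode("utf-8")
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:8], "big") % n_sites


def group_by_polling_site(candidates, n_sites):
    """Group candidate itemsets by the site that would poll for them."""
    groups = defaultdict(list)
    for itemset in candidates:
        groups[polling_site(itemset, n_sites)].append(frozenset(itemset))
    return dict(groups)


# Made-up candidate itemsets, for illustration only.
candidates = [{"bread", "milk"}, {"beer", "chips"}, {"milk"}]
assignment = group_by_polling_site(candidates, n_sites=4)
```

Because each candidate has exactly one polling site, its support counts are gathered once rather than broadcast to every site, which is the kind of saving the communication-cost claim above refers to.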
The algorithms also use several techniques to improve mining accuracy: a candidate frequent pattern set to solve the support underestimation problem, the negative border and dynamic reduction of the support threshold to reduce errors caused by sampling, and two types of error defined to evaluate the mining results. Error I is discarding a true value, and Error II is retaining a false value. Experimental results show that, with the SDMA algorithm at a sampling rate of 25%, Error I is 1.6% and Error II is 4.6%.

Third, to solve the redundancy problem that exists in distributed mining, a new asynchronous distributed algorithm for non-redundant association rules, DGNRR, is presented. It adopts closed pattern mining instead of pattern mining on the local sites. Its core techniques are: defining the transmission and integration format for closed patterns; designing two types of rule integration according to the source site of the patterns; and, based on an analysis of the characteristics of non-redundant association rules, proposing a way to generate non-redundant association rules from closed patterns. Experimental results on three kinds of data sets show the feasibility of the algorithm.

Fourth, a distributed data mining system prototype called DDMine is designed. It is built on the Enterprise JavaBeans component architecture and is suitable for enterprise data mining. Finally, some thoughts on designing distributed algorithms for association rules are offered.
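The closed-pattern idea behind the third contribution can be shown with a short sketch. This is not the DGNRR algorithm: it only applies the standard definition that a pattern is closed when no proper superset has the same support. The function name `closed_patterns` and the toy support counts are assumptions made for illustration.

```python
def closed_patterns(supports):
    """Return the patterns that have no proper superset of equal support.

    `supports` maps frozenset itemsets to their support counts.
    """
    closed = {}
    for pattern, count in supports.items():
        has_equal_superset = any(
            pattern < other and count == other_count
            for other, other_count in supports.items()
        )
        if not has_equal_superset:
            closed[pattern] = count
    return closed


# Support counts for made-up transactions {a,b,c}, {a,b}, {a,b}, {c}.
supports = {
    frozenset("a"): 3,
    frozenset("b"): 3,
    frozenset("c"): 2,
    frozenset("ab"): 3,
    frozenset("ac"): 1,
    frozenset("bc"): 1,
    frozenset("abc"): 1,
}
closed = closed_patterns(supports)
```

In this toy case seven patterns reduce to three closed ones ({c}, {a, b}, and {a, b, c}), and the support of every other pattern can be recovered from them, which suggests why exchanging closed patterns between sites can cut both transmission volume and redundant rules.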
Keywords/Search Tags: distributed mining association rules, distributed topology, closed pattern, non-redundant association rules, EJB