
Research on Distributed Association Rules Mining Algorithm and Its Applications

Posted on: 2012-04-02 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Liu | Full Text: PDF
GTID: 2218330368979592 | Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology in recent years, and in particular the advance of database technology, data has grown explosively in many fields. In sharp contrast, the valuable knowledge extracted from that data to support decisions remains scarce. Data mining is a discipline that emerged against this background. Association rule mining is one of the main focuses of current data mining research; it is used to determine the relationships among different items or attributes in a data set, in order to find valuable dependencies across multiple domains. However, association rule mining is characterized by heavy computation and a concentrated I/O load. On the one hand, association rules involve huge amounts of data in practical applications, and even with an optimized algorithm the running time may be unacceptable when a serial algorithm is run on a single processor. On the other hand, real business data is stored at multiple sites, the sites need to share information with one another, and the data at each site may change incrementally and dynamically. In this case, high-performance distributed association rule mining is needed to complete the mining task effectively. Frequent itemset mining is a key step in association rule generation; its efficiency is a major problem in mining association rules and an active research topic. Based on the data sets and data structures involved, this thesis analyzes the problem of mining global frequent itemsets in depth, and improves global frequent itemset mining in a distributed environment with respect to pruning strategies, network communication strategies, and incremental mining methods. Finally, an example application of the algorithms is given.
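To make the notion of a global frequent itemset concrete, the following is a minimal Python sketch of a naive baseline: each site counts its own itemsets, the counts are merged, and an itemset is kept when its support over all sites reaches a threshold. This is not the thesis's algorithm; the function names, the `max_size` cap, and the toy data are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, max_size):
    """Count every itemset of up to max_size items at one site."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


def global_frequent_itemsets(sites, min_support, max_size=2):
    """Merge per-site counts; keep itemsets whose global support ratio
    (count / total transactions over all sites) is at least min_support."""
    total = sum(len(site) for site in sites)
    merged = Counter()
    for site in sites:
        merged.update(local_counts(site, max_size))
    return {i: c for i, c in merged.items() if c / total >= min_support}


# Hypothetical data: two sites holding five transactions in total.
sites = [
    [["a", "b"], ["a", "c"], ["a", "b", "c"]],
    [["b", "c"], ["a", "b"]],
]
result = global_frequent_itemsets(sites, min_support=0.5)
```

Exchanging every local count, as this baseline does, is exactly the communication and candidate-set cost that pruning and communication strategies aim to reduce.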
The main work of this thesis is summarized as follows:
(1) An algorithm, BFM-MGFIS, is proposed for mining global frequent itemsets in a distributed database, based on the frequent-pattern tree and maximal frequent itemsets. The algorithm introduces a subset enumeration tree to realize ordered mining and global pruning, which both greatly reduces the candidate sets and improves parallelism. Experimental results show that the algorithm is effective.
(2) The maintenance and update of rules when the data set changes incrementally is discussed, and a distributed incremental algorithm for mining global frequent itemsets is proposed. The algorithm is based on the CanTree prefix tree, so the frequent patterns do not depend on the frequent 1-itemsets. Instead, all records are sorted according to an item order specified by the user; this order is independent of changes to the data set, and the tree retains all the information from the database. Simulations show that the proposed method is effective and feasible.
(3) A simulation system is implemented around the two proposed algorithms for mining association rules in a distributed environment. The two algorithms are applied to the analysis of real biological data, with the aim of finding the relationship between the properties of wild mushrooms and their toxicity.
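The CanTree idea in item (2) can be sketched as follows, assuming a single site and a simple support query. Because items are inserted in a fixed, user-specified order rather than by frequency, new transactions can be added without restructuring the tree. The class and method names are hypothetical, and the thesis's distributed algorithm is not reproduced here.

```python
class Node:
    def __init__(self, item=None):
        self.item = item
        self.count = 0        # number of transactions passing through this node
        self.children = {}    # item -> Node


class CanTree:
    """Prefix tree whose paths follow a fixed canonical item order."""

    def __init__(self, item_order):
        # The order is given once by the user and never depends on item frequency.
        self.rank = {item: i for i, item in enumerate(item_order)}
        self.root = Node()

    def insert(self, transaction):
        """Add one transaction; existing nodes are never reordered."""
        node = self.root
        for item in sorted(set(transaction), key=self.rank.__getitem__):
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    def support(self, itemset):
        """Count stored transactions containing every item of a non-empty itemset."""
        target = sorted(set(itemset), key=self.rank.__getitem__)

        def walk(node, idx):
            total = 0
            for child in node.children.values():
                if child.item == target[idx]:
                    if idx + 1 == len(target):
                        total += child.count
                    else:
                        total += walk(child, idx + 1)
                elif self.rank[child.item] < self.rank[target[idx]]:
                    # The wanted item may still appear deeper on this path.
                    total += walk(child, idx)
            return total

        return walk(self.root, 0)


tree = CanTree(item_order="abcd")
for t in (["b", "a"], ["a", "c"]):          # initial database
    tree.insert(t)
for t in (["a", "b", "c"], ["b", "d"]):     # incremental update, same tree
    tree.insert(t)
```

Since every transaction is stored in full, the support of any itemset can be recomputed after an update without rescanning the original database, at the cost of a tree that is typically less compact than a frequency-ordered FP-tree.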
Keywords/Search Tags:Data Mining, Association Rules, Distributed Mining, Subset Enumeration Tree, Global Frequent Itemsets Set