
The Research On The Distributed Algorithm Of Mining Frequent Closed Patterns

Posted on: 2012-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
Full Text: PDF
GTID: 2178330341450165
Subject: Computer application technology
Abstract/Summary:
Mining association rules is one of the most important problems in data mining, because association rules describe the potential relationships between items in massive data sets. Mining association rules centers on finding frequent patterns. Because the set of frequent patterns is very large and complex, mining frequent closed patterns has been proposed to improve mining efficiency. The set of frequent closed patterns is far smaller than the set of frequent patterns, yet it still contains enough information to recover all frequent patterns and their exact supports. Much progress has been made on mining frequent closed patterns on a single processor. However, as distributed environments have become more common and traditional serial algorithms cannot solve mining problems in them, it is important to design high-performance distributed mining algorithms.

This thesis analyzes the performance of typical association rule mining algorithms, together with their strengths and weaknesses. To address the shortcomings of the traditional algorithms, it presents two distributed algorithms for mining frequent closed patterns. The main work consists of the following two parts.

In the first part, taking into account the characteristics of association rule mining and of distributed environments, an efficient distributed algorithm for mining frequent closed patterns, PFCI_Miner, is presented. The algorithm uses a Master-Slave structure to distribute tasks. The Master-processor assigns tasks efficiently by sending a Prefix Path Table (PrePthx), which is proposed in this thesis, and the Slave-processors mine local frequent closed patterns with the help of the proposed storage tree (Trac-tree). Finally, the Master-processor derives the global frequent closed patterns. The algorithm uses a star topology so that all data communication takes place only between the Master-processor and the Slave-processors.
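The closure property behind these claims, and one way a master can split the search space so that slaves never need to talk to each other, can be sketched in Python. This is an illustrative sketch under stated assumptions, not PFCI_Miner itself: instead of the thesis's Prefix Path Table (PrePthx) and Trac-tree, it assumes a simpler scheme in which each slave receives one frequent item as its prefix together with the projected database of transactions containing that item, and mines closed itemsets by depth-first search. The names `slave_mine` and `mine_closed_distributed` are hypothetical, and threads stand in for separate processors.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def closure(covering: List[Itemset]) -> Itemset:
    """Closure of an itemset = intersection of every transaction containing it."""
    return frozenset.intersection(*covering)


def slave_mine(prefix: str, projected: List[Itemset], order: List[str],
               min_support: int) -> Dict[Itemset, int]:
    """Slave side (hypothetical): mine closed itemsets whose first item in `order` is `prefix`.

    `projected` holds every transaction containing `prefix`, hence every
    transaction containing any extension of it, so closures can be computed
    locally without communicating with other slaves.
    """
    later = order[order.index(prefix) + 1:]
    found: Dict[Itemset, int] = {}

    def dfs(current: Itemset, covering: List[Itemset], start: int) -> None:
        if closure(covering) == current:      # no superset has the same support
            found[current] = len(covering)
        for k in range(start, len(later)):
            item = later[k]
            narrowed = [t for t in covering if item in t]
            if len(narrowed) >= min_support:  # anti-monotone support pruning
                dfs(current | {item}, narrowed, k + 1)

    dfs(frozenset([prefix]), projected, 0)
    return found


def mine_closed_distributed(transactions: Iterable[Set[str]], min_support: int,
                            workers: int = 4) -> Dict[Itemset, int]:
    """Master side (hypothetical): partition by prefix item, dispatch, merge."""
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    db = [frozenset(t) for t in transactions]
    counts = Counter(item for t in db for item in t)
    order = sorted(i for i, c in counts.items() if c >= min_support)
    # One task per frequent item: the item plus its projected database.
    tasks = [(i, [t for t in db if i in t]) for i in order]
    merged: Dict[Itemset, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda task: slave_mine(task[0], task[1], order, min_support), tasks)
        for partial in partials:
            merged.update(partial)  # partitions are disjoint, so no conflicts
    return merged
```

On the five transactions `{a,b,c}`, `{a,b,c}`, `{a,b}`, `{c,d}`, `{a,b,d}` with `min_support=2`, eight itemsets are frequent but only four are closed: `{a,b}` (support 4), `{a,b,c}` (2), `{c}` (3) and `{d}` (2). Every other frequent itemset, such as `{a}`, shares its support with a closed superset, which is why the closed set is smaller without losing support information. Because each closed itemset has a unique smallest item, the slaves' outputs never overlap and the master's merge is a plain union.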
There is no communication or synchronization among the Slave-processors. Experimental results show that PFCI_Miner is efficient.

In the second part, based on the characteristics of data streams, a distributed algorithm, DSFC_Miner, is proposed for mining frequent closed patterns from data streams. The algorithm partitions the data streams into sections, extracts the critical frequent closed patterns from each data stream section, and builds a corresponding local FCI_DS tree to store them. Although this introduces some error, the current global frequent closed patterns can be mined effectively. Experimental results suggest that DSFC_Miner is fast and effective...
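The partition-and-summarize idea behind DSFC_Miner can be sketched as follows. This is a hedged illustration, not the thesis's algorithm: it assumes that a section's "critical" closed patterns are those whose local support is at least ε times the section length (a rule borrowed from Lossy Counting), and it keeps them in a plain dictionary per section instead of an FCI_DS tree. The class `StreamClosedSummary`, the helper `section_closed`, and the parameters `support` and `epsilon` are all hypothetical.

```python
from math import ceil
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def section_closed(section: List[Itemset], min_count: int) -> Dict[Itemset, int]:
    """Closed itemsets of one stream section with local support >= min_count.

    Closed itemsets are exactly the non-empty intersections of transactions,
    so they can be generated incrementally, one transaction at a time.
    """
    closed: Set[Itemset] = set()
    for t in section:
        closed |= {c & t for c in closed} | {t}
    closed.discard(frozenset())
    counts = {c: sum(1 for t in section if c <= t) for c in closed}
    return {c: n for c, n in counts.items() if n >= min_count}


class StreamClosedSummary:
    """Hypothetical per-section summary of 'critical' closed patterns."""

    def __init__(self, support: float, epsilon: float) -> None:
        if not 0 < epsilon < support <= 1:
            raise ValueError("require 0 < epsilon < support <= 1")
        self.support, self.epsilon = support, epsilon
        self.sections: List[Dict[Itemset, int]] = []
        self.n = 0  # transactions seen so far

    def add_section(self, batch: Iterable[Set[str]]) -> None:
        section = [frozenset(t) for t in batch]
        self.n += len(section)
        # Critical threshold (assumption): epsilon * section length, at least 1.
        threshold = max(1, ceil(self.epsilon * len(section)))
        self.sections.append(section_closed(section, threshold))

    def estimate(self, itemset: Itemset) -> int:
        """Lower bound on the true count; short by less than epsilon * n."""
        # In each section, the support of `itemset` equals that of its closure,
        # the stored closed superset with the largest count (0 if none stored).
        return sum(max((n for c, n in sec.items() if itemset <= c), default=0)
                   for sec in self.sections)

    def frequent_closed(self) -> Dict[Itemset, int]:
        # Candidates: stored patterns closed under intersection across sections.
        candidates: Set[Itemset] = set()
        for sec in self.sections:
            for c in sec:
                candidates |= {c & d for d in candidates} | {c}
        candidates.discard(frozenset())
        cutoff = (self.support - self.epsilon) * self.n
        estimates = {x: self.estimate(x) for x in candidates}
        kept = {x: n for x, n in estimates.items() if n >= cutoff}
        # Keep x unless a kept proper superset has the same estimated count.
        return {x: n for x, n in kept.items()
                if not any(x < y and kept[y] == n for y in kept)}
```

Within a section, `estimate` is either exact or 0 when the true count is below ε times the section length, so the total never exceeds the true count and falls short by less than ε·N. Reporting against (support − ε)·N instead of support·N keeps every candidate whose true count reaches support·N, at the price of some false positives. The candidate closed patterns are themselves approximate: a globally closed pattern that is critical in too few sections may be missed or represented by a superset. A practical stream miner would also merge or prune old sections rather than keep them all.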
Keywords/Search Tags: Data Mining, Association Rules, Distributed Algorithm, Frequent Closed Patterns