Design Of Frequent Pattern Mining Algorithm LPS-Miner And Research On Parallel Formulations

Posted on: 2010-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: H L Liu
GTID: 2178360275496074
Subject: Computer software and theory
Abstract/Summary:
Mining frequent patterns from large databases is a fundamental problem in data mining and remains an active, well-developed research topic. Because real application data is large and multi-dimensional, association rule mining algorithms with high performance and good scalability have long been a research focus. This thesis addresses two problems: designing a high-performance frequent pattern mining algorithm and porting it to a parallel computing environment. The main work includes:

1. Proposed and implemented a frequent pattern mining algorithm: LPS-Miner

The thesis studies the classical frequent pattern mining algorithms in depth and proposes LPS-Miner, an algorithm suited to frequent pattern mining on relatively large databases. Two novel data structures, LPS-FP-Tree and LPS-FP-Forest, are proposed to represent the transaction database. LPS-FP-Tree eliminates the parent-child pointers to compress the tree nodes, which also dramatically decreases the cost of accessing the trees. LPS-Miner makes extensive use of the divide-and-conquer strategy, which keeps each tree being processed small and independent of the others. In addition, non-recursive mining of single frequent branches, high-performance memory management, I/O optimization, and pruning together contribute to the performance of LPS-Miner. The experimental results show that LPS-Miner achieves good time efficiency on both dense and sparse datasets.

2. Research on a distributed-memory parallel LPS-Miner algorithm

The thesis designs a parallel LPS-Miner algorithm for distributed-memory parallel computing systems. Because the sequential LPS-Miner makes extensive use of the divide-and-conquer strategy, and the LPS-FP-Trees and their mining results are all independent of each other, parallelizing LPS-Miner is relatively easy and effective. The thesis analyzes and optimizes the factors that affect the performance of the parallel algorithm, including data partitioning, communication, and load balancing. These optimizations allow the algorithm to achieve high performance and scalability for frequent pattern mining on large databases.
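
The divide-and-conquer idea described above can be illustrated with a minimal Python sketch. It is not the LPS-Miner implementation: the LPS-FP-Tree and LPS-FP-Forest structures are not specified in this abstract, so plain projected transaction lists are used in their place, and all function names (`frequent_items`, `project`, `split_tasks`, `mine`) are hypothetical. The sketch only shows why the sub-problems are independent: each frequent item gets its own projected database, which can be mined without reference to the others.

```python
from collections import Counter


def frequent_items(transactions, min_support):
    """Count in how many transactions each item occurs; keep the frequent ones."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


def project(transactions, item, order):
    """Projected database for `item`: every transaction containing it,
    restricted to frequent items that rank before it in `order`."""
    rank = order[item]
    return [
        [x for x in t if x in order and order[x] < rank]
        for t in transactions
        if item in t
    ]


def split_tasks(transactions, min_support):
    """Divide step: one independent (item, support, projected database) task
    per frequent item. Items are ranked by descending support (assumed order)."""
    freq = frequent_items(transactions, min_support)
    ordered = sorted(freq, key=lambda i: (-freq[i], i))
    order = {item: r for r, item in enumerate(ordered)}
    return [(item, freq[item], project(transactions, item, order)) for item in ordered]


def mine(transactions, min_support, suffix=frozenset()):
    """Conquer step: mine each task on its own and merge the results.
    Returns a dict mapping each frequent itemset to its support count."""
    result = {}
    for item, support, projected in split_tasks(transactions, min_support):
        pattern = suffix | {item}
        result[pattern] = support
        # No task reads another task's data, so in a parallel setting each
        # top-level task could be handed to a different worker.
        result.update(mine(projected, min_support, pattern))
    return result
```

Calling `mine(transactions, min_support)` on a list of item lists returns every itemset that occurs in at least `min_support` transactions. Since the tasks produced by `split_tasks` share no state, a distributed version would mainly need to decide how to assign them to workers and how to balance their sizes, which corresponds to the data partitioning and load balancing concerns named in the abstract.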
Keywords/Search Tags:data mining, association rules mining, frequent pattern mining, parallel association rules mining