The Application Research Of Association Rules Parallel Algorithm Analysis Based On FP-Growth

Posted on: 2012-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: T Li
Full Text: PDF
GTID: 2218330368984596
Subject: Computer application technology
Abstract/Summary:
With advances in computer science and technology, data mining has developed rapidly as a new knowledge discovery technology. Data mining is the discovery of interesting, implicit, previously unknown, and potentially useful patterns or knowledge from large-scale data that is incomplete, noisy, fuzzy, and random. Large amounts of data have accumulated in daily affairs and in scientific research. Without effective tools to discover the potentially useful information in this data, it may be an ocean of knowledge, yet the information we actually obtain from it remains poor. Earlier algorithms have many disadvantages. For example, the classical Apriori algorithm must scan the data many times and generates a large number of candidate itemsets, while the FP-Growth algorithm must build a large FP-tree that occupies too much memory. Neither is efficient enough for mining large databases, so efficiency has become both the focus and the main difficulty of the problem. FP-Growth takes the strategy of dividing a large database into smaller pieces that are handled separately: it compresses the association information in the database into an FP-tree, which retains the associations among items, and then divides the FP-tree into conditional pattern bases that are mined separately.
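The FP-tree construction and the conditional pattern bases described above can be sketched as follows. This is a minimal illustration in Python, not the thesis's implementation: the names `Node`, `build_fp_tree`, and `conditional_pattern_base`, the in-memory list of transactions, and the absolute `min_support` count are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class Node:
    """One FP-tree node (hypothetical helper class for this sketch)."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree with two passes over the transactions."""
    # First pass: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = Node(None, None)
    header = defaultdict(list)  # item -> list of tree nodes carrying it

    # Second pass: insert each transaction, keeping only frequent items,
    # ordered by descending frequency so common prefixes are shared.
    for t in transactions:
        ordered = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header


def conditional_pattern_base(item, header):
    """Collect the prefix paths that lead to `item`, with their counts."""
    base = []
    for node in header[item]:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
    return base
```

Each conditional pattern base can be mined without reference to the others, which is what makes the per-item mining steps natural units of work to hand to separate processors.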
Because FP-Growth does not need to scan the database many times and does not generate candidate itemsets, it is more efficient than the Apriori algorithm. However, it has to construct a large FP-tree that occupies a lot of memory, so it is not suitable for large databases. A parallel algorithm divides a large task into smaller tasks and distributes them to different processors, which work on them cooperatively; this speeds up processing and increases the size of problem that can be handled. This thesis, The Application Research of Association Rules Parallel Algorithm Analysis Based on FP-Growth, studies FP-Growth and parallel algorithms in depth and, to address the shortcomings of earlier algorithms, proposes a parallel algorithm based on FP-Growth. The algorithm divides the database into smaller pieces that are processed separately and splits the FP-tree in a way that leaves the association rules unchanged. Task distribution, load balancing, and multiprocessor scheduling are studied in depth and further optimized. The method is suitable for large databases, and its efficiency is further improved compared with the earlier algorithms.
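To illustrate what load balancing can mean in this setting, the sketch below assigns independent mining tasks to workers so that the estimated loads stay close to each other. The abstract does not describe the thesis's actual scheduling scheme, so this uses a standard greedy heuristic (longest processing time first) as a stand-in. The function name `balance_tasks` and the idea of one task per item with a numeric cost estimate are assumptions made for the example.

```python
import heapq


def balance_tasks(task_costs, num_workers):
    """Greedily assign tasks to workers, largest estimated cost first.

    task_costs: dict mapping a task label (e.g. an item whose conditional
                pattern base is to be mined) to an estimated cost.
    Returns (assignment, loads), both keyed by worker index.
    """
    # Min-heap of (current load, worker index): the least-loaded worker
    # is always at the top.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}

    # Place the most expensive tasks first; ties are broken by label so
    # the result is deterministic.
    for task, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        heapq.heappush(heap, (load + cost, worker))

    loads = {
        w: sum(task_costs[t] for t in tasks) for w, tasks in assignment.items()
    }
    return assignment, loads
```

A cost estimate could be, for instance, the total count in an item's conditional pattern base; that choice is also an assumption, and a poor estimate would leave the workers unevenly loaded regardless of the assignment rule.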
Keywords/Search Tags: FP-Growth, Association Rules, Parallel Algorithm, Load Balancing