
Research On Association Mining Algorithms With Dynamic Database Based On Big Data

Posted on: 2016-03-16 | Degree: Master | Type: Thesis
Country: China | Candidate: W F Zhu
GTID: 2308330461459924 | Subject: Computer application technology
Abstract/Summary:
Data mining is the process of discovering latent rules and characteristics in data through computation. Under big data, a data mining algorithm must be both correct and computationally feasible. This thesis takes dynamic databases as its research object. It studies how to perform parallel association rule mining on big data and addresses three problems in association mining: algorithm efficiency, fairness (load balance across computing nodes), and incremental update mining.

Association rule mining finds latent correlations in data and describes how data items relate to one another. An often-quoted example is that, in the U.S., beer and diapers tend to be purchased together. Association rule mining algorithms have developed through three classes: Apriori algorithms, FP-Growth algorithms, and distributed algorithms. Each stage brought substantial gains in efficiency and applicability. Distributed algorithms spread the data over distributed storage and apply a divide-and-conquer strategy with parallel computation, which gives them clear advantages in efficiency and scalability. However, existing distributed association rule mining algorithms have not yet produced a solution that offers flexible scheduling and balanced task allocation. Moreover, in a big data setting datasets are continuously and incrementally updated, and mining algorithms perform very differently on static and dynamic databases. No effective distributed mining solution for dynamic databases has yet been proposed.

In this thesis, we investigate association rule mining algorithms.
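The beer-and-diapers example can be made concrete with the two standard measures of association rule mining. The support of an itemset is the fraction of transactions that contain it. The confidence of a rule X → Y is the support of X ∪ Y divided by the support of X. The sketch below is a minimal level-wise Apriori in Python, written for illustration only. The function names `apriori` and `rules`, the basket data, and the thresholds are hypothetical and are not taken from the thesis. Each level needs another full pass over the transactions, and that repeated scanning is the cost FP-Growth-style methods aim to avoid.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise Apriori. Returns {frozenset(itemset): support count} for every
    itemset contained in at least `min_count` transactions."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:  # level 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]  # join step
                # Prune step: every (k-1)-subset of a candidate must be frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count step: one more full pass over the database for this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result

def rules(freq, n_transactions, min_confidence):
    """Yield (X, Y, support, confidence) for every rule X -> Y built from a
    frequent itemset, where confidence = count(X | Y) / count(X)."""
    for itemset, count in freq.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = count / freq[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    yield lhs, itemset - lhs, count / n_transactions, conf

# Hypothetical market-basket data.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]
freq = apriori(baskets, min_count=2)
for x, y, sup, conf in rules(freq, len(baskets), min_confidence=0.7):
    print(sorted(x), "->", sorted(y), f"support={sup:.2f} confidence={conf:.2f}")
```

With a minimum support count of 2 and a minimum confidence of 0.7, this toy data yields four rules. Examples are beer → diapers (support 0.60, confidence 0.75) and chips → beer (support 0.40, confidence 1.00).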
This work compares the core ideas and applicable scenarios of Apriori, FP-Growth, FUP, PFP, and other typical algorithms, and studies distributed data mining algorithms for big data in depth. On that basis, this thesis proposes a load-balanced distributed computing solution and an incremental update mining solution. A model based on suffix patterns and balanced task groups allows each node to process its local mining tasks independently. An incremental update mechanism builds a full FP-tree by merging trees, which avoids rescanning the dataset and allows association rules to be extracted from dynamic data.

Contrast experiments in a Hadoop environment show that HBFP (High Balanced FP-Growth) improves the evenness of load across parallel computing nodes on big data. The standard deviation of execution time between nodes decreases, and the overall execution time is reduced by 12%. In addition, the incremental update solution IHBFP (Incremental updating High Balanced FP-Growth) uses the full FP-tree to reduce the number of scans of the historical database. It processes only the branches whose features have changed. The improved algorithm achieves a moderate but stable gain in computational efficiency.
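The suffix-based, balanced task grouping summarized above can be illustrated in the style of PFP. For every transaction, each group receives the prefix that ends at the last item belonging to that group. The group can then mine all patterns whose suffix item it owns without contacting other nodes. The abstract does not say how HBFP estimates or balances load. The sketch below therefore assumes a per-item cost estimate is already available and uses a greedy longest-processing-time heuristic as a stand-in. The names `balanced_groups` and `group_shards`, the F-list, and the cost values are all hypothetical. The population standard deviation of group loads plays the role of the node-level execution-time deviation used as the balance metric in the experiments.

```python
import heapq
from statistics import pstdev

def balanced_groups(item_cost, n_groups):
    """Greedy LPT heuristic: give the most expensive remaining item to the
    currently least-loaded group. Returns (item -> group id, per-group loads)."""
    heap = [(0.0, g) for g in range(n_groups)]  # (current load, group id)
    heapq.heapify(heap)
    assignment, loads = {}, [0.0] * n_groups
    for item, cost in sorted(item_cost.items(), key=lambda kv: (-kv[1], kv[0])):
        load, g = heapq.heappop(heap)
        assignment[item] = g
        loads[g] = load + cost
        heapq.heappush(heap, (loads[g], g))
    return assignment, loads

def group_shards(transaction, flist_rank, assignment):
    """PFP-style mapper: sort the frequent items of a transaction by F-list rank,
    then scan from the least frequent end and send the prefix ending at each
    item to that item's group, at most once per group."""
    items = sorted((i for i in transaction if i in flist_rank), key=flist_rank.get)
    shards = {}
    for pos in range(len(items) - 1, -1, -1):
        g = assignment[items[pos]]
        if g not in shards:
            shards[g] = items[: pos + 1]
    return shards

# Hypothetical F-list (most frequent first) and hypothetical per-item cost estimates.
flist_rank = {item: rank for rank, item in enumerate("abcdefgh")}
cost = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4, "f": 3, "g": 2, "h": 1}

assignment, loads = balanced_groups(cost, n_groups=2)
contiguous = [sum(cost[i] for i in "abcd"), sum(cost[i] for i in "efgh")]
print("greedy loads:", loads, "stdev:", pstdev(loads))
print("contiguous loads:", contiguous, "stdev:", pstdev(contiguous))
print(group_shards({"h", "c", "a", "e"}, flist_rank, assignment))
```

Here the greedy assignment produces two groups with equal estimated load (standard deviation 0). Splitting the same F-list into two contiguous halves gives loads of 26 and 10 (standard deviation 8). A real cost model would have to be estimated from the data, which this sketch does not attempt.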
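The tree-merging idea behind the incremental solution can be sketched under two assumptions that the abstract does not state. First, the full FP-tree keeps every item regardless of minimum support. Second, it orders items by a fixed canonical order (lexicographic here) rather than by frequency. Under these assumptions, a tree built from newly arrived transactions follows the same conventions as the historical tree. Merging then adds counts along shared paths and copies only new branches, and item supports can be read from the merged tree without rescanning the historical data. All names (`FPNode`, `insert`, `build`, `merge`, `item_supports`, `as_dict`) and the data are hypothetical, and this is not presented as the IHBFP implementation.

```python
class FPNode:
    """One FP-tree node: an item, the number of transactions through it, children."""
    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # item -> FPNode

def insert(root, transaction):
    """Insert a transaction along a shared prefix path. Items are sorted in a fixed
    canonical (lexicographic) order, an assumption that keeps trees built at
    different times structurally compatible."""
    node = root
    for item in sorted(transaction):
        child = node.children.get(item)
        if child is None:
            child = node.children[item] = FPNode(item)
        child.count += 1
        node = child

def build(transactions):
    root = FPNode()
    for t in transactions:
        insert(root, t)
    return root

def merge(dst, src):
    """Fold tree `src` into `dst` in place: shared paths add counts, new branches
    are copied. Only branches present in `src` are visited."""
    for item, s_child in src.children.items():
        d_child = dst.children.get(item)
        if d_child is None:
            d_child = dst.children[item] = FPNode(item)
        d_child.count += s_child.count
        merge(d_child, s_child)
    return dst

def item_supports(node, acc=None):
    """Support of each item = sum of counts of its nodes (each transaction
    containing the item passes through exactly one such node)."""
    acc = {} if acc is None else acc
    for item, child in node.children.items():
        acc[item] = acc.get(item, 0) + child.count
        item_supports(child, acc)
    return acc

def as_dict(node):
    """Nested {item: (count, subtree)} view, handy for comparing trees."""
    return {i: (c.count, as_dict(c)) for i, c in node.children.items()}

history = [{"beer", "diapers"}, {"beer", "chips"}, {"milk", "diapers"}]
increment = [{"beer", "diapers", "milk"}, {"chips"}]

# In practice the historical tree already exists; only the increment is scanned.
full_tree = merge(build(history), build(increment))
print(item_supports(full_tree))
```

Merging the increment's tree into the historical tree yields the same structure as building one tree from all transactions. That equivalence is what lets a full FP-tree stand in for a rescan of the historical database.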
Keywords: Big data, Dynamic database, Association rule mining, Parallel computing