Research And Implementation Of Mining Algorithm For Association Rules In Big Data Based On Hadoop

Posted on:2016-11-14

Degree:Master

Type:Thesis

Country:China

Candidate:J G Liao

Full Text:PDF

GTID:2308330479993918

Subject:Computer system architecture

Abstract/Summary:


In recent years, the explosive growth of data volume has drawn widespread attention to the question of how to mine valuable information from big data. Data mining is currently an important technical means of addressing this problem, and deriving association rules by mining the frequent itemsets in a dataset is one of its core tasks. However, with the arrival of the big data era, traditional data mining algorithms cannot cope with the characteristics of big data, so studying and proposing new mining algorithms suited to the big data environment has become an urgent need. This thesis analyzes in depth the current big data mining algorithms developed in China and abroad and proposes an effective and fast algorithm for mining association rules in big data, aiming to solve the problem of slow mining on large datasets. The main work of this thesis can be summarized as follows:

(1) The current status and existing problems of data mining technology are analyzed. The main contradictions that the big data environment must resolve are the one between the growing volume of data and people's desire for valuable information, and the one between data growth and the pace of hardware development. Big data does not make people willing to accept slower mining; on the contrary, people want a fast and accurate method for extracting valuable information from it.

(2) Current data mining algorithms in China and abroad, the distributed computing framework Hadoop, and the distributed computing model MapReduce are analyzed. Apriori scans the database many times, which incurs heavy I/O overhead. FP-Growth compresses the original database into an FP-tree, but it generates too many conditional subtrees during its iterations, which hinders the mining process. Hadoop lowers the difficulty of distributed programming and is easy to manage, and MapReduce is well suited to association rule mining, so Hadoop and MapReduce offer clear advantages for mining association rules in a big data environment.

(3) The PrePost algorithm is studied and an improved version is given. PrePost combines the advantages of FP-Growth and vertical mining algorithms, but it obtains frequent itemsets with an Apriori-like, level-wise approach. Although merging two N-lists takes linear time, if there are S frequent k-itemsets the algorithm needs S(S-1)/2 comparisons, so the time overhead is considerable. Moreover, mining (k+1)-itemsets requires keeping all frequent k-itemsets in memory, which may exceed the available memory. This thesis therefore proposes a bottom-up, depth-first strategy to improve PrePost.

(4) A novel big data mining algorithm based on the Hadoop platform, called MRPrePost, is proposed; to some extent it compensates for the shortcomings of existing mining algorithms in the big data environment. Because cluster load is a major factor affecting the performance of parallel algorithms, the thesis proposes a grouping strategy that balances the cluster load and thereby improves MRPrePost's performance. Experiments show that MRPrePost is suitable for association rule mining on big data.
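To make the N-list idea in point (3) concrete, the following Python sketch builds a PPC-tree (a prefix tree whose nodes carry pre-order and post-order ranks), derives each frequent item's N-list, and merges two N-lists in one linear pass. It is a minimal illustration under stated assumptions, not the thesis's implementation: the names `build_nlists`, `intersect` and `support`, the tie-breaking rule for equally frequent items, and the way a 3-itemset is formed from two 2-itemsets that share their least frequent item are illustrative choices. A node is an ancestor of another exactly when its pre-order rank is smaller and its post-order rank is larger, which is what lets the merge walk two sorted lists without backtracking.

```python
from collections import Counter


class Node:
    """A PPC-tree node: an item, its count, children, and pre/post-order ranks."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}
        self.pre = -1
        self.post = -1


def build_nlists(transactions, min_count):
    """Build a PPC-tree and return (item order, N-list of each frequent item)."""
    freq = Counter(item for t in transactions for item in set(t))
    # Frequent items in descending support; ties broken by name (an assumption).
    order = sorted((i for i, c in freq.items() if c >= min_count),
                   key=lambda i: (-freq[i], i))
    rank = {item: r for r, item in enumerate(order)}

    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += 1
            node = child

    # One depth-first pass assigns pre-order and post-order ranks.
    nodes, pre, post = [], 0, 0

    def visit(node):
        nonlocal pre, post
        node.pre = pre
        pre += 1
        nodes.append(node)
        for child in node.children.values():
            visit(child)
        node.post = post
        post += 1

    visit(root)
    # `nodes` is in pre-order, so every N-list comes out sorted by pre.
    nlists = {item: [] for item in order}
    for node in nodes:
        if node.item is not None:
            nlists[node.item].append((node.pre, node.post, node.count))
    return order, nlists


def intersect(ancestors, descendants):
    """Linear-time N-list merge. `ancestors` must hold nodes of a more frequent
    item than `descendants`; the result keeps ancestor codes and sums the
    counts of the descendant nodes found under each of them."""
    result, i, j = [], 0, 0
    while i < len(ancestors) and j < len(descendants):
        pre_a, post_a, _ = ancestors[i]
        pre_d, post_d, count_d = descendants[j]
        if pre_a < pre_d:
            if post_a > post_d:  # node a is an ancestor of node d
                if result and result[-1][0] == pre_a:
                    result[-1] = (pre_a, post_a, result[-1][2] + count_d)
                else:
                    result.append((pre_a, post_a, count_d))
                j += 1
            else:  # a's subtree ends before d, so a is no ancestor of any later d
                i += 1
        else:  # d precedes a in pre-order, so no remaining a can be its ancestor
            j += 1
    return result


def support(nlist):
    """The support of an itemset is the sum of the counts in its N-list."""
    return sum(count for _, _, count in nlist)


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
order, nl = build_nlists(transactions, min_count=2)
nl_ab = intersect(nl["a"], nl["b"])
nl_ac = intersect(nl["a"], nl["c"])
nl_bc = intersect(nl["b"], nl["c"])
nl_abc = intersect(nl_ac, nl_bc)  # a's nodes act as ancestors of b's nodes
print(support(nl_ab), support(nl_bc), support(nl_abc))  # 3 3 2
```

The level-wise join criticized in point (3) repeats this merge for every pair of frequent k-itemsets: with S = 1,000 of them that is 1,000 × 999 / 2 = 499,500 merges, and all 1,000 N-lists must be held in memory at the same time.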

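Point (4) does not spell out the grouping strategy, so the sketch below only illustrates one common way to balance load in a MapReduce frequent-itemset job, modeled on the group-dependent sharding used by parallel FP-Growth (PFP): frequent items are split into groups, each mapper emits, for every group present in a transaction, the shortest prefix that still contains that group's items, and each reducer can then mine its own group independently. The load estimate (an item's support count), the greedy heaviest-item-to-lightest-group rule, and the names `balance_groups` and `map_transaction` are hypothetical choices for illustration, not MRPrePost's actual design.

```python
import heapq
from collections import defaultdict


def balance_groups(item_loads, num_groups):
    """Greedy LPT assignment: give each item, heaviest first, to the lightest group."""
    heap = [(0, gid) for gid in range(num_groups)]  # (current load, group id)
    heapq.heapify(heap)
    group_of = {}
    for item, load in sorted(item_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, gid = heapq.heappop(heap)
        group_of[item] = gid
        heapq.heappush(heap, (total + load, gid))
    return group_of


def map_transaction(transaction, rank, group_of):
    """PFP-style map step: emit (group id, prefix) once per group in the transaction."""
    items = sorted((i for i in set(transaction) if i in rank), key=rank.get)
    emitted, out = set(), []
    # Walk from the least frequent item backwards, so each group receives the
    # prefix that ends at its least frequent item in this transaction.
    for j in range(len(items) - 1, -1, -1):
        gid = group_of[items[j]]
        if gid not in emitted:
            emitted.add(gid)
            out.append((gid, items[: j + 1]))
    return out


# Hypothetical load estimate: each frequent item's support count.
loads = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}
group_of = balance_groups(loads, num_groups=2)
rank = {item: r for r, item in enumerate(["a", "b", "c", "d", "e"])}

# Simulate the shuffle: collect every mapper output under its group id.
shards = defaultdict(list)
for t in [["e", "a", "c"], ["b", "d"], ["a", "b", "c", "d", "e"]]:
    for gid, prefix in map_transaction(t, rank, group_of):
        shards[gid].append(prefix)
```

Greedy assignment of this kind is not guaranteed to find the optimal split, but it prevents the heaviest items from piling onto one reducer; in this toy run both groups end up with an estimated load of 8.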
Keywords/Search Tags:

Big Data, Data Mining, Association rules, Hadoop, PrePost, MRPrePost


Related items

1	Research On Algorithm And Application Of Big Data Association Rules Mining Based On Hadoop
2	The Research Of Quantitative Association Rules Data Mining Based On Hadoop
3	Research And Application On Association Rules Mining Algorithm Base On Hadoop
4	The Research And Implementation Of Algorithm For Mining Association Rules Based On BigData
5	Mining Association Rules Algorithm Analysis Based On Hadoop
6	Research And Implementation Of Mining Association Rules For EMU Failure Data Based On Hadoop
7	Research On Parallel Association Rules Algorithm Based On HADOOP Platform
8	Research On Algorithm Of Mining Association Rules Based On Hadoop
9	A Survey Of Mining Association Rules Algorithm In Big Data
10	Research On The Apriori Algorithms For Meteorological Data Association Rules Analysis Based On Cloud Computing