A Survey Of Mining Association Rules Algorithm In Big Data

Posted on:2017-04-25

Degree:Master

Type:Thesis

Country:China

Candidate:D W Wang

GTID:2308330482982431

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet, information technology and cloud computing, we have entered the era of big data, which is driving a technological transformation in IT. Mobile communications, e-commerce, the Internet of Things and other fields generate huge amounts of data every day. A key research direction for association rules is extracting implicit, useful information and knowledge from data that is massive, heterogeneous in structure and mixed with a lot of noise. A problem that remains to be solved is choosing a suitable platform or tools for data mining and analysis, working out the nature of the data, and then finding business opportunities.

A new form of association rule is defined to address the problem of extracting implication rules. To extract effective association rules, implicating strength is used as the measure for the new rules, so that only rules with a genuine implication relationship are extracted. The method can also analyze the positive and negative forms of a rule's antecedent and consequent. Heuristic information is introduced to make rule extraction more targeted, so that rules the user is not interested in are avoided. Experimental results show that the proposed rule form and algorithm are effective and efficient.

In the era of big data, the FP-tree built by the FP-Growth algorithm cannot be loaded into memory all at once, which greatly reduces the efficiency of FP-Growth. This thesis therefore proposes the OPFP-Growth algorithm, which optimizes the traditional FP-Growth algorithm with MapReduce on a Hadoop cluster.
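The abstract does not define the implicating-strength measure, so the following is not the thesis's method. It is a minimal Python sketch, assuming lift as a stand-in strength measure, of how the positive and negative forms of a rule (A => B, A => ~B, ~A => B, ~A => ~B, where "~" means absence) can all be scored from just three support counts via inclusion-exclusion. The function names and the toy weather transactions are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def signed_rule_measures(transactions, a, b):
    """Support, confidence and lift for A => B, A => ~B, ~A => B and ~A => ~B.

    '~' marks absence. Negative supports follow from inclusion-exclusion,
    e.g. supp(A, ~B) = supp(A) - supp(A, B), so only three supports are counted.
    Lift is used here as a stand-in strength measure (an assumption).
    """
    sa = support(transactions, a)
    sb = support(transactions, b)
    sab = support(transactions, set(a) | set(b))
    joint = {
        ("A", "B"): sab,
        ("A", "~B"): sa - sab,
        ("~A", "B"): sb - sab,
        ("~A", "~B"): 1 - sa - sb + sab,
    }
    marginal = {"A": sa, "~A": 1 - sa, "B": sb, "~B": 1 - sb}
    measures = {}
    for (x, y), s in joint.items():
        conf = s / marginal[x] if marginal[x] > 0 else 0.0
        lift = conf / marginal[y] if marginal[y] > 0 else 0.0
        measures[f"{x} => {y}"] = (s, conf, lift)
    return measures


# Hypothetical toy transactions: each set lists the conditions observed on one day.
days = [
    {"rain", "humid"},
    {"rain", "humid", "wind"},
    {"humid"},
    {"wind"},
    {"rain", "humid"},
]
measures = signed_rule_measures(days, {"rain"}, {"humid"})
for rule, (s, conf, lift) in measures.items():
    print(f"{rule}: support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the antecedent raises the likelihood of the consequent. On this toy data the negative rule ~A => ~B (no rain, not humid) has a higher lift than the positive rule A => B, which is the kind of relationship a positive-only miner would not report.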
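For reference, the single-machine FP-Growth that OPFP-Growth builds on can be sketched as below. This is a textbook in-memory version, not the thesis's OPFP-Growth: it builds an FP-tree with a header table, then recursively mines each item's conditional pattern base. Keeping the whole tree in memory is exactly the limitation described above for big data. The class and function names and the example transactions are hypothetical.

```python
from collections import defaultdict


class Node:
    """FP-tree node: an item, its count, a parent link and children keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_count):
    """Build an FP-tree from (items, count) pairs.

    Returns the header table (item -> nodes holding it) and the counts of
    the frequent items.
    """
    freq = defaultdict(int)
    for items, count in transactions:
        for item in items:
            freq[item] += count
    freq = {item: c for item, c in freq.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Insert frequent items in descending frequency order (ties by name) so
        # that transactions with common prefixes share tree paths.
        path = sorted((i for i in items if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, freq


def fp_growth(transactions, min_count, suffix=()):
    """Yield (frequent itemset, support count) pairs by recursive FP-Growth."""
    header, freq = build_tree(transactions, min_count)
    for item in freq:
        itemset = (item,) + suffix
        yield frozenset(itemset), freq[item]
        # Conditional pattern base: the prefix path above every node for `item`,
        # weighted by that node's count.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_count, itemset)


# Hypothetical transactions; min_count is an absolute support threshold.
data = [{"a", "b"}, {"b", "c", "d"}, {"a", "c", "d", "e"}, {"a", "d", "e"}, {"a", "b", "c"}]
patterns = dict(fp_growth([(t, 1) for t in data], min_count=2))
for itemset, count in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), count)
```

Each frequent itemset is produced exactly once, by recursing on the conditional tree of its last item in the tree ordering; the recursion depth and the size of every conditional tree grow with the data, which is what a MapReduce partitioning tries to bound.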
To address the initial load imbalance and the frequent-itemset reduction problem of the parallel MapReduce algorithm, weighted round-robin load balancing and frequent closed itemsets are introduced. The former balances data distribution against the processing capacity of each data node; the latter reduces redundant intermediate FP-tree output across iterations. In addition, Hive is used to restructure data storage, improving space utilization on HDFS. Experiments verify the effectiveness and efficiency of the algorithm.

In the experiments, OPFP-Growth is applied to NCDC meteorological data to analyze the correlation among meteorological factors, revealing the relationships among the various factors in the meteorological information and providing decision support for weather forecasting and for disaster prevention and mitigation.
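The abstract does not say which weighted round-robin variant OPFP-Growth uses. As one possibility, the sketch below applies the well-known smooth weighted round-robin scheme to spread input blocks over data nodes in proportion to a capacity weight. The node names, weights, block names and function name are hypothetical.

```python
def smooth_weighted_round_robin(weights, n):
    """Pick n targets by smooth weighted round-robin.

    weights maps a node name to a positive integer capacity weight. On every
    pick, each node's running score grows by its weight; the highest-scoring
    node is chosen and its score is lowered by the total weight. Over one cycle
    of sum(weights) picks, each node is chosen exactly `weight` times.
    """
    total = sum(weights.values())
    score = {node: 0 for node in weights}
    picks = []
    for _ in range(n):
        for node, w in weights.items():
            score[node] += w
        best = max(score, key=score.get)  # ties go to the first node listed
        score[best] -= total
        picks.append(best)
    return picks


# Hypothetical capacities: node n1 can process five times as much as n2 or n3.
capacity = {"n1": 5, "n2": 1, "n3": 1}
blocks = [f"block-{i}" for i in range(7)]
assignment = dict(zip(blocks, smooth_weighted_round_robin(capacity, len(blocks))))
print(assignment)
```

The smooth variant interleaves its picks (n1, n1, n2, n1, n3, n1, n1) instead of sending five consecutive blocks to the heaviest node, so no node receives a burst of work while the others sit idle.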
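Frequent closed itemsets are the frequent itemsets that have no proper superset with the same support. Keeping only these loses no support information, because the support of any frequent itemset equals the largest support among its closed supersets. A minimal quadratic filter, assuming the frequent itemsets and their counts are already available as a dictionary, is sketched below; the function name and example values are hypothetical, and how OPFP-Growth applies closure inside its MapReduce iterations is not described in the abstract.

```python
def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support count.

    frequent maps frozenset -> support count. Checking only frequent supersets
    suffices: a superset with equal support would itself be frequent.
    """
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(itemset < other and count == other_count
                   for other, other_count in frequent.items())
    }


# Hypothetical frequent itemsets (min count 2) with their support counts.
frequent = {
    frozenset({"a"}): 4,
    frozenset({"d"}): 3,
    frozenset({"e"}): 2,
    frozenset({"a", "d"}): 2,
    frozenset({"a", "e"}): 2,
    frozenset({"d", "e"}): 2,
    frozenset({"a", "d", "e"}): 2,
}
closed = closed_itemsets(frequent)
print(closed)
```

Here seven frequent itemsets shrink to three closed ones, and the supports of {e}, {a, d}, {a, e} and {d, e} can still be recovered from {a, d, e}.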

Keywords/Search Tags:

association rules, big data, hadoop, hive, OPFP-Growth, meteorological analysis

Related items

1	Research On The Apriori Algorithms For Meteorological Data Association Rules Analysis Based On Cloud Computing
2	Mining Association Rules Algorithm Analysis Based On Hadoop
3	Research On Association Rules Algorithm Based On Hadoop
4	Research On Supermarket Goods Association And Recommendation Based On Hadoop
5	Discovery Of Association Rules Based On Meteorological Data
6	Research On Parallel FP-growth Association Rules
7	Research On Association Rules Mining Methods Of Mass Engineering Data Based On Hadoop
8	Research On Hadoop-based MeteCloud Resource Storage And Data Processing
9	Research On Algorithm And Application Of Big Data Association Rules Mining Based On Hadoop
10	Research On Parallel Association Rules Algorithm Based On HADOOP Platform