
A Survey Of Mining Association Rules Algorithm In Big Data

Posted on: 2017-04-25 | Degree: Master | Type: Thesis
Country: China | Candidate: D W Wang | Full Text: PDF
GTID: 2308330482982431 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, information technology, and cloud computing, we have entered the era of big data, which is driving a technological shift in IT. Mobile communications, e-commerce, the Internet of Things, and other fields generate huge amounts of data every day. Association rule research aims to extract implicit, useful information and knowledge from data that is large in volume, heterogeneous in structure, and mixed with a great deal of noise. Open problems include which platform or tools to choose for data mining and analysis, how to understand the nature of the data, and how to find business opportunities in it.

To address the extraction of implicative association rules, a new form of association rule is defined. To extract effective rules, implicating strength is used as the measure for the new rules, so that only rules with a genuine implicative relationship are extracted; the method can also analyze whether the antecedent and consequent of a rule are positive or negative. Heuristic information is introduced to make rule extraction more targeted, so that rules the user is not interested in are avoided. Experimental results show that the proposed rule form and algorithm are effective and efficient.

In the era of big data, the FP-tree built by the FP-Growth algorithm cannot be held in memory all at once, which greatly reduces the efficiency of FP-Growth. This thesis therefore proposes the OPFP-Growth algorithm, which optimizes the traditional FP-Growth algorithm with MapReduce on a Hadoop cluster.
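The abstract does not give the formula for implicating strength, so the sketch below does not reproduce it. As a stand-in, it computes plain support and confidence for a rule A->B and its three negated variants (A->notB, notA->B, notA->notB), which illustrates what analyzing "positive and negative" antecedents and consequents involves. The function name `signed_rule_stats` and the weather items are hypothetical; "notA" here means "the transaction does not contain all of A".

```python
from typing import Dict, Iterable, List, Set, Tuple


def signed_rule_stats(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, Tuple[float, float]]:
    """(support, confidence) for A->B and its three negated variants.

    Hypothetical helper; plain support/confidence, NOT the thesis's
    implicating-strength measure. Negated counts come from
    inclusion-exclusion, e.g. count(A and notB) = count(A) - count(A and B).
    Confidence is 0.0 when the rule body never occurs.
    """
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("need at least one transaction")
    n_a = sum(1 for t in transactions if a <= t)
    n_b = sum(1 for t in transactions if b <= t)
    n_ab = sum(1 for t in transactions if (a | b) <= t)

    # rule name -> (count of body and head together, count of body)
    counts = {
        "A->B": (n_ab, n_a),
        "A->notB": (n_a - n_ab, n_a),
        "notA->B": (n_b - n_ab, n - n_a),
        "notA->notB": (n - n_a - n_b + n_ab, n - n_a),
    }
    return {
        rule: (joint / n, joint / body if body else 0.0)
        for rule, (joint, body) in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical weather observations, one set of items per record.
    weather = [{"rain", "humid"}, {"rain", "humid", "wind"},
               {"humid"}, {"wind"}, {"rain"}]
    for rule, (sup, conf) in signed_rule_stats(weather, {"rain"}, {"humid"}).items():
        print(rule, round(sup, 3), round(conf, 3))
```

On this toy data, rain->humid has support 0.4 and confidence 2/3, while rain->not humid has support 0.2 and confidence 1/3. A measure such as implicating strength would be computed from the same four counts.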
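For readers unfamiliar with the baseline, the following is a minimal single-machine sketch of classic FP-Growth: build an FP-tree, then recursively mine conditional pattern bases. It keeps the whole tree in memory, which is exactly the limitation the abstract attributes to FP-Growth on big data; it is not OPFP-Growth and contains none of the thesis's MapReduce optimizations. Names such as `FPNode`, `build_fp_tree`, and `fp_growth` are illustrative.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

WeightedPaths = List[Tuple[List[str], int]]


class FPNode:
    """A node of an FP-tree: one item, its count, and links to parent/children."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(paths: WeightedPaths, min_count: int):
    """Insert weighted item lists into a fresh FP-tree.

    Returns (header, frequent): header maps each frequent item to the tree
    nodes holding it; frequent maps each frequent item to its total count.
    """
    totals: Dict[str, int] = defaultdict(int)
    for items, count in paths:
        for item in items:
            totals[item] += count
    frequent = {i: c for i, c in totals.items() if c >= min_count}

    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = defaultdict(list)
    for items, count in paths:
        # Drop infrequent items and sort by descending count (ties by name)
        # so transactions sharing frequent items share a tree prefix.
        ordered = sorted((i for i in items if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions: Iterable[Iterable[str]],
              min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose support count is at least min_count."""
    results: Dict[FrozenSet[str], int] = {}

    def mine(paths: WeightedPaths, suffix: FrozenSet[str]) -> None:
        header, frequent = build_fp_tree(paths, min_count)
        for item, count in frequent.items():
            itemset = suffix | {item}
            results[itemset] = count
            # Conditional pattern base: the prefix path above each node
            # holding `item`, weighted by that node's count.
            base: WeightedPaths = []
            for node in header[item]:
                path, parent = [], node.parent
                while parent is not None and parent.item is not None:
                    path.append(parent.item)
                    parent = parent.parent
                if path:
                    base.append((path, node.count))
            if base:
                mine(base, itemset)

    mine([(list(set(t)), 1) for t in transactions], frozenset())
    return results


if __name__ == "__main__":
    data = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
    for itemset, count in sorted(fp_growth(data, 2).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), count)
```

Every recursive call builds a new conditional tree, so peak memory grows with the size of the original tree; distributing these conditional trees across nodes is the general idea behind MapReduce-based variants.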
To address the initial load imbalance and the reduction of frequent itemsets in the MapReduce parallel algorithm, weighted round-robin load balancing and frequent closed itemsets are introduced: the former balances the data distribution against the processing capacity of each data node, and the latter reduces the redundant intermediate FP-tree output produced in each iteration. In addition, Hive is used to adjust the data storage structure and improve space utilization in HDFS. Experiments verify the effectiveness and efficiency of the algorithm.

In the experiments, OPFP-Growth is applied to NCDC meteorological data to analyze correlations among meteorological factors, revealing the relationships among the factors in meteorological information and providing decision support for weather forecasting and disaster prevention and mitigation.
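The abstract does not say which weighted round-robin variant is used. The sketch below assumes the "smooth" variant popularized by nginx's upstream load balancer: each node earns its weight every round, the node with the highest running total takes the next task, and the total weight is then subtracted from it. Treating weights as relative processing capacity of data nodes, and the node names, are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List


def smooth_weighted_round_robin(weights: Dict[str, int], num_tasks: int) -> List[str]:
    """Pick a node for each of num_tasks tasks in proportion to its weight.

    Assumed variant: smooth weighted round robin. Over sum(weights) tasks,
    each node is chosen exactly `weight` times, with picks interleaved.
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("need at least one node, all weights positive")
    total = sum(weights.values())
    current = {node: 0 for node in weights}
    schedule: List[str] = []
    for _ in range(num_tasks):
        # Every node earns its weight; the node with the highest total wins
        # this task (first in dict order on ties) and pays back the total.
        for node, weight in weights.items():
            current[node] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        schedule.append(chosen)
    return schedule


if __name__ == "__main__":
    # Hypothetical data nodes; weights stand for relative processing capacity.
    capacity = {"node1": 5, "node2": 1, "node3": 1}
    plan = smooth_weighted_round_robin(capacity, 7)
    print(plan)
    print(Counter(plan))
```

With weights 5:1:1 the seven picks are node1, node1, node2, node1, node3, node1, node1: proportional to capacity, but without sending five consecutive blocks to the same node.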
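Frequent closed itemsets shrink the output because a frequent itemset X is closed only if no proper superset has the same support; every non-closed itemset's support can be recovered from a closed superset. The abstract does not describe how the filter is placed inside the MapReduce iterations, so this minimal sketch only shows the definition as a post-processing step over a dictionary of itemset counts. The function name `closed_itemsets` is hypothetical.

```python
from typing import Dict, FrozenSet


def closed_itemsets(frequent: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """Keep itemsets that have no proper superset with equal support.

    Checking only supersets inside `frequent` is enough: a superset with the
    same support as a frequent itemset is itself frequent.
    Quadratic in the number of itemsets; fine for a sketch.
    """
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(itemset < other and other_count == count
                   for other, other_count in frequent.items())
    }


if __name__ == "__main__":
    # Support counts from transactions {a,b,c}, {a,b,c}, {a,c}, {d}
    # with a minimum count of 2 (hypothetical example data).
    freq = {
        frozenset("a"): 3, frozenset("b"): 2, frozenset("c"): 3,
        frozenset("ab"): 2, frozenset("ac"): 3, frozenset("bc"): 2,
        frozenset("abc"): 2,
    }
    for itemset, count in closed_itemsets(freq).items():
        print(sorted(itemset), count)
```

Here seven frequent itemsets collapse to two closed ones, {a, c} with count 3 and {a, b, c} with count 2, and the support of, say, {b} (2) can still be read off its smallest closed superset.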
Keywords/Search Tags: association rules, big data, Hadoop, Hive, OPFP-Growth, meteorological analysis