The Research Of Quantitative Association Rules Data Mining Based On Hadoop

Posted on:2017-11-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Cheng

Full Text:PDF

GTID:2348330485481688

Subject:Computer Science and Technology

Abstract/Summary:

With the advent of the big data era, data are not only becoming larger and more varied; their dimensionality is also growing. Extracting valuable information from massive, multi-type, multi-dimensional data has become a development trend of the information society. However, traditional machine learning algorithms struggle to process such mixed data within a limited time, so new methods are needed. Massive data mining based on cloud computing is now widely recognized by both industry and academia, and data mining on the Apache Hadoop cloud computing platform has become a topic of common concern to both.

Building on data mining theory, Hadoop distributed technology, and the MapReduce distributed computing model, this thesis takes mixed multi-dimensional data containing both categorical and numeric attributes as its research data, and association rules and cluster analysis as its research objects, and studies data mining algorithms on the Hadoop cloud computing platform. The main contributions are as follows:

1) For mixed multi-dimensional data with categorical and numeric attributes, a data preprocessing framework based on Hadoop is proposed, and the preprocessing method and the complete data processing pipeline are implemented.

2) The traditional Apriori algorithm and the existing parallel Apriori algorithms are studied. To address the low efficiency of the MRARM algorithm when handling massive, mixed multi-dimensional data, the thesis proposes a multi-dimensional association rules algorithm based on Hadoop, the MDApriori algorithm. The improved algorithm overcomes the bottleneck of the traditional Apriori algorithm, which must scan the database repeatedly, and greatly reduces the time overhead of generating candidate k-itemsets by generating all candidate k-itemsets at one time as global variables, thereby improving the efficiency of the algorithm.

3) To obtain association rules that are intuitive, general, and easy to use, cluster analysis is applied to the association results. The thesis proposes a parallel K-means algorithm based on attribute information entropy, the PK-meansAIE algorithm. The algorithm summarizes and classifies a large number of association rules well, and it avoids both the tendency to fall into a local optimum caused by a poor choice of initial cluster centers and the resulting instability of the clustering results.

Finally, a Hadoop distributed platform is built within a local area network, and the improved MDApriori and PK-meansAIE algorithms are compared and analyzed in terms of scalability, speedup, and standard efficiency using bridge monitoring data. The experimental results show that the improved algorithms achieve the goals of the traditional data mining algorithms while offering good scalability and the advantages of parallel processing.
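As an illustration of the kind of computation the abstract describes, the following is a minimal single-process sketch of MapReduce-style support counting over mixed categorical and numeric records. It is not the thesis's MDApriori algorithm: it simply enumerates every k-subset of each record in the map step and sums the counts in the reduce step, without Apriori pruning or Hadoop. The function names, the bin boundaries, and the sample attributes (`sensor`, `strain`, `temp`) are all hypothetical and chosen only for the example.

```python
from collections import Counter
from itertools import combinations


def discretize(value, bins):
    """Return the label of the first bin whose upper bound exceeds value."""
    for upper, label in bins:
        if value < upper:
            return label
    return bins[-1][1]


def to_items(record, numeric_bins):
    """Convert one mixed record into a set of (attribute, value) items.

    Numeric attributes listed in numeric_bins are replaced by bin labels;
    categorical attributes are kept as they are.
    """
    items = set()
    for attr, value in record.items():
        if attr in numeric_bins:
            value = discretize(value, numeric_bins[attr])
        items.add((attr, value))
    return frozenset(items)


def map_phase(transaction, k):
    """Mapper: emit (k-itemset, 1) for every k-subset of one transaction."""
    for combo in combinations(sorted(transaction), k):
        yield frozenset(combo), 1


def reduce_phase(pairs, min_count):
    """Reducer: sum counts per itemset and keep those meeting min_count."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: c for s, c in counts.items() if c >= min_count}


def frequent_itemsets(records, numeric_bins, k, min_count):
    """Run the map and reduce steps in one process over all records."""
    transactions = [to_items(r, numeric_bins) for r in records]
    pairs = (p for t in transactions for p in map_phase(t, k))
    return reduce_phase(pairs, min_count)


# Hypothetical sample data (not taken from the thesis).
records = [
    {"sensor": "A", "strain": 12.0, "temp": 31.0},
    {"sensor": "A", "strain": 15.0, "temp": 33.0},
    {"sensor": "B", "strain": 3.0, "temp": 10.0},
]
numeric_bins = {
    "strain": [(10.0, "low"), (float("inf"), "high")],
    "temp": [(20.0, "cool"), (float("inf"), "warm")],
}
result = frequent_itemsets(records, numeric_bins, k=2, min_count=2)
for itemset, count in result.items():
    print(sorted(itemset), count)
```

In a real Hadoop job the map and reduce steps would run on separate nodes over partitions of the data; here they are chained in memory only to make the data flow visible.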

Keywords/Search Tags:

Hadoop, association rules, data mining, mixed multi-dimensional data, Apriori algorithm

Related items

1	Research On The Apriori Algorithms For Meteorological Data Association Rules Analysis Based On Cloud Computing
2	Mining Association Rules Algorithm Analysis Based On Hadoop
3	Research On Association Rules Algorithm Based On Hadoop
4	Research On A Parallel Data Mining Algorithm Apriori
5	Research On Association Rules Mining Methods Of Mass Engineering Data Based On Hadoop
6	The Research And Implementation Of Parallel Association Rules Algorithm Based On Cloud Environment Data Mining
7	The Study On The Recommending Methods For Online Travel Websites Association Rules
8	Multi-dimensional Multi-layer Data Mining Algorithm Mpfp Design And Its Application
9	Algorithm Based On Association Rules In Data Mining Research And Application
10	Research And Application Of Multidimensional Association Rules Mining Algorithm Based On Hadoop