
The Research Of Quantitative Association Rules Data Mining Based On Hadoop

Posted on: 2017-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Cheng
Full Text: PDF
GTID: 2348330485481688
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, data have become not only larger and more varied but also higher-dimensional. Extracting valuable information from massive, multi-type, multi-dimensional data is a development trend of the information society. Traditional machine learning algorithms, however, struggle to finish this task on such mixed data within a limited time, so new methods are needed. Massive data mining based on cloud computing is now widely recognized by industry and academia, and data mining on the Apache Hadoop cloud computing platform has become a topic of common interest to both. Building on data mining theory and Hadoop distributed technology, and using the MapReduce distributed computing model, this thesis takes mixed multi-dimensional data containing both categorical (type) and numeric attributes as its research data, and association rules and cluster analysis as its research objects, and studies data mining algorithms on the Hadoop cloud computing platform. The main work is as follows:
1) For mixed categorical and numeric multi-dimensional data, a data preprocessing framework based on Hadoop is proposed, and the preprocessing method and the complete data processing flow are implemented.
2) The traditional Apriori algorithm and existing parallel Apriori algorithms are studied. To address the low efficiency of the MRARM algorithm on massive, mixed multi-dimensional data, the thesis proposes MDApriori, a multi-dimensional association rules algorithm based on Hadoop. The improved algorithm overcomes the bottleneck of the traditional Apriori algorithm, which must scan the database repeatedly, and greatly reduces the time overhead of generating candidate k-itemsets by generating all candidate k-itemsets at once as global variables, thereby improving the efficiency of the algorithm.
3) To obtain association rules that are intuitive, general, and easy to use, cluster analysis is applied to the association results. The thesis proposes PK-meansAIE, a Parallel K-means Algorithm Based on Attribute Information Entropy. The algorithm summarizes and classifies a large number of association rules well, and it avoids both the tendency to fall into a local optimum caused by poorly chosen initial cluster centers and the resulting instability of the clustering results.
Finally, a Hadoop distributed platform is built within a local area network, and the improved MDApriori and PK-meansAIE algorithms are compared and analyzed in terms of scalability, speedup, and standard efficiency using bridge monitoring data. The experimental results show that the improved algorithms achieve the goals of the traditional data mining algorithms while offering good scalability and the advantages of parallel processing.
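As a rough illustration of how Apriori-style support counting fits the MapReduce model mentioned in the abstract, the following is a minimal single-process Python sketch. It is not the MDApriori algorithm, whose details the abstract does not give. The function names (map_count, reduce_counts), the item labels, and the toy data are all hypothetical, and the sketch simply enumerates every k-subset of each transaction so that one pass over the data is enough for a given k.

```python
from collections import Counter
from itertools import combinations


def map_count(split, k):
    """Map step: count every k-itemset that occurs in one data split."""
    counts = Counter()
    for transaction in split:
        # Sort and de-duplicate so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def reduce_counts(partials, min_support):
    """Reduce step: sum the per-split counts and keep the frequent itemsets."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Hypothetical toy data: two splits of discretized monitoring records.
splits = [
    [["temp=high", "strain=high", "load=heavy"], ["temp=high", "strain=high"]],
    [["temp=low", "strain=low"], ["temp=high", "strain=high", "load=light"]],
]

frequent_pairs = reduce_counts((map_count(s, 2) for s in splits), min_support=3)
print(frequent_pairs)  # {('strain=high', 'temp=high'): 3}
```

In a real Hadoop job the map step would run on each input split in parallel and the framework would group the emitted counts by itemset before the reduce step; here both steps run in one process only to show the data flow.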
Keywords/Search Tags: Hadoop, association rules, data mining, mixed multi-dimensional data, Apriori algorithm