Research and Application of Multidimensional Data Constructing and Association Rules Mining Algorithm Based on MapReduce

Posted on: 2014-01-07    Degree: Master    Type: Thesis
Country: China    Candidate: Y Ma    Full Text: PDF
GTID: 2268330401981220    Subject: Computer software and theory
Abstract/Summary:
With the arrival of the Internet big data era, processing vast amounts of data has become a technological bottleneck in many fields. Cloud computing technologies such as MapReduce provide excellent solutions to such problems. More and more Internet applications are combined with cloud computing technologies to enhance their service scalability and processing capability, and to cope with the pressures and challenges brought about by huge amounts of data.

Based on a detailed analysis of the characteristics of multidimensional data, the MapReduce distributed computing model, and the Hadoop distributed architecture, this thesis proposes a method for constructing multidimensional data in parallel, and an efficient parallel algorithm for mining multidimensional association rules, which is a typical application of multidimensional data.

The thesis first introduces the basic concepts of multidimensional data, its formal description, and related applications, as well as the definition and classification of association rules, the data mining process, and the details of multidimensional association rule mining. Then, based on an analysis of the principles and characteristics of the MapReduce model, it proposes a MapReduce-based method for constructing multidimensional data in parallel. By analyzing the characteristics and limitations of various classic association rule mining algorithms, it proposes a method for mining multidimensional association rules in parallel based on the Apriori algorithm. Finally, it evaluates the performance of the algorithm through simulation experiments, and tunes and optimizes the data flow of the MapReduce model. The experimental results show that, compared with stand-alone execution, the parallel method of constructing multidimensional data is more efficient and more stable. The parallel multidimensional association rule mining method improves efficiency while reducing the number of scans over the data files, which greatly reduces the system's I/O load.
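The abstract does not give the algorithm itself, so the following is only a minimal single-process sketch of the general idea: Apriori-style support counting expressed as a map step and a reduce step over multidimensional records. The function names (`map_phase`, `reduce_phase`, `apriori`), the representation of a record as a dictionary from dimension name to value, and the use of an absolute `min_support` count are all assumptions made for illustration, not the thesis's method. A real Hadoop job would distribute the map and reduce steps across nodes.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(record, k, prev_frequent=None):
    """Emit (itemset, 1) for every k-item combination of (dimension, value) pairs.

    Apriori pruning: a k-itemset is emitted only if all of its (k-1)-subsets
    were frequent in the previous pass (prev_frequent is None for k == 1).
    """
    items = sorted(record.items())  # fixed order so keys are comparable
    for combo in combinations(items, k):
        if prev_frequent is None or all(
            sub in prev_frequent for sub in combinations(combo, k - 1)
        ):
            yield combo, 1


def reduce_phase(pairs):
    """Sum the counts emitted for each itemset key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return counts


def apriori(records, min_support):
    """Run one map/reduce pass per itemset size until no itemset is frequent."""
    frequent = {}
    prev_frequent = None
    k = 1
    while True:
        pairs = (p for r in records for p in map_phase(r, k, prev_frequent))
        level = {s: n for s, n in reduce_phase(pairs).items() if n >= min_support}
        if not level:
            break
        frequent.update(level)
        prev_frequent = set(level)
        k += 1
    return frequent
```

In this sketch each itemset size costs one full pass over the records. Reducing the number of such passes is the kind of saving the abstract reports, although how the thesis achieves it is not stated here.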
Keywords/Search Tags: MapReduce, Hadoop, Parallel, Multidimensional Data, Association Rules