
Research Of Parallel Association Rules Algorithm Based On Hadoop

Posted on: 2016-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Bi
GTID: 2308330473964427
Subject: Computer application technology
Abstract/Summary:
In the era of Big Data, data has become a significant asset of enterprises and public organizations, and it is bringing historic changes both to the concept of enterprise assets and to the way enterprises develop. Association rule mining, an important research area and technique in data mining, can reveal characteristics of, and interdependencies within, large-scale data. By extracting previously unknown and useful information from these intangible data assets, enterprises can gain increasingly tangible benefits and may even revise their development strategies and business models. However, when traditional association rule algorithms are applied to large databases, problems arise such as heavy I/O and a large amount of computation. With the maturing of the cloud computing platform Hadoop, combining association rule algorithms with distributed computing frameworks has become a growing trend.

Building on the basic concepts of association rules and the classic algorithms, this thesis improves the existing serial association rule algorithm by introducing the concept of a frequency set tree and by modifying how the matrix is used; the resulting algorithm is named R-SLI. Using a direct parallelization strategy, the thesis then designs the P-MT algorithm, which runs R-SLI in parallel on the MapReduce framework. Finally, the algorithm is implemented and its performance is evaluated on different experimental datasets and with different threshold values. Analysis of the experimental results shows that the algorithm achieves higher performance.
Keywords/Search Tags:Data Mining, Association Rules, MapReduce, Matrix, Frequency Tree Set
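The abstract does not give the details of R-SLI or P-MT, so the following is only a minimal sketch of the two general ideas it names: counting itemset support with a boolean matrix, and directly parallelizing a serial level-wise algorithm over MapReduce by counting each input split independently and summing the counts. It runs in a single Python process that simulates the map and reduce phases. It is not the thesis's algorithm: the bit-vector matrix encoding, the Apriori-style candidate generation, the absolute support threshold `min_count`, and all function names (`item_columns`, `map_count`, `reduce_counts`, `next_candidates`, `mine`, `confidence`) are hypothetical, and the frequency set tree is not modeled.

```python
from collections import defaultdict
from itertools import combinations


def item_columns(split):
    """Encode one input split as a boolean item-by-transaction matrix.

    Assumption: each item maps to a bit vector stored in a Python int;
    bit t is set when transaction t of this split contains the item.
    """
    columns = defaultdict(int)
    for t, transaction in enumerate(split):
        for item in transaction:
            columns[item] |= 1 << t
    return columns


def map_count(split, candidates):
    """Simulated map phase: emit (itemset, local support count) for one split.

    The support of an itemset is the number of set bits in the AND of its
    items' bit vectors, i.e. the transactions that contain every item.
    """
    columns = item_columns(split)
    for itemset in candidates:
        bits = columns.get(itemset[0], 0)
        for item in itemset[1:]:
            bits &= columns.get(item, 0)
        yield itemset, bin(bits).count("1")


def reduce_counts(mapped, min_count):
    """Simulated reduce phase: sum local counts, keep the frequent itemsets."""
    totals = defaultdict(int)
    for itemset, count in mapped:
        totals[itemset] += count
    return {s: c for s, c in totals.items() if c >= min_count}


def next_candidates(frequent, k):
    """Apriori-style join of frequent (k-1)-itemsets into k-itemsets,
    pruning any candidate that has an infrequent (k-1)-subset."""
    prev = sorted(frequent)
    result = []
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:  # same prefix and a < b, so a[-1] < b[-1]
            cand = a + (b[-1],)
            if all(sub in frequent for sub in combinations(cand, k - 1)):
                result.append(cand)
    return result


def mine(transactions, min_count, num_splits=2):
    """Level-wise mining; each level is one simulated MapReduce round."""
    splits = [transactions[i::num_splits] for i in range(num_splits)]
    items = sorted({item for t in transactions for item in t})
    candidates = [(item,) for item in items]
    supports = {}
    k = 1
    while candidates:
        mapped = (pair for split in splits
                  for pair in map_count(split, candidates))
        frequent = reduce_counts(mapped, min_count)
        supports.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return supports


def confidence(supports, antecedent, consequent):
    """confidence(X -> Y) = support(X u Y) / support(X)."""
    both = tuple(sorted(set(antecedent) | set(consequent)))
    return supports[both] / supports[tuple(sorted(antecedent))]


if __name__ == "__main__":
    transactions = [
        ["bread", "milk"],
        ["bread", "diapers", "beer", "eggs"],
        ["milk", "diapers", "beer", "cola"],
        ["bread", "milk", "diapers", "beer"],
        ["bread", "milk", "diapers", "cola"],
    ]
    supports = mine(transactions, min_count=3, num_splits=2)
    for itemset, count in sorted(supports.items()):
        print(itemset, count)
    print(confidence(supports, ["beer"], ["diapers"]))
```

With a minimum support count of 3, the toy data yields eight frequent itemsets (four single items and four pairs), and the rule beer -> diapers has confidence 1.0. Because every split is counted independently and only per-itemset totals are combined, the result does not depend on how the transactions are partitioned; this is what makes a direct parallelization of the serial algorithm correct. In a straightforward Hadoop port of this sketch, each level would be a separate MapReduce job, which is one source of the I/O cost the abstract mentions.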