
Association Rule Mining On Cloud Computing Platform

Posted on: 2016-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Wu
GTID: 2298330467476535
Subject: Communication and Information System
Abstract/Summary:
With the rapid advance of information technology, in particular improvements in computer hardware performance, the volume of data that we collect, store and transfer has been growing exponentially. Discovering useful information and knowledge in very large data sets is a challenging task, and it is the main purpose of data mining. Data mining applications therefore depend on high-performance mining algorithms and stable software platforms.

Traditional mining algorithms are usually designed to run sequentially and focus mainly on reducing memory cost, so their performance is bounded by the computing resources of a single machine. In the big data era, traditional mining technology can no longer scale with the growing number of mining requests and the growing data volume.

Cloud computing, which delivers computing as a service rather than a product, has arisen to address this challenge. Applying cloud computing to data mining can overcome the shortcomings of traditional mining technology and improve mining performance. Apache Hadoop is an open-source software framework for the storage and large-scale processing of datasets on clusters of commodity hardware. Its core consists of a distributed file system called HDFS and a novel programming model called MapReduce. It has become a standard platform for cloud computing in both industrial applications and academic research.

This paper first reviews traditional association rule mining algorithms and their MapReduce-based parallel variants, in particular those of Apriori and FP-growth, and discusses the issues and challenges with these algorithms. To address these issues, this paper proposes a new MapReduce-based parallel algorithm, Peclat, derived from Eclat, in two versions: one works in a breadth-first manner and the other in a depth-first manner. Peclat uses a mixed vertical mining strategy that opportunistically selects the smaller vertical format during mining, which can save both memory and time in intermediate computation. Peclat also introduces additional pruning and dynamic reordering to further improve mining efficiency. Comprehensive experimental evaluation demonstrates that Peclat outperforms the prior algorithms and that the strategies proposed in this paper are effective.
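The idea of picking the smaller vertical format can be illustrated with a minimal single-machine sketch. It is not the Peclat algorithm itself. The abstract does not name the vertical formats, so the sketch assumes they are tidsets (the transaction ids containing an itemset) and diffsets (the transaction ids in the parent's tidset that the child lacks). It includes no MapReduce partitioning, extra pruning or reordering. The names `to_vertical`, `choose_format` and `eclat` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Vertical = Dict[str, Set[int]]


def to_vertical(transactions: List[Set[str]]) -> Vertical:
    """Convert horizontal transactions into item -> set of transaction ids."""
    vertical: Vertical = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def choose_format(parent_tids: Set[int], child_tids: Set[int]) -> Tuple[str, Set[int]]:
    """Pick the smaller representation of a child itemset (assumed formats).

    The diffset is parent_tids - child_tids; the child's support can be
    recovered from it as len(parent_tids) - len(diffset).
    """
    diffset = parent_tids - child_tids
    if len(diffset) < len(child_tids):
        return "diffset", diffset
    return "tidset", child_tids


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Plain depth-first Eclat on tidsets; returns itemset -> support count."""
    vertical = to_vertical(transactions)
    frequent = [
        (item, vertical[item])
        for item in sorted(vertical)
        if len(vertical[item]) >= min_support
    ]
    supports: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], candidates: List[Tuple[str, Set[int]]]) -> None:
        for idx, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            supports[itemset] = len(tids)
            # Extend the current itemset by intersecting tidsets with later siblings.
            extensions = []
            for other, other_tids in candidates[idx + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    extensions.append((other, shared))
            mine(itemset, extensions)

    mine((), frequent)
    return supports
```

In this sketch the recursion always works on tidsets, and `choose_format` only shows which representation would be cheaper to keep for a given parent and child. A genuinely mixed miner would also need combination rules for pairs of siblings stored in different formats, which the abstract does not describe.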
Keywords/Search Tags: data mining, association rule, cloud computing, Hadoop, MapReduce