Association Rule Mining On Cloud Computing Platform

Posted on:2016-02-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y S Wu

Full Text:PDF

GTID:2298330467476535

Subject:Communication and Information System

Abstract/Summary:


With the rapid advance of information technology, and in particular improvements in computer hardware performance, the volume of data that we collect, store, and transfer has been growing exponentially. Discovering useful information and knowledge in very large datasets is a challenging task and is the main purpose of data mining. High-performance mining algorithms and stable software platforms are vital to data mining applications.

Traditional mining algorithms are usually designed to run sequentially and focus mainly on reducing memory cost, so their performance is bounded by the computing resources of a single machine. In the big data era, traditional mining technology can no longer scale to the growing number of mining requests and the growing data volume.

Cloud computing, which delivers computing as a service rather than a product, has arisen to address this challenge. Applying cloud computing to data mining can overcome the shortcomings of traditional mining technology and improve mining performance. Apache Hadoop is an open-source software framework for the storage and large-scale processing of datasets on clusters of commodity hardware. It consists mainly of a distributed file system called HDFS and a programming model called MapReduce, and it has become a standard platform for cloud computing in both industrial applications and academic research.

This paper first reviews traditional association rule mining algorithms and their MapReduce-based parallel variants, in particular those of Apriori and FP-growth, and discusses the issues and challenges these algorithms face. To address these issues, it proposes Peclat, a new MapReduce-based parallel algorithm derived from Eclat. Peclat comes in two versions: one works in a breadth-first manner and the other in a depth-first manner. Peclat uses a mixed vertical mining strategy that opportunistically selects the smaller-sized vertical format during mining, which can save both memory and time in intermediate computation. Peclat also introduces additional pruning and dynamic re-ordering to further improve mining efficiency. A comprehensive experimental evaluation demonstrates that Peclat outperforms the prior algorithms and that the strategies proposed in this paper are effective.
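
To make the idea of a mixed vertical format concrete, the following is a minimal single-machine sketch in Python. It is not the Peclat implementation and involves no MapReduce. It assumes that the two vertical formats are tidsets (the IDs of transactions containing an itemset) and diffsets (the IDs that the parent itemset covers but the extended itemset does not), as in Eclat and dEclat; the abstract does not name the formats. The names `combine` and `mine_frequent_itemsets`, and the ascending-support ordering, are illustrative choices only.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# One member of an equivalence class: (item, kind, ids, support).
# kind "tid":  ids = t(PX), transactions containing prefix P plus item X.
# kind "diff": ids = d(PX) = t(P) - t(PX).
Member = Tuple[str, str, FrozenSet[int], int]


def combine(x: Member, y: Member) -> Member:
    """Extend P+X with Y's item; both members share the same prefix P."""
    _, x_kind, x_ids, x_sup = x
    y_item, y_kind, y_ids, _ = y
    if x_kind == "tid":
        # t(PXY) = t(PX) & t(PY), or t(PX) - d(PY) when Y is stored as a diffset.
        tids = x_ids & y_ids if y_kind == "tid" else x_ids - y_ids
        diff = x_ids - tids  # d(PXY) = t(PX) - t(PXY)
        # Assumption: keep whichever representation is smaller.
        if len(diff) < len(tids):
            return (y_item, "diff", diff, len(tids))
        return (y_item, "tid", tids, len(tids))
    if y_kind == "tid":
        # X is a diffset, Y a tidset: t(PXY) = t(PY) - d(PX).
        tids = y_ids - x_ids
        return (y_item, "tid", tids, len(tids))
    # Both diffsets: d(PXY) = d(PY) - d(PX), support = sup(PX) - |d(PXY)|.
    diff = y_ids - x_ids
    return (y_item, "diff", diff, x_sup - len(diff))


def mine_frequent_itemsets(
    transactions: Sequence[Sequence[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Depth-first Eclat-style search; returns itemset -> support count."""
    vertical: Dict[str, set] = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            vertical.setdefault(item, set()).add(tid)
    root: List[Member] = [
        (item, "tid", frozenset(tids), len(tids))
        for item, tids in vertical.items()
        if len(tids) >= min_support
    ]
    result: Dict[FrozenSet[str], int] = {}

    def recurse(prefix: Tuple[str, ...], members: List[Member]) -> None:
        # Ascending-support order is a common heuristic, assumed here.
        members = sorted(members, key=lambda m: (m[3], m[0]))
        for i, x in enumerate(members):
            itemset = prefix + (x[0],)
            result[frozenset(itemset)] = x[3]
            children = [combine(x, y) for y in members[i + 1:]]
            recurse(itemset, [c for c in children if c[3] >= min_support])

    recurse((), root)
    return result


if __name__ == "__main__":
    txns = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    supports = mine_frequent_itemsets(txns, min_support=2)
    print(supports[frozenset({"a", "b", "c"})])  # 2
```

In this sketch the format is chosen per itemset: when most transactions of the parent also contain the extension, the diffset is the shorter list and is stored instead of the tidset. Whether Peclat makes the choice at this granularity is not stated in the abstract.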

Keywords/Search Tags:

data mining, association rule, cloud computing, Hadoop, MapReduce


Related items

1	The Parallel Association Rules Algorithm Based On Mapreduce In The Application Of Community Analysis Research
2	The Research And Implementation Of Parallel Association Rules Algorithm Based On Cloud Environment Data Mining
3	Parallel Association Rules Algorithm Based On Hadoop
4	The Process And Research Of Massive Data Mining Based On Cloud Computing
5	Data Mining Based On Hadoop Platform
6	Studies And Applications Of Association Rule Mining Methods In Data Mining
7	Research On A Distributed Weighted Association Rule Mining Algorithm Base On Hadoop
8	Research On Massive Digital Image Data Mining Based On Hadoop Cloud Platform
9	Parallel Data Mining Algorithm Research In Cloud
10	Data Mining Association Algorithm Research And Realization Based On Cloud Computing