Pattern Mining Algorithm On Cloud Computing Platform

Posted on:2016-09-08

Degree:Master

Type:Thesis

Country:China

Candidate:Q F Zhou

GTID:2308330482463410

Subject:Information and Communication Engineering

Abstract/Summary:

With the rapid development of the Internet, every field of society is producing more and more data, and we now live in the era of big data. To make this data genuinely useful, we must study how to mine valuable information from large amounts of data efficiently. As a branch of data mining, pattern mining aims to help people find interesting patterns in massive data. It is already widely used and is the focus of many researchers.

Most traditional pattern mining algorithms are designed to run on a single machine, which leads to poor mining performance on large datasets because of factors such as limited physical memory. Cloud computing has emerged as a new computing mode suited to processing large data. By using the parallel computing models available in cloud computing to parallelize mining algorithms, we can use a large-scale cluster to process data in parallel.

MapReduce is an efficient and concise computing model for the parallel processing of large datasets, and many parallel algorithms have been built on it. Spark, a more efficient in-memory parallel computing model, makes up for the weakness of MapReduce in iterative computation and is developing rapidly.

This thesis first introduces some classic mining algorithms. It then explains the principles of MapReduce and Spark, and puts forward Pamph, a parallel frequent pattern mining algorithm based on MapReduce, and Phps, a parallel high utility pattern mining algorithm based on Spark.

Pamph adopts a hybrid mining strategy that combines breadth-first mining with depth-first mining, uses a new vertical data format, mixset, combined with the FP-tree structure, and relies on the MapReduce parallel processing framework to mine frequent patterns. The experiments show that Pamph outperforms the existing algorithms DPC and PFP and scales better.

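The abstract does not give the details of Pamph, so the following is only a minimal sketch of the general idea behind MapReduce-based frequent pattern mining: a first pass that counts item supports with a map step, a shuffle, and a reduce step. It runs in plain Python on one machine rather than on a cluster. The function names (`map_phase`, `shuffle`, `reduce_phase`, `frequent_items`) and the toy transactions are assumptions made for illustration and are not taken from the thesis.

```python
from collections import defaultdict
from itertools import chain


def map_phase(transaction):
    # Emit (item, 1) once per distinct item in the transaction.
    return [(item, 1) for item in set(transaction)]


def shuffle(pairs):
    # Group the emitted values by key, as the framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(item, counts):
    # Sum the partial counts to obtain the support of one item.
    return item, sum(counts)


def frequent_items(transactions, min_support):
    # Hypothetical driver: map, shuffle, reduce, then keep items meeting min_support.
    mapped = chain.from_iterable(map_phase(t) for t in transactions)
    grouped = shuffle(mapped)
    counted = dict(reduce_phase(k, v) for k, v in grouped.items())
    return {item: n for item, n in counted.items() if n >= min_support}


# Toy data, invented for this example.
transactions = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
result = frequent_items(transactions, min_support=2)
print(result)
```

With a minimum support of 2, item `d` is dropped because it occurs in only one transaction. Algorithms such as those compared in the abstract typically use a counting pass like this to prune infrequent items before the more expensive pattern-growth stage.
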
The Phps algorithm uses the Spark RDD model and a variant of UtilityList, the data structure of the HUIMiner algorithm, to mine high utility patterns in parallel. The final experiments evaluate the effectiveness of Phps.
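To make the utility-list idea concrete, here is a small single-machine sketch in the style of the standard HUI-Miner structure. It is not the variant used by Phps, which the abstract does not describe, and it does not use Spark. Each entry stores a transaction id, the utility of the itemset in that transaction (`iutil`), and the utility of the items that come after it in the processing order (`rutil`). The names `Entry`, `build_item_lists`, `join`, and `utility`, the alphabetical item order, and the toy utilities are assumptions made for illustration. The join shown covers only the case of combining two single items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    tid: int    # transaction id
    iutil: int  # utility of the itemset in this transaction
    rutil: int  # utility of the items that follow it in this transaction


def build_item_lists(transactions):
    # Build one utility list per single item.
    # Assumption: items are processed in alphabetical order for simplicity.
    lists = {}
    for tid, tx in enumerate(transactions):
        remaining = sum(tx.values())
        for item in sorted(tx):
            remaining -= tx[item]
            lists.setdefault(item, []).append(Entry(tid, tx[item], remaining))
    return lists


def join(list_x, list_y):
    # Combine the lists of two single items x < y into the list of {x, y}.
    by_tid = {e.tid: e for e in list_y}
    joined = []
    for ex in list_x:
        ey = by_tid.get(ex.tid)
        if ey is not None:
            joined.append(Entry(ex.tid, ex.iutil + ey.iutil, ey.rutil))
    return joined


def utility(utility_list):
    # Total utility of the itemset over all transactions that contain it.
    return sum(e.iutil for e in utility_list)


# Toy data, invented for this example: item -> utility in that transaction.
transactions = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3},
]
lists = build_item_lists(transactions)
ab = join(lists["a"], lists["b"])
ac = join(lists["a"], lists["c"])
print(utility(ab), utility(ac))
```

Because each list carries everything needed to compute an itemset's utility, candidate itemsets can be evaluated by joining lists without rescanning the database, which is what makes the structure a natural fit for in-memory processing.
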

Keywords/Search Tags:

pattern mining, cloud computing, big data, MapReduce, Spark

Related items

1	High Frequency And Low Utility Pattern Mining Algorithm And Its Implementation On Cloud Computing
2	Parallel Data Mining Algorithm Research In Cloud
3	Cloud Computing And A Number Of Data Mining Algorithms Mapreduce Research
4	Research And Implementation Of Parallel Data Mining Algorithms Based On Cloud Computing
5	The Process And Research Of Massive Data Mining Based On Cloud Computing
6	Research Of Large-scale Data Mining Technologies On MapReduce
7	Research On Cluster Analysis Of Biomedical Patent Data In Yunnan Province Based On Spark Cloud Computing Architecture
8	Study On Parallel Algorithm Of Large-scale Numerical Calculation In Cloud Computing Environment
9	The Research Of Sandstorm Meteorological Data Mining Based-on Cloud Computing And SVM
10	Performance Optimization And Applications Of MapReduce In Cloud Computing