Pattern Mining Algorithm On Cloud Computing Platform

Posted on: 2016-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: Q F Zhou
GTID: 2308330482463410
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the Internet, every sector of society is producing ever more data, and we now live in the era of big data. To turn this data into real value, we must study how to mine valuable information from large volumes of data efficiently. Pattern mining, a branch of data mining, aims to help people discover interesting patterns in massive data; it is widely used and is an active research topic.

Most traditional pattern mining algorithms can only run on a single machine, so when they process large amounts of data their performance suffers from factors such as limited physical memory. Cloud computing, a newer computing paradigm, is designed specifically for processing large data. By using the parallel computing models that cloud computing already provides to parallelize mining algorithms, we can process data in parallel on large-scale clusters.

MapReduce is an efficient and concise computing model for the parallel processing of large datasets, and many parallel algorithms have been built on it. Spark, a more efficient in-memory parallel computing model, makes up for MapReduce's weakness in iterative computation and is developing rapidly.

This thesis first introduces several classic mining algorithms. It then explains the principles of MapReduce and Spark and proposes two algorithms: Pamph, a parallel frequent pattern mining algorithm based on MapReduce, and Phps, a parallel high utility pattern mining algorithm based on Spark.

Pamph adopts a hybrid mining strategy that combines breadth-first and depth-first mining, uses a new vertical data format, mixset, together with an FP-tree structure, and relies on the MapReduce parallel processing framework to mine frequent patterns. Experiments show that Pamph outperforms the existing algorithms DPC and PFP and scales better.
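The MapReduce flow and the idea of a vertical data format can be illustrated in a few lines of plain Python. This is a sketch under stated assumptions, not Pamph itself: the hypothetical `map_phase`, `shuffle` and `reduce_phase` functions only simulate the framework's phases in a single process, and the depth-first step uses ordinary tidset intersection (Eclat-style) rather than the thesis's mixset format or its FP-tree hybrid.

```python
from collections import defaultdict
from itertools import chain

def map_phase(tid, items):
    # Map (simulated): emit one (item, tid) pair per distinct item in the transaction.
    for item in set(items):
        yield item, tid

def shuffle(pairs):
    # Shuffle (simulated): group values by key, as the framework does between map and reduce.
    groups = defaultdict(set)
    for key, value in pairs:
        groups[key].add(value)
    return groups

def reduce_phase(groups, min_support):
    # Reduce (simulated): keep items whose tidset size (support) reaches the threshold.
    # Each result is a vertical-format entry: itemset -> set of transaction IDs.
    return {(item,): tids for item, tids in groups.items() if len(tids) >= min_support}

def depth_first(candidates, min_support, results):
    # candidates: sorted (itemset, tidset) pairs that share a common prefix.
    # Extending an itemset only intersects tidsets; the raw data is never rescanned.
    for i, (itemset_a, tids_a) in enumerate(candidates):
        results[itemset_a] = len(tids_a)
        extensions = []
        for itemset_b, tids_b in candidates[i + 1:]:
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                extensions.append((itemset_a + itemset_b[-1:], tids))
        if extensions:
            depth_first(extensions, min_support, results)

# Hypothetical toy database: transaction ID -> items.
transactions = {1: ["a", "b", "c"], 2: ["a", "b"], 3: ["a", "c"], 4: ["b", "c"], 5: ["a", "b", "c"]}
min_support = 3

pairs = chain.from_iterable(map_phase(tid, items) for tid, items in transactions.items())
frequent_items = reduce_phase(shuffle(pairs), min_support)

results = {}
depth_first(sorted(frequent_items.items()), min_support, results)
print(results)
```

With a minimum support of 3, every single item and every pair is frequent, while {a, b, c} (support 2) is pruned as soon as its tidset intersection falls below the threshold.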
Phps uses the Spark RDD model together with a variant of UtilityList, the data structure of the HUIMiner algorithm, to mine high utility patterns in parallel. Experiments at the end of the thesis evaluate the effectiveness of Phps.
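The utility-list structure that Phps builds on can be sketched in a single process. The sketch assumes that per-item utilities are already computed (quantity times unit profit), uses a fixed item order instead of HUI-Miner's ascending-TWU order, skips TWU pre-pruning, and does no Spark or RDD partitioning; the thesis's own variant of the structure is not described here. All names (`build_utility_lists`, `join`, `mine`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One utility-list entry: (transaction ID, utility of the itemset, remaining utility).
Entry = Tuple[int, int, int]

def build_utility_lists(db: List[Dict[str, int]], order: List[str]) -> Dict[str, List[Entry]]:
    # db holds, per transaction, item -> utility (assumed precomputed).
    rank = {item: r for r, item in enumerate(order)}
    lists: Dict[str, List[Entry]] = {item: [] for item in order}
    for tid, trans in enumerate(db):
        items = sorted((i for i in trans if i in rank), key=rank.get)
        remaining = sum(trans[i] for i in items)
        for item in items:
            remaining -= trans[item]  # utility of the items after `item` in this transaction
            lists[item].append((tid, trans[item], remaining))
    return lists

def join(prefix_list: Optional[List[Entry]], x_list: List[Entry], y_list: List[Entry]) -> List[Entry]:
    # Build the list of Pxy from those of Px and Py; the shared prefix P is counted once.
    prefix_iu = {tid: iu for tid, iu, _ in prefix_list} if prefix_list is not None else {}
    y_by_tid = {tid: (iu, ru) for tid, iu, ru in y_list}
    joined = []
    for tid, x_iu, _ in x_list:
        if tid in y_by_tid:
            y_iu, y_ru = y_by_tid[tid]
            joined.append((tid, x_iu + y_iu - prefix_iu.get(tid, 0), y_ru))
    return joined

def mine(prefix: Tuple[str, ...], prefix_list: Optional[List[Entry]],
         candidates: List[Tuple[str, List[Entry]]], min_util: int,
         results: Dict[Tuple[str, ...], int]) -> None:
    # candidates: (item, utility list of prefix + item), in the fixed processing order.
    for i, (item, ul) in enumerate(candidates):
        itemset = prefix + (item,)
        total_iu = sum(e[1] for e in ul)
        total_ru = sum(e[2] for e in ul)
        if total_iu >= min_util:
            results[itemset] = total_iu
        # iutil + rutil bounds the utility of every extension of itemset, so prune below it.
        if total_iu + total_ru >= min_util:
            extensions = [(other, join(prefix_list, ul, other_ul)) for other, other_ul in candidates[i + 1:]]
            extensions = [(other, jl) for other, jl in extensions if jl]
            if extensions:
                mine(itemset, ul, extensions, min_util, results)

# Hypothetical toy database of per-item utilities.
db = [{"a": 5, "b": 2, "c": 1}, {"a": 10, "c": 6}, {"b": 4, "c": 3}, {"a": 5, "b": 8}]
order = ["a", "b", "c"]
lists = build_utility_lists(db, order)

results: Dict[Tuple[str, ...], int] = {}
mine((), None, [(item, lists[item]) for item in order], 15, results)
print(results)  # {('a',): 20, ('a', 'b'): 20, ('a', 'c'): 22}
```

Because each list already stores the remaining utility, the search can discard {b} extensions and {c} without joining anything further once the sum of utility and remaining utility drops below the threshold.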
Keywords/Search Tags: pattern mining, cloud computing, big data, MapReduce, Spark