Research On SPARK Based Massive Data Frequent Pattern Mining Algorithms

Posted on:2017-01-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y D Zhao

Full Text:PDF

GTID:2308330503487183

Subject:Computer Science and Technology

Abstract/Summary:

Frequent pattern mining aims to find content that appears frequently in data sets, and it is one of the most important research directions in data mining. Depending on the type of data set, there are two kinds of frequent patterns: frequent itemsets and frequent subsequences. Because mining frequent patterns consumes substantial computing resources and data sets keep growing, distributed computing frameworks are needed to keep mining efficient. The first part of this thesis focuses on mining frequent itemsets in transaction data sets and studies frequent itemset mining algorithms based on the distributed computing framework Spark. We first design and implement Spark versions of the classic algorithms Apriori and FP-Growth. We then propose a two-phase frequent itemset mining algorithm based on Spark that combines features of both FP-Growth and Apriori. Through experiments, we identify the advantages and disadvantages of these algorithms and summarize the scenarios each one suits. These algorithms make full use of cluster resources and meet the need to mine frequent itemsets on large data sets quickly. In addition, this part shows how the ideas behind frequent itemset mining can be applied to mine frequent subsequences in sequence data sets on Spark.

Beyond mining frequent patterns on Spark, the second part of this thesis focuses on time series compression, with the goal of mining frequent patterns in numeric time series data sets. Compressing a time series reduces not only the amount of data but also noise. Less noise makes the trends in a time series clearer, which in turn helps uncover meaningful frequent patterns. Starting from perceptually important points (PIPs) and extending earlier work, we design and implement two PIP-based time series compression algorithms: one based on global PIPs and one based on local PIPs. The two algorithms suit different kinds of time series, and we measure their effectiveness and degree of distortion through experiments. Visualization is an important requirement when working with time series; because PIP-based compression algorithms preserve the trend information of a time series, they are well suited to visualization.
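The first part of the thesis parallelizes Apriori on Spark. The thesis's own code is not reproduced here; what follows is a minimal plain-Python sketch, assuming the transactions are already split into partitions that stand in for Spark RDD partitions. Counting candidates inside each partition plays the role of a map step, and summing the per-partition counters plays the role of a reduce step. The function names `apriori`, `generate_candidates` and `count_candidates` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_candidates(partition, candidates):
    """Map-like step for one partition: count how many transactions contain each candidate."""
    local = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:
                local[cand] += 1
    return local


def generate_candidates(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-itemsets and prune any with an infrequent subset."""
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def apriori(partitions, min_support):
    """Level-wise Apriori over partitioned transactions; returns {itemset: support count}."""
    # Level 1: count single items across all partitions.
    counts = Counter()
    for part in partitions:
        for transaction in part:
            for item in set(transaction):
                counts[frozenset([item])] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = generate_candidates(frequent.keys(), k)
        if not candidates:
            break
        total = Counter()
        # On Spark, this loop over partitions would run in parallel (map),
        # and the counter merge would be a key-wise reduction (reduce).
        for part in partitions:
            total.update(count_candidates(part, candidates))
        frequent = {s: c for s, c in total.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Each iteration of the `while` loop is one full pass over every partition, so the number of passes grows with the length of the longest frequent itemset. This repeated scanning is the cost that a tree-based method such as FP-Growth is designed to avoid.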
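The PIP-based compression in the second part builds on PIP identification. This abstract does not specify how the global-PIP and local-PIP variants work, so the sketch below shows only the commonly used baseline procedure, assuming the vertical-distance measure. It begins with the first and last points, then repeatedly adds the point that lies farthest (vertically) from the line joining its two neighboring PIPs. `find_pips` and `vertical_distance` are illustrative names, not the thesis's algorithms.

```python
def vertical_distance(p1, p2, p3):
    """Vertical distance from p3 to the straight line through p1 and p2; points are (x, y)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    y_on_line = y1 + (y2 - y1) * (x3 - x1) / (x2 - x1)
    return abs(y_on_line - y3)


def find_pips(series, n_pips):
    """Return the sorted indices of the first n_pips perceptually important points."""
    if n_pips < 2:
        raise ValueError("n_pips must be at least 2 (the two endpoints)")
    n = len(series)
    if n_pips >= n:
        return list(range(n))
    pips = [0, n - 1]  # the endpoints are always PIPs
    while len(pips) < n_pips:
        best_idx, best_dist = None, -1.0
        # Scan every gap between consecutive PIPs for the most distant point.
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                d = vertical_distance(
                    (left, series[left]), (right, series[right]), (i, series[i])
                )
                if d > best_dist:
                    best_idx, best_dist = i, d
        if best_idx is None:
            break
        pips.append(best_idx)
        pips.sort()
    return pips
```

Keeping only `[(i, series[i]) for i in find_pips(series, k)]` yields a k-point compressed series that retains the most prominent turns of the original. For example, `find_pips([0, 1, 5, 1, 0, 0, 3, 0], 5)` returns `[0, 2, 3, 6, 7]`, which keeps the two peaks and the endpoints.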

Keywords/Search Tags:

frequent patterns, Spark, time series compression, perceptually important point

Related items

1	A Research And Application Of Mining Frequent Patterns Based On Time Series
2	Hierarchical Clustering Algorithm For Mining Frequent Patterns And Time-series Flow
3	Design And Implementation Of Frequent Browsing Pattern Mining System Based On Spark
4	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
5	Study On Algorithms For Frequent Patterns Mining Based On Memory Indexing
6	The Techniques Research On Frequent Pattern Mining
7	Research On Frequent Pattern Mining Algorithm Of Uncertain Data Set Based On Spark
8	Researches On Algorithms For Mining Top-K Frequent Patterns
9	Research On Mining And Querying Frequent Patterns Based On Simplified Frequent Pattern Tree
10	Research On The Application Of Apriroi And Time-series Patterns Discovery Algorithm On Cloting Store