A Study On Parallel Mining Of Continuous Sequential Pattern

Posted on:2016-07-26

Degree:Master

Type:Thesis

Country:China

Candidate:M J Peng

Full Text:PDF

GTID:2428330482481286

Subject:Systems analysis and integration

Abstract/Summary:

As society becomes increasingly information-driven, information plays an ever more important role in modern life. Continuous sequential pattern mining finds frequent sequential patterns whose items occur contiguously in the target sequences. Traditional sequential pattern algorithms are used in retail, network communications, finance and weather analysis. However, they allow the frequent items found in a sequence to be non-contiguous, with gaps between them. To address this, the serial Continuous-PrefixSpan algorithm changes the definitions of sequence, prefix, suffix and projection used in the original PrefixSpan algorithm. A sequence is selected into the projected database only when the first element of the sequence to be projected equals the last element of the prefix, which ensures that the resulting patterns are continuous.

Growing informatization also produces massive data. Relational databases such as Oracle and SQL Server are ill-suited to data sets of terabyte or even petabyte size. Moreover, sequential pattern mining algorithms must scan the original database several times, which is exactly where traditional relational databases are weak. For these reasons, this thesis presents a parallel data mining and storage solution based on the Hadoop platform. Hadoop is an open-source platform for developing parallel programs. Its Map/Reduce programming model lets multiple computers take part in a computation at the same time, which greatly reduces processing time. Its distributed file system, HDFS, stores data in the storage space of the datanodes in the cluster and replicates it to other datanodes. In this way HDFS overcomes the shortage of storage space on any single machine when handling massive data, and replication also improves data reliability.

This thesis focuses on improving the PrefixSpan algorithm on the Hadoop platform and designs a Map/Reduce algorithm for parallel continuous sequential pattern mining that uses two Map/Reduce passes. The algorithm also ensures that the breadth-first search can run in parallel. On this basis, the thesis introduces Hive, a component of the Hadoop platform, to preprocess the data in parallel, so that the entire mining process is parallel. Successfully porting a traditional serial sequential pattern mining algorithm to the Hadoop platform is significant: the platform makes full use of the computing power and storage of each datanode, and the resulting solution is efficient, low-cost and of high practical value.
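The difference between continuous and ordinary sequential patterns can be illustrated with a small serial sketch. This is not the thesis's Continuous-PrefixSpan or its Map/Reduce implementation; it is a minimal in-memory illustration, assuming that each sequence is a list of single items and that support is the number of sequences containing the pattern as a contiguous run. The names `mine_continuous`, `grow` and `min_support` are hypothetical.

```python
from collections import defaultdict


def mine_continuous(sequences, min_support):
    """Return {pattern: support} for contiguous patterns (illustrative sketch).

    A projected entry (sid, pos) means: the current prefix has just been
    matched in sequence `sid`, and the next candidate item is at `pos`.
    """
    results = {}

    def grow(prefix, projected):
        # Group projected entries by the item that immediately follows the prefix.
        by_item = defaultdict(list)
        for sid, pos in projected:
            by_item[sequences[sid][pos]].append((sid, pos + 1))

        for item, occurrences in by_item.items():
            # Support counts distinct sequences, not individual occurrences.
            support = len({sid for sid, _ in occurrences})
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Only occurrences that still have a following item can be extended.
            remaining = [(sid, pos) for sid, pos in occurrences
                         if pos < len(sequences[sid])]
            grow(pattern, remaining)

    # The empty prefix matches before every position of every sequence.
    start = [(sid, pos)
             for sid, seq in enumerate(sequences)
             for pos in range(len(seq))]
    grow((), start)
    return results


sequences = [list("abcd"), list("abd"), list("cabc")]
patterns = mine_continuous(sequences, min_support=2)
for pattern, support in sorted(patterns.items()):
    print("".join(pattern), support)
```

In this toy data, `a` followed later by `d` occurs in two sequences, so a gap-tolerant miner would report it at a minimum support of 2. The contiguous version does not, because `d` never immediately follows `a`.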

Keywords/Search Tags:

Data Mining, Continuous Sequential Pattern, PrefixSpan, Hadoop, HDFS

Related items

1	Research On Web Pattern Mining Method Based On PrefixSpan Algorithm
2	Research And Application Of Projection Position-Based Sequential Pattern Mining Algorithm
3	Research On Algorithm Of Large Data Set Sequential Pattern Mining
4	Research On User Access Sequential Pattern Mining Based On Web Log
5	Research And Implementation Of Intrusion Detection System Based On Sequential Pattern Mining
6	Research On Improvement Of Algorithm Prefix Span For Sequential Pattern Mining
7	The Research And Application Of Intrusion Detection Based On Sequential Pattern Mining
8	Study Of Applying Sequential Pattern Mining To Highway Tunnel Traffic
9	Malware Detection Based On Sequential Pattern Mining Algorithms
10	Constraint-based Sequential Pattern Mining And Its Applications