Study And Implementation On Techniques Of Parallel Mining Of Frequent Closed Sequences Based On Vertical Segmentation

Posted on:2016-08-04

Degree:Master

Type:Thesis

Country:China

Candidate:T C Bi

Full Text:PDF

GTID:2428330542957391

Subject:Computer application technology

Abstract/Summary:

Sequential pattern mining is an important part of data mining. Since Agrawal and Srikant introduced the concept of sequential patterns, more and more researchers have worked on this subject. Sequence mining is widely used in applications such as market analysis, fraud detection, scientific exploration, and product control, and as data mining develops it will play a major role in more fields. With the development of Web 2.0, the information explosion has intensified, which poses a huge challenge to sequential pattern mining: how can sequential patterns be mined from big data, that is, data too large to fit on a single computer? Many parallel algorithms need to generate candidate sequential patterns, while others do not. Both kinds, however, rely on physical memory: once the original data no longer fits in memory, the algorithm cannot run. The contributions of this thesis are as follows:

(1) To the best of our knowledge, this is the first time the concept of vertical segmentation has been proposed for sequence mining. The time complexity of the algorithm is related to the number of columns. We first intersect each pair of sequences, which helps to decrease the length of the sequences. The original sequences then consist of many shorter patterns, from which we select K sequences that differ from one another.

(2) In order to mine a smaller dataset, most sequential pattern mining algorithms compress the original data when the data are huge. In this thesis we instead present the concept of pattern compression. Compressing patterns has many benefits, such as reducing the scale of enumeration, shortening the mining time, and reducing the time complexity.

(3) Because the data cannot fit in memory in the big-data setting, we improve the algorithms that rely on physical memory. In each MapReduce job, we mine only sequential patterns of a fixed length. Although this is not as efficient as the earlier algorithms, it solves the problem of the dataset not fitting in memory.

(4) Our algorithm is based on Hadoop, a parallel framework. First, we distribute the data to different nodes in the cluster. Then, following the characteristics of map and reduce, we rewrite the algorithm that originally ran on a single PC. Since the candidates are independent of each other, our algorithm achieves high speed-ups.
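The idea in contribution (3), one job per fixed pattern length, can be illustrated with a minimal sketch. This is not the thesis's algorithm: it is a plain-Python simulation of map and reduce steps (no Hadoop), it assumes each sequence is a tuple of single items, it mines frequent sequences rather than closed ones, and it enumerates subsequences by brute force, which is only practical for toy data. The function names `map_job`, `reduce_job`, and `mine_level_wise` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_job(sequence, k, prev_frequent):
    """Map step (simulated): emit each distinct length-k subsequence of one
    input sequence once. Candidates are handled independently of each other."""
    emitted = set()
    for cand in combinations(sequence, k):  # order-preserving subsequences
        # Apriori-style pruning (an assumption of this sketch): both
        # (k-1)-length ends of a candidate must have been frequent.
        if k > 1 and (cand[:-1] not in prev_frequent
                      or cand[1:] not in prev_frequent):
            continue
        emitted.add(cand)
    return emitted


def reduce_job(mapped, min_support):
    """Reduce step (simulated): sum the per-sequence emissions and keep the
    patterns whose support reaches min_support."""
    counts = Counter()
    for emitted in mapped:
        counts.update(emitted)
    return {p: c for p, c in counts.items() if c >= min_support}


def mine_level_wise(database, min_support, max_length):
    """Run one simulated job per pattern length k = 1, 2, ..., max_length.
    Only the frequent patterns of the previous length are carried forward."""
    frequent = {}
    prev = set()
    for k in range(1, max_length + 1):
        mapped = [map_job(seq, k, prev) for seq in database]
        level = reduce_job(mapped, min_support)
        if not level:
            break  # no frequent pattern of length k, so none longer either
        frequent.update(level)
        prev = set(level)
    return frequent


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "d")]
    for pattern, support in sorted(mine_level_wise(db, 2, 4).items()):
        print(pattern, support)
```

Because each pass only needs the frequent patterns of the previous length, the working set per job stays bounded by that level's candidates, at the cost of one extra pass over the data per pattern length. This mirrors the trade-off between memory use and efficiency that the abstract describes.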

Keywords/Search Tags:

Data Mining, Pattern Mining, Pattern Compression, Parallel Mining, MapReduce

Related items

1	MapReduce-based Parallel Data Mining Services For TCM
2	Research On Parallel Mining Algorithm Of Space Co-Location Based On Hadoop
3	Research And Parallel Processing Of Top-k High Utility Pattern Mining Algorithm Based On Projection Table Structure
4	Research And Application Of Mining Access Sequential Pattern In Weblog
5	Design Of Frequent Pattern Mining Algorithm LPS-Miner And Research On Parallel Formulations
6	Multi-threshold Based Contrast Pattern Mining And Its Application In Classification Of Imbalanced Datasets
7	Research On Parallelization And Load Balancing Of Frequent Pattern Mining Algorithm Based On MapReduce
8	Research On Sequential Pattern Mining
9	A Multi-flow Streaming Data Frequent Pattern Mining Adaptive Algorithm
10	Research On Algorithm For Mining Gathering Pattern Of Spatio-Temporal Trajectory In Cloud Computing Environment