Font Size: a A A

Research On Parallel Sequential Patterns Mining Algorithm Based On Prefix Tree

Posted on:2012-05-03Degree:MasterType:Thesis
Country:ChinaCandidate:Y DongFull Text:PDF
GTID:2178330338491274Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Sequential pattern mining has become an important data mining task with wide application, such as DNA sequential patterns, mining the web pages and the prediction of natural disaster. The so-called sequential pattern mining is defined as a knowledge discovery process to find frequent sub-sequences as a model from sequence databases. Currently, researchers pay more attention on sequential pattern mining with parallel technology. But there are several key problems in parallel sequential pattern mining that need to solve, such as lack of effective mining algorithm and pruning strategy, the communication between processors and load balancing.Firstly, this article analyses the advantages and disadvantages of some typical parallel sequential pattern mining algorithms. Aimed at lacking efficient mining algorithm and pruning strategy in parallel sequential pattern mining, this paper proposes a non-closed parallel sequential pattern mining algorithm based on prefix tree. It uses a prefix tree structure to mining local sequences and global sequence based on prefixspan algorithm. In order to improve the mining efficiency, a new prefix tree pruning technique is presented to delete the global k-sequence which is unable to expand.There is a problem that it will generate a large number of candidate sequences during the sequential pattern mining. In order to reduce the candidate sequences during the mining process. First, it only mines closed sequential patterns instead of frequent patterns. Second, aimed at the closure checking in closed sequential pattern mining, this paper do an improvement on BIDE algorithm, a backscan pruning technology is applied to delete the sequences in advance which are not hope of generating closed sequential pattern. Finally, this algorithm proposes a new dynamic load balancing mechanism called "nearest application" to address on the load imbalance in parallel mining process.This article uses the C++ language and the MPICH programming environment to compile these algorithms on different data sets. Through the experiment results, this paper analyses the performance of the algorithm on every aspects and the existing problems.
Keywords/Search Tags:sequential mining, parallel sequential mining, global sequence, prefix tree, load balancing
PDF Full Text Request
Related items