Font Size: a A A

Research On Key Problems For Parallel Sequential Patterns Mining

Posted on:2010-07-04Degree:MasterType:Thesis
Country:ChinaCandidate:H H JiangFull Text:PDF
GTID:2178360275977631Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
As one of the the primary research field in data mining, sequential pattern mining which enumerates frequent spatio-temporal characteristic patterns in large database is a time-consuming task and it seems that it is a good idea to speedup the mining process by using of parallel compute technology. The research on parallel sequential pattern mining algorithm shows that some key problems such as workload balancing, communication and searching strategy in solution space among computing-units are the key problems in developing efficient parallel mining algorithm. These key problems are studied detaily in this dissertation and the main content of this dissertation is as follow:(1) The ideas of several known parallel sequential patterns mining algorithm are discussed in detail. The analysis indicates that the key problems that affect efficiency of parallel algorithm for mining sequential patterns are effective strategies for loading balancing, communication and searching in solution space, also the deficiency of the solution to these key problems used in these algorithms is presented.(2) Aiming to the parallel computer system with shared memory, dynamic tasks distribution method to the workload balancing problem, local parallel pruning technology for the searcing strategy in solution space and task parallelism model based sequential patterns mining algorithm are used. A parallel algorithm for mining sequential patterns using these methods based on SMP computer system, abbreviated by PFSPAN, is designed in this dissertation. Both theoretical analysis and practical experiments show that PFSPAN can mine sequential patterns effectively.(3) Aiming to parallel computer environment with distributed memory, a parallel algorithm model combining data parallelism and task parallelism, the communication method which bases on transmission of prefix trees, the loading balancing method include static and dynamic task distribution, and the two-step searching method in solution space which bases on global pruning and item-extending pruning according to the order of items are also used. On the basis of these methods, algorithm FPMSP, a parallel and distributed algorithm for mining sequential patterns on cluster has been designed in this dissertation, too. Through theoretical analysis and experiments, the performance of FPMSP is proved.
Keywords/Search Tags:sequential pattern, parallel algorithm, workload balancing, searching strategy in solution space, communication
PDF Full Text Request
Related items