
Research On Sequential Pattern Mining Algorithm Based On Constraints

Posted on: 2007-08-29, Degree: Master, Type: Thesis
Country: China, Candidate: J S Zong, Full Text: PDF
GTID: 2178360212995463, Subject: Computer application technology
Abstract/Summary:
Current sequential pattern mining algorithms efficiently mine the complete set of sequential patterns in large databases. However, as applications grow more complex, more and more users care only about the sequential patterns that interest them rather than about all frequent patterns. Mining sequential patterns with user requirements expressed as constraints is therefore an important task. This thesis focuses on how to mine sequential patterns with constraints, an important data mining problem with broad applications, including customer purchase patterns, page access patterns of a web site, telecom warning patterns, and DNA sequence patterns.

First, an algorithm for mining sequential patterns with regular expression constraints is proposed. The algorithm uses regular expressions to express user constraints and integrates them into the incremental mining process. It applies three optimization strategies to reduce mining time, and it outperforms the FASTUP algorithm for mining sequential patterns in dynamic databases.

Second, an algorithm is proposed for incrementally updating sequential patterns when the user changes the algorithm parameter (the minimum support threshold). Based on the lattice frequent pattern tree (LFP-tree) structure, the algorithm stores the sequential patterns discovered during prior mining, together with their supports, in the LFP-tree and the index set mapped table (ISMT) for later queries. This reduces the size of the candidate sequence set, decreases mining time, and improves mining efficiency. As the minimum support threshold is gradually lowered, the algorithm significantly outperforms the MEMISP algorithm.

Finally, an effective approach for mining sequential patterns with periodic constraints is proposed. The algorithm adopts granularity-based periodic time constraints to describe real-life periodic time concepts. A simple and novel hyper-linked data structure, HP-CSB, is defined, and two methods are used to generate candidate sequences in order to improve efficiency. The PCS-mine algorithm significantly outperforms the PrefixSpan algorithm.

Experimental results show that the algorithms proposed in this thesis are more efficient than existing ones and that the expected results are achieved.
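
To make the idea of a regular expression constraint concrete, the following Python sketch counts the support of candidate sequences and keeps only those that are frequent and match a user-supplied regular expression. It is a brute-force illustration, not the algorithm proposed in the thesis: the function names, the toy database, the `max_len` cap, and the assumption that every item is a single character (so a sequence can be joined into a string for matching) are all hypothetical choices made for this example.

```python
import re
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator, so order is enforced.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def constrained_patterns(database, min_support, regex, max_len=3):
    """Brute-force search for frequent patterns that fully match regex.

    Assumption for this sketch: items are single characters, so a pattern
    can be joined into a string and checked with re.fullmatch.
    """
    constraint = re.compile(regex)
    # Enumerate every subsequence of length <= max_len as a candidate.
    candidates = set()
    for seq in database:
        for k in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), k):
                candidates.add(tuple(seq[i] for i in idx))
    result = {}
    for cand in candidates:
        # Apply the constraint first so support is only counted when needed.
        if constraint.fullmatch("".join(cand)) is None:
            continue
        count = support(cand, database)
        if count >= min_support:
            result[cand] = count
    return result


# Hypothetical toy database of four sequences.
database = [
    ["a", "b", "c", "d"],
    ["a", "c", "d"],
    ["b", "a", "d"],
    ["a", "b", "d"],
]
patterns = constrained_patterns(database, min_support=3, regex=r"a[bc]*d")
print(patterns)
```

Checking the constraint before counting support means sequences the user is not interested in never incur a database scan. A practical miner would go further and prune during candidate generation rather than enumerating all subsequences first.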
Keywords/Search Tags: Data mining, Sequential pattern mining, Constraints mining, Regular expression constraint, Periodic constraint