
Mining Sequential Patterns With Periodic General Gap Constraints

Posted on: 2016-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhou
Full Text: PDF
GTID: 2308330479498970
Subject: Software engineering
Abstract/Summary:
Sequential pattern mining effectively discovers frequent patterns in sequences and plays an essential role in many critical data mining tasks. Given sub-patterns pi and pj (i < j) that match events A and B respectively, traditional pattern mining methods can detect sequences in which event B occurs after event A, but fail to find sequences in which event B occurs before event A. The frequent patterns they find are therefore always ordered, because the sequences are ordered. To address this limitation, this thesis proposes sequential pattern mining with periodic general gap constraints. The main contributions are as follows:

1. The problem of sequential pattern mining with periodic general gap constraints is proposed and formally defined. Since the Apriori property does not hold for the problem as originally stated, a novel definition of the offset sequences of a pattern is provided so that the problem satisfies the Apriori property.
2. A new measure of pattern frequency is adopted. The traditional support of a pattern in a sequence database (SDB) is the number of sequences containing the pattern, which is not a reasonable measure because it does not reflect how often the pattern occurs within the sequences of the SDB. Meanwhile, the total number of occurrences of the pattern across the sequences of the SDB is very large, so it is not suitable as a measure either. Consequently, the support ratio is adopted.
3. An effective mining algorithm is designed. Since an incomplete nettree can represent the occurrence positions and support of a pattern within a sequence, the positions and support of the pattern within the SDB can be represented by a forest of incomplete nettrees. The mining algorithm is depth-first: after scanning the SDB once, it builds the incomplete nettree forests for all super-patterns of the current pattern, computes the support ratios of these super-patterns, and pushes the frequent ones together with their incomplete nettree forests onto a stack. The top element of the stack is then popped and the process repeats until the stack is empty, which improves mining efficiency significantly.
4. Representative datasets and baseline algorithms are selected, and a large number of experiments are conducted. The performance of the proposed algorithm is validated in terms of both mining results and mining efficiency.
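
The abstract gives no formal definitions, so the following Python sketch only illustrates the ideas under stated assumptions; it is not the thesis's algorithm. It assumes that a general gap constraint bounds the offset between consecutive matched positions by an interval [min_off, max_off] that may include negative values (so B may be matched before A), that the same interval applies to every consecutive pair (the "periodic" part), and that the support ratio is the number of occurrences divided by the number of position tuples that satisfy the offsets regardless of symbols. A plain level-by-level count stands in for the incomplete nettree, and non-adjacent pattern symbols are not prevented from reusing a position. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def count_paths(seq: Sequence[str], pattern: Sequence[str],
                min_off: int, max_off: int,
                ignore_symbols: bool = False) -> int:
    """Count position tuples whose consecutive offsets p[j+1] - p[j] are
    non-zero and lie in [min_off, max_off] (assumed constraint form).
    With ignore_symbols=True every position matches, which yields the
    number of candidate tuples used as the ratio's denominator."""
    n = len(seq)
    if not pattern or n == 0:
        return 0
    # counts[i]: number of partial matches whose last symbol is at position i
    counts = [1 if ignore_symbols or seq[i] == pattern[0] else 0
              for i in range(n)]
    for symbol in pattern[1:]:
        nxt = [0] * n
        for k in range(n):
            if not ignore_symbols and seq[k] != symbol:
                continue
            lo = max(0, k - max_off)       # offset k - i <= max_off
            hi = min(n - 1, k - min_off)   # offset k - i >= min_off
            nxt[k] = sum(counts[i] for i in range(lo, hi + 1) if i != k)
        counts = nxt
    return sum(counts)


def support_ratio(sdb: Sequence[Sequence[str]], pattern: Sequence[str],
                  min_off: int, max_off: int) -> float:
    """Occurrences over candidate tuples, summed over the whole SDB
    (an assumed normalisation, not the thesis's exact definition)."""
    occ = sum(count_paths(s, pattern, min_off, max_off) for s in sdb)
    cand = sum(count_paths(s, pattern, min_off, max_off, True) for s in sdb)
    return occ / cand if cand else 0.0


def mine(sdb: Sequence[Sequence[str]], min_off: int, max_off: int,
         min_ratio: float, max_len: int) -> Dict[Tuple[str, ...], float]:
    """Depth-first mining with an explicit stack: pop a frequent pattern,
    extend it by one symbol, push the frequent super-patterns."""
    alphabet = sorted({x for s in sdb for x in s})
    frequent: Dict[Tuple[str, ...], float] = {}
    stack: List[Tuple[str, ...]] = []
    for a in alphabet:
        ratio = support_ratio(sdb, (a,), min_off, max_off)
        if ratio >= min_ratio:
            frequent[(a,)] = ratio
            stack.append((a,))
    while stack:
        pattern = stack.pop()
        if len(pattern) >= max_len:   # length cap added only for this sketch
            continue
        for a in alphabet:
            sup = pattern + (a,)
            ratio = support_ratio(sdb, sup, min_off, max_off)
            if ratio >= min_ratio:
                frequent[sup] = ratio
                stack.append(sup)
    return frequent


if __name__ == "__main__":
    sdb = ["ABA"]
    print(count_paths("ABA", "AB", 1, 1))    # forward-only offsets
    print(count_paths("ABA", "AB", -1, 1))   # negative offsets allowed
    print(mine(sdb, -1, 1, 0.5, 2))
```

In the sequence ABA, allowing the offset -1 lets the pattern AB also match the A that follows B, which is the kind of occurrence the abstract says ordered methods miss. Pruning infrequent patterns before extension relies on an Apriori-style property; this simplified ratio is not guaranteed to have it, whereas the abstract states that the thesis redefines offset sequences specifically to obtain it.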
Keywords/Search Tags: sequential pattern mining, general gap, frequent pattern tree, pattern matching, Apriori property