Font Size: a A A

Research On Mining Sequential Patterns With Periodic Wildcard Gaps

Posted on:2015-03-24Degree:MasterType:Thesis
Country:ChinaCandidate:L L WangFull Text:PDF
GTID:2298330452494426Subject:Control Science and Engineering
Abstract/Summary:PDF Full Text Request
Sequential pattern mining, an important branch of data mining research, is used forfinding kinds of rules and extracting valuable knowledge and information hidden in a largeamount of data from various application fields. Mining frequent patterns with periodicwildcard gaps is a kind of sequential pattern mining with wildcard gap constraints. Itrequires that wildcard gap exists between item and item of a pattern and the sizes or scopesof gaps which meet user-specified number are the same. The form of frequent patterns withperiodic wildcard gaps can be described as p0[M,N]p1...[M,N]pj...[M,N]pm-1,where p0,p1,...pj,...pm-1are items, M and N represent the minimum and maximum gap sizes,respectively. Compared to general patterns, these patterns are more flexible and effective,and so this mining research is becoming an significant research direction for modernsequential pattern mining. This dissertation focus on studying effective algorithms formining frequent patterns with periodic wildcard gaps in order to further improve its miningefficiency.The main research content and novel work of this dissertation are as follows:(1)We summarize the characteristics of sequential patterns with periodic wildcard gapsand its ordinary methods of mining, then introduce and analyze the existing miningalgorithm MPP, MGCS based on GCS and Apriori in detail. Aiming at the shortcomings ofexisting algorithms, we puts forward a new way to calculate the number ofpatterns’supports by using the incomplete Nettree structure and design the correspondingalgorithm INSupport.(2) Based on the algorithm INSupport, combining with the stack and queue datastructures, two efficient mining algorithms MAPB (Mining sequentiAl Pattern usingincomplete Nettree with the Breadth first search) and MAPD (Mining sequentiAl Patternusing incomplete Nettree with the Depth first search)are proposed to improve the existingalgorithms. Experimental results show that the running time performance is greatlyimproved by the MAPB and MAPD algorithms, and MAPD has the best performance. ForMAPD, both the running time and space consumption are the smallest and it can minelonger sequences quickly. (3)Top-K sequential patterns with periodic wildcard gaps and the method to solve thisproblem are introduced in detail. After this, we propose a heuristic algorithm MAPBOK(MAPB for tOp-K) based on MAPB. Although this algorithm is not able to get the top Kmost frequent patterns for each length accurately, experiments show that it can obtain theresult set in higher accuracy in very short time especially when a sequence is longer.
Keywords/Search Tags:Sequential pattern mining, Periodic wildcard gap, Pattern matching, Heuristic algorithm, Nettree
PDF Full Text Request
Related items