Mining frequent patterns from sequences: Theory, algorithm, implementation, and performance | Posted on:2008-10-20 | Degree:M.S | Type:Thesis | University:University of Arkansas at Little Rock | Candidate:Turkia, Markus Petteri | Full Text:PDF | GTID:2448390005955448 | Subject:Computer Science | Abstract/Summary: | | Mining frequent patterns from sequences is an important data mining problem which has direct applications in many areas. In this thesis, we make three contributions to the state-of-the-art of the sequential frequent pattern mining.; First of all, we propose a fast pattern-growth mining algorithm using a novel sequence database representation called First-Occurrence Linked WAP-tree (FLWAP-tree). The pattern-growth mining algorithm using the Pre-Order Linked WAP-tree (PLWAP-tree) was reported in the literature to be faster than other algorithms. We show that our pattern-growth using our FLWAP-tree outperforms the PLWAP-tree mining significantly and consistently.; Secondly, we extend the pattern-growth algorithm with partial enumeration so that the frequent patterns can grow with more than one symbol at a time. Our extended pattern-growth algorithm can be regarded as the one that blends both pattern-growth and apriori enumeration mining algorithms in one framework. We show that partial enumeration can speedup the pattern-growth mining when the depth of partial enumeration is properly controlled. Partial enumeration can also reduce the load imbalance among the parallel tasks when the pattern-growth mining algorithm is parallelized to run on parallel computers.; Lastly, we parallelize our pattern-growth mining algorithm using partial enumeration, and show that partial enumeration is essential to lift up the maximum speedup achievable on parallel computers.; In this thesis, we present the theory, algorithm design, implementation, and the experimental results for each of the contributions we make. | Keywords/Search Tags: | Mining, Algorithm, Frequent patterns, Partial enumeration | | Related items |
| |
|