Font Size: a A A

Mining frequent patterns from sequences: Theory, algorithm, implementation, and performance

Posted on:2008-10-20Degree:M.SType:Thesis
University:University of Arkansas at Little RockCandidate:Turkia, Markus PetteriFull Text:PDF
GTID:2448390005955448Subject:Computer Science
Abstract/Summary:
Mining frequent patterns from sequences is an important data mining problem which has direct applications in many areas. In this thesis, we make three contributions to the state-of-the-art of the sequential frequent pattern mining.; First of all, we propose a fast pattern-growth mining algorithm using a novel sequence database representation called First-Occurrence Linked WAP-tree (FLWAP-tree). The pattern-growth mining algorithm using the Pre-Order Linked WAP-tree (PLWAP-tree) was reported in the literature to be faster than other algorithms. We show that our pattern-growth using our FLWAP-tree outperforms the PLWAP-tree mining significantly and consistently.; Secondly, we extend the pattern-growth algorithm with partial enumeration so that the frequent patterns can grow with more than one symbol at a time. Our extended pattern-growth algorithm can be regarded as the one that blends both pattern-growth and apriori enumeration mining algorithms in one framework. We show that partial enumeration can speedup the pattern-growth mining when the depth of partial enumeration is properly controlled. Partial enumeration can also reduce the load imbalance among the parallel tasks when the pattern-growth mining algorithm is parallelized to run on parallel computers.; Lastly, we parallelize our pattern-growth mining algorithm using partial enumeration, and show that partial enumeration is essential to lift up the maximum speedup achievable on parallel computers.; In this thesis, we present the theory, algorithm design, implementation, and the experimental results for each of the contributions we make.
Keywords/Search Tags:Mining, Algorithm, Frequent patterns, Partial enumeration
Related items