
Large-scale constraint-based pattern mining

Posted on: 2010-02-12
Degree: Ph.D.
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Zhu, Feida
Full Text: PDF
GTID: 1448390002477648
Subject: Computer Science
Abstract/Summary:
We studied the problem of constraint-based pattern mining for three data formats (item-sets, sequences, and graphs), focusing on mining patterns of large size. For each data format, we studied colossal patterns to discover pruning properties that allow these patterns to be mined directly. For item-set data, we observed that colossal patterns are robust. By defining the concept of core patterns, we developed a randomized mining framework that efficiently finds a set of colossal patterns closely approximating the complete pattern set. The central idea of fusing patterns to leap toward large patterns was then extended to sequential and graph data. For sequential data, we developed a novel algorithm that accommodates approximate patterns. For graph data, we proposed the concept of spiders: small frequent structures that are pre-computed and then used to leap quickly to much larger patterns. We also proposed a general graph mining framework, gPrune, that exploits both pattern-space and data-space pruning. The ideas and techniques developed in this work can be extended to handle other user-specified constraints for efficient direct mining of large-scale data.
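The summary names core patterns without defining them. The following is a minimal sketch that assumes the definition commonly used for pattern fusion on item-sets: a non-empty subpattern β of α is a τ-core pattern of α when |D_α| / |D_β| ≥ τ, where D_x is the set of transactions containing x. It is not the dissertation's implementation. The toy database format and the helper names `support_set`, `is_core_pattern`, and `core_patterns` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

# Hypothetical toy format: a transaction database is a list of item sets.
Itemset = FrozenSet[str]


def support_set(pattern: Itemset, db: List[Itemset]) -> Set[int]:
    """Return D_pattern: indices of transactions containing every item of pattern."""
    return {i for i, t in enumerate(db) if pattern <= t}


def is_core_pattern(beta: Itemset, alpha: Itemset, db: List[Itemset], tau: float) -> bool:
    """Assumed definition: beta is a tau-core pattern of alpha if
    beta is a non-empty subset of alpha and |D_alpha| / |D_beta| >= tau."""
    if not beta or not beta <= alpha:
        return False
    d_beta = support_set(beta, db)
    # D_alpha is a subset of D_beta, so d_beta can only be empty if alpha never occurs.
    if not d_beta:
        return False
    return len(support_set(alpha, db)) / len(d_beta) >= tau


def core_patterns(alpha: Itemset, db: List[Itemset], tau: float) -> List[Itemset]:
    """Enumerate all non-empty tau-core patterns of alpha.
    Brute force, exponential in |alpha|: suitable for toy examples only."""
    items = sorted(alpha)
    cores = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            beta = frozenset(combo)
            if is_core_pattern(beta, alpha, db, tau):
                cores.append(beta)
    return cores


if __name__ == "__main__":
    db = [frozenset("abcd"), frozenset("abcd"), frozenset("abce"),
          frozenset("ab"), frozenset("xy")]
    alpha = frozenset("abcd")
    print(len(core_patterns(alpha, db, tau=0.5)))  # 15: every non-empty subset qualifies
    print(len(core_patterns(alpha, db, tau=0.6)))  # 12: {a}, {b}, {a, b} drop out
```

On this toy database, 12 of the 15 non-empty subsets of α = {a, b, c, d} are 0.6-core patterns. This illustrates the kind of robustness the summary refers to: a large pattern has many subpatterns whose support sets are close to its own, so starting from randomly chosen small cores can still lead toward the large pattern.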
Keywords/Search Tags: Mining, Data, Pattern, Graph