
Constraint-based Sequential Pattern Mining And Its Applications

Posted on: 2016-10-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J S Zhang
GTID: 1108330503993722
Subject: Computer application technology
Abstract/Summary:
Sequential pattern mining, which discovers frequent subsequences as patterns in sequence databases, is an important data mining problem with broad applications, such as feature selection for sequence classification and prediction, discovery of access patterns in Web logs, biological sequence analysis, and natural language analysis. Sequential pattern mining approaches have been studied extensively, including general sequential pattern mining, compact sequential pattern mining, and interesting sequential pattern mining. Because compact sequential pattern mining, which includes closed sequential pattern mining and sequential generator mining, produces a compact yet lossless representation of the general sequential patterns, it has become an active topic in the data mining community.

However, when low support thresholds or pattern-rich databases are used, existing compact sequential pattern mining algorithms still generate a large number of inefficient and redundant patterns. The resulting pattern set is too large to be used effectively, which remains an open issue in the sequential pattern mining community. Moreover, the mining process needed to generate these patterns is prohibitively expensive. This thesis aims at sequential patterns that are as compact as possible while carrying the same information as the closed sequential patterns and the sequential generators.

To address these problems, we investigated combining the contiguous and closed constraints to obtain more compact but still lossless sequential pattern mining, i.e., closed contiguous sequential pattern mining and contiguous sequential generator mining. We also proposed a similarity-based FIND-S algorithm, FIND-SS, to perform definitional sequential pattern mining. Moreover, the definitional sequential patterns were used for concept extraction in an ontology learning model. The details of our work are as follows:

1. We presented an algorithm, CCSpan, for closed contiguous sequential pattern mining. CCSpan adopts a snippet-growth paradigm to generate candidate patterns and applies three pruning techniques to cut the futile parts of the search space. The complete set of closed contiguous sequential patterns is generated by upper-closure checking.

2. We investigated the equivalence class theorem and proposed an algorithm, ConSgen, to mine contiguous sequential generators. Using the snippet-growth and pruning techniques above, we obtain the sequential patterns with the contiguous property. The generated pattern set is partitioned into a series of equivalence classes, and the contiguous sequential generators are identified by lower-closure checking.

3. We proposed a similarity-based FIND-S algorithm called FIND-SS. FIND-SS adopts a "the more similar, the higher the priority" scheme to generalize every pair of sequences in the database. Meanwhile, the upper-bound hypotheses are pushed deep into the search process and contribute to the generation of a series of target concepts. FIND-SS can accommodate some noisy information and does not require any pattern seeds for definitional sequential pattern mining.

4. We devised a definitional pattern-based concept extraction scheme for ontology learning. Definitional patterns are first used to extract definitions from given sequence datasets. The ontology concepts, as the definienda, are then identified from the definitional sentences by combining the definitional patterns with a few lexical characteristics. Moreover, a service-oriented ontology learning framework is designed to accommodate cloud computing.

5. A thorough performance study on sparse and dense, real and synthetic datasets demonstrated that our algorithms outperform the state-of-the-art ones in terms of effectiveness, efficiency, and scalability.
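
The definitions behind items 1 and 2 can be illustrated with a small brute-force sketch. This is not CCSpan or the generator-mining algorithm of the thesis, whose snippet-growth and pruning details are not given in this abstract. It assumes that each sequence is a plain list or string of single items, that the support of a pattern is the number of sequences containing it as a contiguous run, and that the empty pattern is ignored. The function names are hypothetical. The closure checks rely on the fact that, for contiguous patterns, it is enough to compare a pattern with the two neighbours obtained by dropping its first or last item.

```python
from collections import defaultdict


def frequent_contiguous(db, min_sup):
    """Return {pattern: support} for every contiguous pattern whose
    support (number of sequences containing it) is at least min_sup."""
    frequent = {}
    length = 1
    while True:
        seen = defaultdict(set)  # pattern -> ids of sequences containing it
        for sid, seq in enumerate(db):
            for i in range(len(seq) - length + 1):
                pat = tuple(seq[i:i + length])
                # Apriori-style pruning: both shorter neighbours must be frequent.
                if length > 1 and (pat[:-1] not in frequent
                                   or pat[1:] not in frequent):
                    continue
                seen[pat].add(sid)
        level = {p: len(ids) for p, ids in seen.items() if len(ids) >= min_sup}
        if not level:
            return frequent
        frequent.update(level)
        length += 1


def closed_patterns(frequent):
    """Keep patterns with no contiguous super-pattern of equal support
    (an upper-closure check against the one-item-longer neighbours)."""
    closed = dict(frequent)
    for pat, sup in frequent.items():
        for sub in (pat[:-1], pat[1:]):
            if sub and frequent.get(sub) == sup:
                closed.pop(sub, None)  # sub is absorbed by pat
    return closed


def generator_patterns(frequent):
    """Keep patterns with no contiguous sub-pattern of equal support
    (a lower-closure check against the one-item-shorter neighbours)."""
    return {
        pat: sup
        for pat, sup in frequent.items()
        if all(frequent.get(sub) != sup for sub in (pat[:-1], pat[1:]) if sub)
    }


if __name__ == "__main__":
    db = ["abcd", "abc", "bcd"]          # toy database, made up for illustration
    freq = frequent_contiguous(db, min_sup=2)
    print(closed_patterns(freq))         # closed: bc (3), abc (2), bcd (2)
    print(generator_patterns(freq))      # generators: a (2), b (3), c (3), d (2)
```

On this toy database the nine frequent contiguous patterns shrink to three closed patterns or four generators, which is the kind of compaction the thesis targets.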
Keywords/Search Tags:Sequential Pattern Mining, Closed Sequential Pattern, Sequential Generator, Contiguous Sequential Pattern, Definitional Sequential Pattern, Biological Sequence Analysis