The Research Of Conditional Discriminative Pattern Mining Algorithms

Posted on: 2017-11-21 | Degree: Master | Type: Thesis
Country: China | Candidate: X Q Liu
GTID: 2348330488459954 | Subject: Software engineering
Abstract/Summary:
Discriminative pattern mining is one of the most important techniques in data mining. This challenging task is concerned with finding a set of patterns that occur with disproportionate frequency in data sets with different class labels. Such patterns are of great value for detecting group differences and for constructing classifiers. Because there is no clear, unified specification, different communities have used different terminology and methods for discriminative pattern mining at different times, such as contrast set mining, emerging pattern mining and subgroup discovery.
In recent years, research on finding interesting discriminative patterns in class-labeled data has evolved rapidly, and many algorithms have been proposed specifically to address this problem. Although these algorithms can discover a number of significant discriminative patterns, they leave the redundancy issue unsolved: the discriminative power of many reported patterns is mainly induced by their sub-patterns.
This thesis proposes a novel notion, the conditional discriminative pattern, which is a pattern that is frequent, globally significant and locally significant. Like discriminative pattern mining, conditional discriminative pattern mining can be formulated as two problems: (1) threshold-specified (threshold-based) conditional discriminative pattern mining and (2) top-k conditional discriminative pattern mining. To address these two problems, this thesis proposes two effective algorithms, CDPM-TB (Conditional Discriminative Pattern Mining-Threshold-based) and CDPM-TK (Conditional Discriminative Pattern Mining-Top-k), which generate a set of non-redundant discriminative patterns. Both methods first condense the given data into a tree structure and then traverse the tree depth-first to produce significant conditional discriminative patterns.
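The Python sketch below illustrates the threshold-based problem under stated assumptions. It is not CDPM-TB. It assumes two classes and scores a pattern by the absolute difference of its class supports, although the thesis may use a different significance measure. It takes "locally significant" to mean that every item x of a pattern P still reaches the score threshold within the transactions covered by the sub-pattern P \ {x}. In place of the condensed tree, it uses a plain set-enumeration depth-first search over transaction lists. The names `support`, `sup_diff`, `locally_significant`, `mine_threshold`, `min_sup` and `delta` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def support(pattern: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of `transactions` that contain every item of `pattern`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if pattern <= t) / len(transactions)


def sup_diff(pattern: Itemset, pos: List[Itemset], neg: List[Itemset]) -> float:
    """Assumed discriminative score: absolute support difference between classes."""
    return abs(support(pattern, pos) - support(pattern, neg))


def locally_significant(pattern: Itemset, pos: List[Itemset],
                        neg: List[Itemset], delta: float) -> bool:
    """Assumed local test: each item x must still discriminate inside the
    transactions covered by the sub-pattern pattern - {x}."""
    for x in pattern:
        rest = pattern - {x}
        pos_cov = [t for t in pos if rest <= t]
        neg_cov = [t for t in neg if rest <= t]
        if sup_diff(frozenset({x}), pos_cov, neg_cov) < delta:
            return False
    return True


def mine_threshold(pos: List[Itemset], neg: List[Itemset],
                   min_sup: float, delta: float) -> List[Tuple[Itemset, float]]:
    """Depth-first search for frequent, globally and locally significant itemsets."""
    items = sorted(set().union(*pos, *neg))
    everything = pos + neg
    results: List[Tuple[Itemset, float]] = []

    def dfs(prefix: Itemset, start: int) -> None:
        for i in range(start, len(items)):
            cand = prefix | {items[i]}
            # Support over all data is anti-monotone: no extension of an
            # infrequent candidate can be frequent, so its subtree is skipped.
            if support(cand, everything) < min_sup:
                continue
            score = sup_diff(cand, pos, neg)
            # Global check first, then the (assumed) local check.
            if score >= delta and locally_significant(cand, pos, neg, delta):
                results.append((cand, score))
            dfs(cand, i + 1)

    dfs(frozenset(), 0)
    return results


# Toy two-class data: item "a" separates the classes; "b" adds nothing given "a".
pos = [frozenset(s) for s in ({"a", "b"}, {"a", "b"}, {"a"}, {"a"})]
neg = [frozenset(s) for s in ({"a", "b"}, {"a"}, {"b"}, set())]
print(mine_threshold(pos, neg, min_sup=0.25, delta=0.2))
```

On this toy data only {a} is reported. The pattern {a, b} passes the global check with a support difference of 0.25 (0.5 versus 0.25), but it fails the local check because item b occurs equally often in both classes among the transactions that contain a. This is the kind of sub-pattern-induced redundancy the abstract describes. A top-k variant could, for example, keep candidates in a size-k min-heap and raise the score threshold to the heap's minimum once the heap is full. The abstract does not say whether CDPM-TK works this way.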
Experimental results on real data sets from the UCI repository and on a breast cancer gene expression data set demonstrate that CDPM-TB and CDPM-TK perform relatively well at removing redundant patterns whose discriminative power derives from significant sub-patterns. CDPM-TB and CDPM-TK are therefore of considerable value in practical applications.
Keywords/Search Tags: Discriminative pattern, Contrast set, Emerging pattern, Subgroup discovery, Data mining