
Contrast and compact data mining: Discovering novel and useful patterns from large databases

Posted on: 2010-08-03
Degree: Ph.D.
Type: Dissertation
University: York University (Canada)
Candidate: Wan, Qian
GTID: 1448390002986285
Subject: Artificial Intelligence
Abstract/Summary:
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. This information can take the form of patterns, associations, changes, anomalies, significant structures, and so on.

This dissertation consists of two parts. In the first part, we address the problem of mining contrast patterns. We propose two new types of contrast patterns, called transitional patterns and diverging patterns. Transitional patterns include both positive and negative transitional patterns, whose frequencies increase or decrease dramatically at some time points of a transaction database. We introduce the concept of significant milestones for a transitional pattern, which are the time points at which the frequency of the pattern changes most significantly. Diverging patterns are patterns whose frequencies change in different directions in two contrasting data sets. We use a four-dimensional vector to represent each pattern, and define the pattern's diverging ratio based on the angular difference between its vectors in the two data sets. Our experimental studies on both synthetic and real-world data sets illustrate that mining transitional patterns and diverging patterns is a promising, practical approach for discovering novel and interesting knowledge from large databases.

The second main contribution of this study is an approach to generating compact transaction databases for efficient frequent pattern and indirect association mining. We use a compact tree structure, called CT-tree, to generate compact transaction databases and super compact transaction databases. The effectiveness and efficiency of our approach are verified by experimental results on both synthetic and real-world data sets. The approach not only reduces the number of transactions in the original databases and saves storage space, but also greatly reduces the I/O time required by database scans and improves the efficiency of the mining process.
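
The two contrast-pattern ideas described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names (`transitional_ratios`, `most_significant_milestone`, `angle_between`) are hypothetical, the ratio is normalised by the larger of the two supports as one plausible choice, and the contents of the four-dimensional vector are not specified in the abstract, so the angle function simply accepts any two vectors of equal length.

```python
import math
from typing import List, Sequence, Set, Tuple


def transitional_ratios(
    transactions: Sequence[Set[str]], pattern: Set[str]
) -> List[Tuple[int, float]]:
    """Compare a pattern's support before and after every split point k.

    `transactions` must be in time order. Returns (k, ratio) pairs with
    ratio in [-1, 1]: positive if the pattern is more frequent after the
    first k transactions, negative if it is rarer.
    Assumption: normalising by max(before, after) is this sketch's choice.
    """
    hits = [1 if pattern <= t else 0 for t in transactions]
    total, n = sum(hits), len(hits)
    ratios = []
    seen = 0  # occurrences of the pattern among the first k transactions
    for k in range(1, n):
        seen += hits[k - 1]
        before = seen / k
        after = (total - seen) / (n - k)
        denom = max(before, after)
        ratio = 0.0 if denom == 0 else (after - before) / denom
        ratios.append((k, ratio))
    return ratios


def most_significant_milestone(
    ratios: Sequence[Tuple[int, float]]
) -> Tuple[int, float]:
    """Return the split point whose ratio has the largest magnitude."""
    return max(ratios, key=lambda kr: abs(kr[1]))


def angle_between(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


if __name__ == "__main__":
    # Toy, time-ordered transaction database (illustrative data only).
    txs = [{"a", "b"}, {"a"}, {"a"}, {"b"}, {"a", "b"}, {"b", "c"}]
    print(most_significant_milestone(transitional_ratios(txs, {"b"})))
    # Placeholder vectors standing in for a pattern's two representations.
    print(angle_between((1, 0, 0, 0), (0, 1, 0, 0)))
```

In the toy database, the pattern `{"b"}` changes most sharply after the third transaction, so that position is reported as the milestone. How the angle is turned into a diverging ratio is left open here, since the abstract does not say.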
Keywords/Search Tags: Patterns, Data, Mining, Useful, Compact, Contrast