
Discovery of indirect association and its applications

Posted on: 2003-01-09
Degree: Ph.D.
Type: Thesis
University: University of Minnesota
Candidate: Tan, Pang-Ning
Full Text: PDF
GTID: 2468390011980518
Subject: Computer Science
Abstract/Summary:
Data mining has become an essential data analysis tool as it provides an automated procedure for the rapid discovery of novel but implicit knowledge in large databases. One of the main techniques in data mining is association pattern discovery, which attempts to find items that occur together relatively frequently in the data. This technique has been successfully applied to various application domains including business decision support, telecommunication alarm diagnosis, and molecular genomics.

As the current association pattern discovery algorithms are focused on finding frequent patterns, they fail to capture other forms of interesting multivariate relationships such as negative associations, which are equally valuable in many application domains. For instance, negative associations characterize the dependence relationships between competing products such as Huggies and Pampers, or the opposite outcomes of related events in an event sequence database such as FIRE_ALARM=ON but FIRE_SPRINKLER=OFF. Mining negative associations is a computationally expensive problem, especially for sparse transaction data, where a large percentage of the extracted patterns have low interest values.

This thesis introduces a new type of pattern called indirect association, which provides an effective way to discover interesting negative associations by extracting only “infrequent patterns that are expected to be frequent.” An efficient, level-wise algorithm for mining indirect associations is presented to address the computational issue. The second part of this thesis extends the concept of indirect association to sequential data. Sequential indirect association has been successfully applied to Web usage data to discover groups of Web users who share a similar browsing behavior.

Finally, every association pattern discovery task requires a metric to evaluate the interestingness of the discovered patterns. While many such metrics have been proposed in the data mining literature, the metric that is most consistent with the expectations of domain experts is rarely known. This dissertation provides an in-depth study of how to select the most appropriate metric for a given application. The results of this study will have an impact on association pattern discovery and all other data mining tasks that require the use of an objective measure for preprocessing, post-processing or within the mining algorithm itself.
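The idea of "infrequent patterns that are expected to be frequent" can be illustrated with a small sketch. The abstract does not give the formal definition or the level-wise algorithm, so the code below is only one simplified reading, under stated assumptions: a pair of items is reported when the pair itself is rare, yet both items co-occur frequently with a shared third item (called a "mediator" here). The function names, the two thresholds, and the toy transactions are hypothetical, and the sketch uses brute-force enumeration and single-item mediators rather than the thesis's algorithm or any dependence measure.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def indirect_pairs(transactions, pair_max_support, mediator_min_support):
    """Brute-force sketch (hypothetical helper, not the thesis's algorithm).

    Reports (a, b, mediators) when {a, b} has support below
    pair_max_support, while both {a, m} and {b, m} reach
    mediator_min_support for at least one other item m.
    """
    items = sorted(set().union(*transactions))
    results = []
    for a, b in combinations(items, 2):
        if support(transactions, {a, b}) >= pair_max_support:
            continue  # the pair co-occurs often enough: not an indirect pattern
        mediators = [
            m for m in items
            if m not in (a, b)
            and support(transactions, {a, m}) >= mediator_min_support
            and support(transactions, {b, m}) >= mediator_min_support
        ]
        if mediators:
            results.append((a, b, mediators))
    return results


# Toy data (invented for illustration): two competing products that are
# never bought together but are each often bought with the same third item.
transactions = [
    {"huggies", "wipes"},
    {"huggies", "wipes"},
    {"pampers", "wipes"},
    {"pampers", "wipes"},
    {"huggies", "milk"},
    {"pampers", "milk"},
]

found = indirect_pairs(transactions, pair_max_support=0.1, mediator_min_support=0.3)
print(found)
```

On this toy data the only pair reported is huggies and pampers, linked through wipes. Filtering by a shared mediator is what keeps the output small: most rare pairs in sparse data have no such link and are skipped.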
Keywords/Search Tags: Discovery, Data, Mining, Indirect association, Application