
Feature cluster selection for high-dimensional data analysis

Posted on: 2008-12-18 | Degree: M.S. | Type: Thesis
University: State University of New York at Binghamton | Candidate: Li, Hao
GTID: 2448390005451944 | Subject: Computer Science
Abstract/Summary:
This thesis addresses the gaps between two traditional data mining tasks, feature selection and clustering, and the knowledge that domain experts need in real-world applications. Using microarray data analysis, it illustrates two such gaps: the gap between a near-optimal feature subset and a candidate set of interesting features, and the gap between good clusters and relevant clusters. The thesis proposes to bridge these gaps with a new data mining task, feature cluster selection, which aims to select all relevant features in a data set and group them into a small number of coherent clusters. It gives both a formal definition and an empirical formulation of the new problem, and describes an efficient solution based on three criteria: Max-relevance, Max-cohesion, and Min-separation. Experiments on microarray data show that the solution can discover relevant feature clusters of statistical significance and can select representative feature subsets of high accuracy.
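The abstract names the three criteria but not their formulas, so the following is only a hypothetical sketch of how a candidate grouping of features might be scored against them. It assumes absolute Pearson correlation as the similarity measure, reads "relevance" as a feature's correlation with the class label, "cohesion" as the average similarity of features within the same cluster, and "separation" as the average similarity of features in different clusters (which the Min-separation criterion would keep low), and combines the three with equal weight. None of these choices, nor the function names, come from the thesis; the sketch also only scores a given grouping and does not search for one.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def score_feature_clusters(features, labels, clusters):
    """Hypothetical score: relevance + cohesion - separation.

    features: dict mapping feature name -> list of values, one per sample
    labels:   numeric class label per sample
    clusters: list of lists of feature names (the candidate grouping)
    """
    selected = [f for cluster in clusters for f in cluster]

    # Assumed Max-relevance term: mean |corr(feature, label)| over selected features.
    relevance = sum(abs(pearson(features[f], labels)) for f in selected) / len(selected)

    # Assumed Max-cohesion term: mean |corr| over feature pairs in the same cluster.
    within = [abs(pearson(features[a], features[b]))
              for cluster in clusters for a, b in combinations(cluster, 2)]
    cohesion = sum(within) / len(within) if within else 0.0

    # Assumed Min-separation term: mean |corr| over feature pairs in different clusters.
    between = [abs(pearson(features[a], features[b]))
               for c1, c2 in combinations(clusters, 2) for a in c1 for b in c2]
    separation = sum(between) / len(between) if between else 0.0

    return relevance + cohesion - separation


# Toy data (made up for illustration): g1 and g2 track the label, g3 and g4 track each other.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "g1": [1, 2, 1, 5, 6, 5],
    "g2": [2, 2, 1, 6, 5, 6],
    "g3": [1, 3, 2, 1, 3, 2],
    "g4": [1, 3, 2, 1, 3, 3],
}
good = score_feature_clusters(features, labels, [["g1", "g2"], ["g3", "g4"]])
bad = score_feature_clusters(features, labels, [["g1", "g3"], ["g2", "g4"]])
print(good > bad)
```

Both groupings select the same four features, so the relevance term is identical; the grouping that keeps correlated features together scores higher only because it raises cohesion and lowers separation. A real method would also decide which features to select and weight the terms, which this sketch leaves out.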
Keywords/Search Tags: Feature, Data, Selection, Clusters