
Spectral feature selection for mining ultrahigh dimensional data

Posted on: 2011-05-01
Degree: Ph.D.
Type: Dissertation
University: Arizona State University
Candidate: Zhao, Zheng
GTID: 1448390002468951
Subject: Computer Science
Abstract/Summary:
The rapid advance of computer-based high-throughput technology and the ubiquitous use of the web have provided unparalleled opportunities for humans to expand their capabilities in production, services, communications, and research. In this process, immense quantities of high-dimensional data are accumulated, challenging state-of-the-art machine learning techniques to produce useful results efficiently. Feature selection can effectively reduce data dimensionality by removing irrelevant and redundant features. Its immediate effects are faster data mining algorithms and better mining performance, such as higher predictive accuracy and more comprehensible results. In the last decade, a large number of relevance criteria have been developed to evaluate the utility of features, and these criteria have largely been studied separately according to the type of learning: supervised or unsupervised. This dissertation studies spectral feature selection, a novel, general feature selection framework based on graph spectral analysis. It unifies supervised and unsupervised feature selection and can generate families of algorithms for both learning contexts. It also includes many existing algorithms as special cases, which allows them to be studied jointly to gain insights. A common issue with many existing algorithms is that they are univariate methods: they evaluate each feature individually and thus cannot handle redundant features. The proposed spectral feature selection framework can be readily extended to multivariate analysis, which addresses this limitation effectively. One of the most challenging problems in feature selection research is the small sample problem, in which the lack of information further aggravates the difficulties of high dimensionality. Spectral feature selection also provides a natural way to incorporate domain knowledge from multiple sources, enriching the available information and addressing this problem.
The resulting multi-source feature selection technique represents one of the latest development trends in feature selection research. An extensive experimental study was conducted, and the results demonstrate that spectral feature selection achieves superior performance in various learning contexts.
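To make the graph-spectral idea more concrete, here is a minimal sketch of one well-known graph-spectral relevance score, the Laplacian Score. It is not presented as the dissertation's own algorithm. The function names (`rbf_similarity`, `label_similarity`, `laplacian_scores`), the Gaussian kernel width `sigma`, and the choice of a same-class graph for the supervised case are all illustrative assumptions. What the sketch shows is how a graph can unify the two learning settings: the scoring step only sees a sample similarity matrix `S`, so the same code serves unsupervised learning (similarity computed from the data) and supervised learning (similarity computed from the labels).

```python
import numpy as np


def rbf_similarity(X, sigma=1.0):
    """Unsupervised graph (assumed choice): Gaussian similarity between samples."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # drop self-loops
    return S


def label_similarity(y):
    """Supervised graph (assumed choice): link samples that share a class label."""
    y = np.asarray(y)
    S = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(S, 0.0)
    return S


def laplacian_scores(X, S):
    """Score each feature against graph S. Lower is better: the feature varies
    smoothly over the graph, i.e. it agrees with the graph's structure."""
    d = S.sum(axis=1)                     # vertex degrees
    L = np.diag(d) - S                    # unnormalized graph Laplacian
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        f = X[:, j]
        f_tilde = f - (f @ d) / d.sum()   # remove the degree-weighted mean
        num = f_tilde @ L @ f_tilde       # = sum over pairs i<j of S_ij * (f_i - f_j)^2
        den = f_tilde @ (d * f_tilde)     # degree-weighted variance
        scores[j] = num / den if den > 0 else np.inf
    return scores


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = np.array([[0.0, 1.0], [0.1, 3.0], [0.2, 2.0],
              [5.0, 1.5], [5.1, 3.1], [5.2, 2.2]])
y = [0, 0, 0, 1, 1, 1]

sup = laplacian_scores(X, label_similarity(y))             # supervised graph
unsup = laplacian_scores(X, rbf_similarity(X, sigma=1.0))  # unsupervised graph
print(np.argsort(sup), np.argsort(unsup))  # both rank feature 0 first: [0 1] [0 1]
```

Because the score is computed one feature at a time, two identical copies of a relevant feature receive identical scores and would both be selected. This is the redundancy limitation of univariate methods described in the abstract, which a multivariate extension is meant to address.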
Keywords/Search Tags:Feature selection, Mining, Data