
Iterative feature weighting for identification of relevant features in machine learning: With multilayer perceptron, radial basis function and support vector architectures

Posted on: 2006-04-03
Degree: Ph.D.
Type: Dissertation
University: Case Western Reserve University
Candidate: Duan, Baofu
Full Text: PDF
GTID: 1458390008951167
Subject: Engineering
Abstract/Summary:
In multivariate data analysis, samples may be described by many features, yet for a specific task some of those features are redundant or irrelevant, serving mainly as sources of noise and confusion. Irrelevant and redundant features not only increase the cost of data collection; they may also explain why machine learning is often hampered by an inadequate number of samples.

Feature selection addresses this issue by identifying and retaining only those features that are relevant to the task at hand. An alternative approach is feature weighting, which assigns a continuous-valued weight to every feature used to describe the data samples. Feature weighting reduces the influence of irrelevant features by assigning them small weights while giving relevant features large weights.

In this dissertation, we study the effect of irrelevant features on neural network design and propose a framework for iterative feature weighting with neural networks. The framework iteratively improves the trained networks until the optimal network model is reached; at the same time, the feature weights, which are evaluated through the trained networks, converge to their optimal values as well. We present a convergence theorem to guide the design of the framework and then implement it for three typical neural network architectures: the multilayer perceptron, the radial basis function network, and the support vector machine. These iterative feature weighting methods are applied to locally synthesized data and to benchmark datasets, with good results. Results for the MONK's problems show that the methods are very effective at identifying relevant features that stand in complex logical relationships within the data. Results for the Boston housing data show that the performance of regression models can be improved through iterative feature weighting. Results for the Leukemia gene expression data show that the methods can be used not only to improve the accuracy of pattern classification but also to identify features that may have a subtle nonlinear correlation with the task in question.
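The iterate-train-reweight loop described above can be sketched in miniature. The following is a hedged illustration, not the dissertation's actual algorithm: a linear model trained by gradient descent stands in for the neural network, and the feature weights are re-estimated from the magnitudes of the trained model's effective per-feature coefficients (a crude sensitivity measure). The data, loop count, and learning rate are all invented for the example.

```python
import random

random.seed(0)

# Synthetic data: the target depends only on feature 0; feature 1 is pure noise.
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y = [2.0 * x0 for x0, _ in X]

def train_linear(X, y, fw, epochs=200, lr=0.05):
    """Least-squares fit by stochastic gradient descent on feature-weighted inputs."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for xs, t in zip(X, y):
            z = [fw[i] * xs[i] for i in range(2)]          # apply feature weights
            err = sum(w[i] * z[i] for i in range((2))) - t
            for i in range(2):
                w[i] -= lr * err * z[i]
    return w

fw = [1.0, 1.0]                                  # start with equal feature weights
for _ in range(5):                               # iterative feature weighting loop
    w = train_linear(X, y, fw)
    sens = [abs(w[i] * fw[i]) for i in range(2)] # effective sensitivity per feature
    total = sum(sens)
    fw = [s / total for s in sens]               # renormalized feature weights

print(fw)  # the irrelevant feature's weight shrinks toward zero
```

Each pass trains the model under the current feature weights, then redistributes the weights in proportion to how strongly each feature influences the trained model's output; the irrelevant feature's weight decays across iterations while the relevant feature's weight approaches 1.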
Keywords/Search Tags: Features, Data