Feature selection and statistical alternatives for machine learning applied to in-silico drug design

Posted on: 2003-08-13
Degree: Ph.D
Type: Dissertation
University: Rensselaer Polytechnic Institute
Candidate: Arciniegas, Fabio Andres
Full Text: PDF
GTID: 1468390011480674
Subject: Operations Research
Abstract/Summary:
Feature selection has recently been the subject of intensive research in data mining, especially for datasets with a large number of descriptive attributes, such as QSAR (Quantitative Structure-Activity Relationship) data. QSAR is an in-silico drug design methodology that requires identifying the important features of molecules that explain a relevant drug property. A typical QSAR dataset for predicting an activity of interest is characterized by a large number of descriptive features (300–1000) for a relatively small number of compounds (molecules).

Finding the best feature subset for a given problem with N features requires evaluating all 2^N possible subsets. The best feature subset also depends on the predictive model that will be used to predict future, unknown values of the response variables of interest. Feature selection involves minimizing the number of features retained while maximizing the predictive power of the model. From this point of view, feature selection can be viewed as a special type of multi-objective optimization problem.

This dissertation proposes machine learning algorithms as predictive modeling tools for QSAR problems, and develops a novel approach to feature selection based on feature saliency. In addition, this approach is computationally less expensive than other machine learning feature selection methods (e.g., weight pruning for ANNs), and it works for any nonparametric regression algorithm.
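The abstract does not describe how the dissertation computes feature saliency, so the following is only a generic illustration of the idea of ranking features by how much a fitted model depends on them. It swaps in a permutation-based sensitivity score (not the author's measure), and every name in it (`permutation_saliency`, `toy_model`, the synthetic data) is hypothetical. Because it only calls a `predict` function, a sketch like this applies to any regression model, and it costs N passes over the data rather than 2^N subset evaluations.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_saliency(predict, X, y, n_repeats=5, seed=0):
    """Score each feature by the error increase seen when its column is shuffled.

    predict: callable mapping one row (list of floats) to a prediction.
    X: list of rows, each a list of feature values.
    y: list of target values.
    Returns one score per feature; larger means the model relies on it more.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            error = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(error - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Synthetic example (assumed data): only feature 0 drives the response.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(50)]
y = [3.0 * row[0] for row in X]


def toy_model(row):
    """Stand-in for a fitted regression model; it ignores features 1 and 2."""
    return 3.0 * row[0]


saliency = permutation_saliency(toy_model, X, y)
ranking = sorted(range(len(saliency)), key=lambda j: saliency[j], reverse=True)
```

In this toy setting the features the model ignores receive a score of exactly zero, so a selection step could drop them and refit; a real QSAR workflow would need to score on held-out compounds to avoid rewarding overfitting.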
Keywords/Search Tags:Feature selection, Machine learning, Drug, QSAR