A Study On Feature Selection Algorithms Based On Support Vector Machine And Its Application

Posted on: 2007-09-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Mao
Full Text: PDF
GTID: 1118360182990569
Subject: Control Science and Engineering
Abstract/Summary:
Feature selection and extraction are active research topics in information science, especially in pattern recognition. These techniques have advanced in step with artificial intelligence and computer technology. Many theoretical results based on statistics or machine learning have appeared, and some of them have been applied successfully in engineering practice.

This dissertation mainly discusses the theory and applications of feature selection based on the support vector machine (SVM). Because feature selection algorithms are used across many fields, we test our algorithms on a variety of datasets from chemical engineering and bioinformatics. The features in these datasets exhibit most of the relations found among real-world features: uncorrelated, linearly correlated, and non-linearly correlated. Organized by application field, and using SVM-based feature selection as the basic tool, the dissertation explains how each of these relations is handled and also discusses the physical meaning of the important features. To evaluate the algorithms more comprehensively, fuzzy support vector machines and several other decision machines are used to build diagnosis systems from the selected features.

The main contributions of the dissertation are as follows:

1. The branches of feature selection and their development are introduced, together with a detailed analysis of existing results worldwide. Open difficulties in both theory and application are pointed out, and directions for future research and development in the field are suggested.

2. Datasets with high dimensionality but few samples are common in biomedical fields, e.g. biochip-based cancer diagnosis datasets. Traditional feature selection algorithms based on statistical measures or linear classifiers perform poorly in many such cases. A feature selection method based on a non-linear kernel SVM and a genetic algorithm is introduced. The genetic algorithm tunes the kernel width parameter and the penalty parameter of the non-linear kernel SVM, and the proposed algorithm outperforms the aforementioned algorithms. The selected features are considered biomedically meaningful.

3. Although the genetic-algorithm-based non-linear SVM feature selection algorithm achieves satisfactory results to some degree, its computational complexity is too high for practical engineering use. Based on an analysis of the whole recursive feature elimination (RFE) procedure, RFE based on a non-linear kernel SVM with adaptively adjusted parameters is proposed, including a strategy for adjusting the kernel width parameter quickly. Tests on biomedical diagnosis datasets indicate that this algorithm performs somewhat better than the genetic-algorithm-based non-linear SVM-RFE, and that the whole procedure is greatly accelerated for practical applications.

4. RFE is an iterative procedure that eliminates features one at a time, which is very time-consuming. If many features are instead eliminated in each iteration according to a rule of thumb, important features may be eliminated prematurely and lost, which seriously limits the usefulness of the algorithm. To address this problem, we propose a series of statistical indices, based on the contribution of each feature to the final decision machine, that accelerate feature ranking while keeping the performance of the algorithm unchanged. Experimental results on TEP datasets indicate that the algorithms achieve this goal.

5. Feature selection basically serves two objectives. One, from a research perspective, is to select as many key variables as possible in order to find the root causes underlying a diagnosis or disease. The other is to serve the final decision procedure by eliminating uncorrelated or unimportant noise features. In this part, the two objectives are discussed on multi-class biomedical cancer diagnosis datasets and TEP datasets. SVM-RFE is applied to every pair of classes. Once the optimal feature groups are selected, a fuzzy SVM with adaptively adjusted parameters makes the final decision. Experimental results indicate that this work provides many informative key variables and that a satisfactory decision result is achieved with them.

6. It is very difficult to achieve satisfactory classification results with a single classifier. To address this problem, a double-layered ensemble classifier based on the feature selection algorithm is proposed. The algorithm is tested on ovarian datasets obtained from proteomic biochips. Experimental results indicate that the proposed classifier is more accurate than a single classifier or an ensemble constructed by bagging, and that its structure is much simpler than that of the bagging ensemble.

7. Finally, conclusions and future research directions are presented.
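
As a rough illustration of the recursive feature elimination loop that the contributions above build on, the following Python sketch ranks features with a plain linear SVM. It is not the dissertation's algorithm: the non-linear kernel, the genetic or adaptive parameter tuning, and the accelerated elimination indices are all omitted. The function names, the toy dataset, the squared-weight ranking criterion, and the training settings (`lam`, `epochs`) are assumptions chosen for this example only.

```python
import random

def train_linear_svm(X, y, lam=1.0, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    Uses a Pegasos-style step size 1 / (lam * t). No bias term is fitted,
    so the data are assumed to be roughly centred (an assumption of this
    sketch).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, epochs + 1):
        grad = [lam * wj for wj in w]          # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:                   # only margin violators contribute
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        eta = 1.0 / (lam * t)
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w

def svm_rfe(X, y):
    """Return feature indices ranked from most to least important."""
    surviving = list(range(len(X[0])))
    eliminated = []                            # least important feature first
    while surviving:
        X_sub = [[row[j] for j in surviving] for row in X]
        w = train_linear_svm(X_sub, y)
        # Assumed ranking criterion: squared weight of each surviving feature.
        worst = min(range(len(surviving)), key=lambda k: w[k] ** 2)
        eliminated.append(surviving.pop(worst))
    return eliminated[::-1]

def make_toy_data(n=200, n_noise=4, seed=0):
    """Hypothetical data: features 0 and 1 carry the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        informative = [3.0 * label + rng.gauss(0.0, 0.5) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y

X, y = make_toy_data()
ranking = svm_rfe(X, y)
print("features ranked from most to least important:", ranking)
```

The loop retrains the classifier after every single elimination, which is why the one-by-one procedure is expensive on high-dimensional data and why removing several features per iteration is tempting despite the risk of discarding important ones too early.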
Keywords/Search Tags: Support Vector Machine, Parameter Adjustment, Genetic Algorithm, Recursive Feature Elimination, Feature Selection, Fuzzy Decision, Accelerated Algorithm, Ensemble Classifier, Fault information detection and fault diagnosis in chemical engineering