| With the development of bioinformatics, analyzing complex genomic data with machine learning approaches has become an important research field. Gene expression data provided by microarray technologies can reveal gene expression patterns under given conditions. They also help us investigate the essence of biological processes, such as gene function, tumors, senescence, and drugs. In this thesis we mainly discuss tumor classification and gene function classification methods based on gene expression data, and we propose some improvements to these algorithms and methods. This thesis improves tumor classification from gene expression data in two respects: the classification algorithm and feature selection. We combine SVM with kNN, based on viewing SVM as a 1-NN classifier in which only one representative point is selected for each class. For each test sample, the algorithm computes the distance from the sample to the optimal separating hyperplane of the SVM in feature space and chooses the classification algorithm according to that distance. Experimental results show that the new algorithm achieves higher classification accuracy than the original algorithms. Gene expression data sets typically have few samples and high dimensionality. To address this problem, this thesis improves classification accuracy through feature selection. We propose a new recursive feature elimination method, Correlativity-based RFE. By calculating the correlativity between genes, this method searches for minimum redundancy while avoiding the deletion of the genes that most strongly determine the target phenotypes. 
Experimental results show that the new feature selection approach achieves higher classification accuracy and that the feature selection process takes less time. For gene function classification using gene expression data, this thesis presents two algorithms that exploit the subordination relationships among function classes: a confidence adjustment algorithm based on the gene function tree (tCAA) and a dominant factor decision algorithm based on the gene function tree (tDA). Building on these two algorithms, this thesis proposes a new gene function classification algorithm based on the gene function tree. In the test phase, the algorithm automatically detects gene function confidences that are too high or have been ignored and then adjusts them according to tCAA. The new algorithm introduces tDA to avoid the limitation of fixed-size prediction. It employs the dominant factor to decide the... |
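
The distance-based switch between SVM and kNN described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a linear kernel with an already trained weight vector `w` and bias `b`, a hypothetical threshold `eps` on the distance to the hyperplane, and a plain Euclidean kNN over the training samples as the fallback (the thesis may use a different representative set, such as the support vectors, and a kernel-induced distance).

```python
import math
from collections import Counter


def svm_margin(w, b, x):
    """Signed distance from x to the hyperplane w.x + b = 0 (linear kernel assumed)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def hybrid_predict(w, b, train_X, train_y, x, eps=0.5, k=3):
    """Use the SVM decision far from the hyperplane, kNN close to it.

    eps is a hypothetical threshold chosen for illustration only.
    """
    d = svm_margin(w, b, x)
    if abs(d) >= eps:
        # Confident region: trust the sign of the SVM decision function.
        return 1 if d > 0 else -1
    # Ambiguous region near the boundary: fall back to kNN.
    return knn_predict(train_X, train_y, x, k)


# Toy data: labels in {-1, +1}, hyperplane x1 = 0.
w, b = [1.0, 0.0], 0.0
train_X = [(0.2, 0.0), (0.3, 0.5), (0.1, -0.5), (-2.0, 0.0), (2.0, 0.0)]
train_y = [-1, -1, -1, -1, 1]

far_label = hybrid_predict(w, b, train_X, train_y, (3.0, 0.0))
near_label = hybrid_predict(w, b, train_X, train_y, (0.2, 0.1))
```

In this toy data the point `(0.2, 0.1)` lies on the positive side of the hyperplane but within `eps` of it, so the decision is delegated to kNN, whose three nearest neighbours all carry the label -1. Samples far from the hyperplane keep the SVM decision.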