Typically, feature gene selection aims to find a compact feature subset that can be used to construct a pattern classifier of reduced complexity, in order to improve classification performance. It not only helps identify disease-related genes and improve tumor classification, but also reduces the cost of the clinical diagnosis of tumor type. An effective feature gene selection method should not only produce a solution with good classification performance, but should also be robust. Gene expression microarray data are characterized by very few samples and high dimensionality, and related studies have confirmed that such datasets easily lead to poor robustness in feature selection methods. However, existing feature selection methods are mostly concerned with classification performance and tend to overlook the robustness of the algorithm. The main research work is as follows:

A feature gene selection method based on prior information fusion. When the number of extracted features is small, classification performance is high, but once the number of features exceeds a certain threshold, classification performance drops. Based on this assumption, we first remove noise genes and unrelated genes, and then apply a heuristic breadth-first search algorithm to select feature genes. In addition, we propose using multiple testing procedures (MTP) to fuse prior information, so as to make full use of reliable clinical knowledge and further improve the accuracy of tumor subtype classification. Experimental results show that our method selects a more compact feature gene subset and achieves better classification performance (see the first code sketch below).

A feature gene selection method based on multicriterion fusion. The high-dimensional, small-sample nature of gene expression data is likely to cause poor robustness in feature gene selection algorithms. A feature gene selection algorithm that lacks robustness may produce unrepeatable results even when only a few samples are added to or deleted from the training dataset. Even without perturbation of the training data, different feature selection algorithms usually produce different selection results. The resulting inconsistent gene selections can confuse biological researchers and lead to a loss of confidence in clinical diagnosis. In this work, we exploit the fact that different basic filter criteria capture different aspects of the sample data distribution. At the same time, to avoid the situation where multicriterion fusion fails to capture the complexity of the sample data distribution and thus lowers classification accuracy, we propose integrating multiple criteria with a prior-information score, and then use a forward-backward algorithm that eliminates half of the remaining features at each iteration to select feature genes. Experiments show that this method effectively preserves genes that would be wrongly eliminated by the bias of a single criterion; it achieves classification performance similar to other methods and has better robustness (see the second code sketch below).
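The prior-information-fusion method above is described only at a high level; the sketch below is one possible reading, not the authors' implementation. It assumes binary labels, uses per-gene t-tests corrected by the Benjamini-Hochberg procedure as a stand-in for the multiple testing procedures (MTP) step, takes `prior_score` as a hypothetical per-gene relevance score in [0, 1] derived from clinical prior knowledge, fuses it additively with the statistical evidence, and approximates the heuristic breadth-first search with a beam-limited, level-wise subset search scored by cross-validated SVM accuracy.

```python
# Minimal sketch of the prior-information-fusion pipeline (assumptions noted above).
import numpy as np
from scipy.stats import ttest_ind
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from statsmodels.stats.multitest import multipletests


def prefilter(X, y, prior_score, fdr=0.05):
    """Remove noise/unrelated genes: keep genes passing a Benjamini-Hochberg
    FDR test, then rank survivors by a fused statistical + prior score."""
    pvals = np.array([ttest_ind(X[y == 0, j], X[y == 1, j]).pvalue
                      for j in range(X.shape[1])])
    reject, _, _, _ = multipletests(pvals, alpha=fdr, method="fdr_bh")
    keep = np.where(reject)[0]
    fused = (1.0 - pvals[keep]) + prior_score[keep]   # additive fusion (assumption)
    return keep[np.argsort(fused)[::-1]]              # candidate genes, best first


def heuristic_bfs(X, y, candidates, max_size=5, beam=10):
    """Level-wise (breadth-first) search over gene subsets, keeping only the
    `beam` best subsets per level; subsets scored by 5-fold CV accuracy."""
    def score(subset):
        return cross_val_score(SVC(kernel="linear"),
                               X[:, list(subset)], y, cv=5).mean()

    frontier = [((g,), score((g,))) for g in candidates[:beam]]
    best = max(frontier, key=lambda t: t[1])
    for _ in range(max_size - 1):
        level = {}                                    # expand every kept subset by one gene
        for subset, _ in frontier:
            for g in candidates:
                if g not in subset:
                    new = tuple(sorted(subset + (g,)))
                    if new not in level:
                        level[new] = score(new)
        frontier = sorted(level.items(), key=lambda t: -t[1])[:beam]
        if frontier and frontier[0][1] > best[1]:
            best = frontier[0]
    return list(best[0]), best[1]


# Typical use (assuming X: samples x genes, y in {0, 1}, prior in [0, 1]):
# genes, acc = heuristic_bfs(X, y, prefilter(X, y, prior)[:100])
```

The beam limit is what keeps the level-wise search tractable on microarray-sized gene lists; an exhaustive breadth-first search over subsets would grow exponentially with subset size.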
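Similarly, the multicriterion-fusion method leaves the choice of criteria, the fusion rule, and the forward-backward details open. The sketch below fills them in with assumptions: absolute t-statistic, mutual information, and a simple two-class Fisher-style score as the basic filter criteria; rank averaging plus the hypothetical `prior_score` as the fusion; and a rescue rule (keep a dropped gene if at least one individual criterion ranks it in the top quarter) as a simplified stand-in for the backward step of the forward-backward procedure.

```python
# Minimal sketch of multicriterion fusion with half-elimination (assumptions noted above).
import numpy as np
from scipy.stats import ttest_ind
from sklearn.feature_selection import mutual_info_classif


def criterion_scores(X, y):
    """Score every gene under several univariate filter criteria
    (rows: criteria, columns: genes; higher = more class-relevant)."""
    t = np.abs(ttest_ind(X[y == 0], X[y == 1], axis=0).statistic)
    mi = mutual_info_classif(X, y, random_state=0)
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    v0, v1 = X[y == 0].var(axis=0), X[y == 1].var(axis=0)
    fisher = (mu0 - mu1) ** 2 / (v0 + v1 + 1e-12)     # simple two-class Fisher-style score
    return np.vstack([t, mi, fisher])


def select_by_half_elimination(X, y, prior_score, target_size=50):
    """Iteratively drop the worse half of the surviving genes using a fused
    (multicriterion + prior) rank score, with a hypothetical 'backward' rescue
    for genes strongly supported by at least one individual criterion."""
    idx = np.arange(X.shape[1])
    while len(idx) > target_size:
        S = criterion_scores(X[:, idx], y)                  # criteria x surviving genes
        ranks = np.argsort(np.argsort(S, axis=1), axis=1)   # per-criterion rank, 0 = worst
        fused = ranks.mean(axis=0) / len(idx) + prior_score[idx]
        order = np.argsort(fused)[::-1]                     # best first
        keep, drop = order[: len(idx) // 2], order[len(idx) // 2:]
        # Backward rescue (assumption): keep a dropped gene if some single
        # criterion ranks it in the top quarter of the current survivors.
        top_quarter = ranks.max(axis=0) >= len(idx) - max(1, len(idx) // 4)
        rescued = drop[top_quarter[drop]]
        new_idx = idx[np.sort(np.concatenate([keep, rescued]))]
        if len(new_idx) == len(idx):                        # degenerate case: no reduction
            break
        idx = new_idx
    return idx
```

Halving the surviving genes at every iteration keeps the selection cost roughly linear in the original number of genes, and the rescue step is where fusion can preserve genes that a single biased criterion would have eliminated.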