Application Of Multi-feature Based Classifier Ensemble For Gene Expression Data Classification

Posted on: 2009-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y O Zhao
GTID: 2178360272477600
Subject: Computer application technology
Abstract/Summary:
With the development of the Human Genome Project, DNA microarray technology emerged as a revolutionary technology. It can measure the expression levels of tens of thousands of genes automatically, rapidly and efficiently. By analyzing gene expression data, we can understand the physiological state of cells at the molecular level, such as survival, proliferation, differentiation, apoptosis, canceration and irritability. These questions play an important role in medical diagnosis, evaluation of drug efficacy and explanation of disease mechanisms.
Gene expression data are highly complex and enormous in volume, and they are very difficult to interpret directly with medical imaging methods. Gene expression data classification has therefore become one of the hardest problems in bioinformatics. Early on, pattern recognition methods were often employed and achieved some results with the help of powerful computers. In recent years, as machine learning algorithms have come into wide use in bioinformatics, they have been proposed as a new approach to gene expression data classification. However, because gene expression data have few samples, an excessive number of features and nonlinear structure, these methods are difficult to apply directly. This is mainly because: (1) important features are obscured by the many irrelevant features, which makes them hard for classifiers to learn; (2) too few samples cause classifiers to over-fit. To address the first problem, feature selection methods are often applied to reduce dimensionality. For the second, classifier ensembles are usually used to increase classification accuracy.
For an effective gene expression data classification system, gene feature selection and classifier ensembles are the two essential steps. However, these two steps are often isolated in practical applications.
As a result, the earlier step may not provide a good foundation for the later one, and can even reduce the overall classification accuracy.
In this thesis, a novel classifier ensemble based on multiple features is proposed, which combines gene feature selection with classifier ensembles. The algorithm works as follows. First, to extract useful features and reduce dimensionality, different feature selection methods, such as correlation analysis and the Fisher ratio, are used to form different feature subsets. Next, a pool of candidate base classifiers is generated, each trained on a subset re-sampled from these feature subsets with the PSO (Particle Swarm Optimization) algorithm. Finally, following the selective-ensemble idea that "many could be better than all", suitable classifiers are selected to form the classification committee using an EDA (Estimation of Distribution Algorithm).
Four common datasets, Leukemia, Colon, Ovarian and Lung Cancer, were used to test the method. Experiments show that the proposed method achieves higher classification accuracy and stability than the other methods compared.
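The Fisher-ratio filter named in the method description can be sketched as follows. This is a minimal two-class version that assumes the common definition (m1 - m2)^2 / (v1 + v2), where m and v are the per-class mean and variance of one gene's expression values; the abstract does not give the thesis's exact formula. Genes are ranked by this score and the top k form one feature subset. The helper names `fisher_ratio` and `top_k_genes` and the toy data are hypothetical.

```python
# Hypothetical helpers: a sketch of Fisher-ratio gene ranking, not the thesis's code.
import math
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Two-class Fisher ratio (m1 - m2)^2 / (v1 + v2) of one gene's expression values."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch only handles two-class problems")
    a = [v for v, c in zip(values, labels) if c == classes[0]]
    b = [v for v, c in zip(values, labels) if c == classes[1]]
    gap = (mean(a) - mean(b)) ** 2            # between-class separation
    spread = pvariance(a) + pvariance(b)      # within-class scatter
    if spread == 0:
        # Zero within-class spread: perfectly separating if the means differ.
        return math.inf if gap > 0 else 0.0
    return gap / spread


def top_k_genes(X, labels, k):
    """Indices of the k genes (columns of X) with the largest Fisher ratio."""
    n_genes = len(X[0])
    scores = [fisher_ratio([row[j] for row in X], labels) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (hypothetical): 4 samples x 3 genes, two classes.
X = [[1.0, 4.0, 0.5],
     [2.0, 6.0, 1.5],
     [5.0, 5.0, 1.0],
     [6.0, 7.0, 1.0]]
labels = ["ALL", "ALL", "AML", "AML"]
print(top_k_genes(X, labels, 2))  # [0, 1]
```

Gene 0 separates the two classes cleanly (score 32.0), gene 1 only weakly (0.5), and gene 2 has identical class means (0.0). A correlation-based ranking could produce a second, different subset in the same way.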
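The final step, choosing a sub-committee from the candidate pool with an EDA, can be illustrated with the simplest EDA, the univariate marginal distribution algorithm (UMDA). This is a sketch under stated assumptions: each base classifier is represented only by its predictions on a validation set, a committee is a 0/1 mask over the pool, and its fitness is the plurality-vote accuracy on that set. The helper names, the parameters (`pop_size`, `n_best`, `generations`, the 0.05/0.95 clamp) and the toy predictions are hypothetical; the abstract does not say which EDA variant or fitness function the thesis used, and the PSO re-sampling step that builds the pool is not shown.

```python
# Hypothetical helpers: UMDA-based selective ensemble (EDA variant assumed).
import random
from collections import Counter


def vote_accuracy(mask, preds, y):
    """Validation accuracy of a plurality vote over the classifiers selected by mask.

    preds[k][i] is classifier k's predicted label for validation sample i.
    Ties go to the label seen first (Counter.most_common ordering).
    """
    chosen = [p for p, bit in zip(preds, mask) if bit]
    if not chosen:
        return 0.0  # an empty committee cannot vote
    correct = 0
    for i, label in enumerate(y):
        winner = Counter(p[i] for p in chosen).most_common(1)[0][0]
        if winner == label:
            correct += 1
    return correct / len(y)


def umda_select(preds, y, pop_size=30, n_best=10, generations=20, seed=0):
    """Pick a sub-committee by evolving per-classifier inclusion probabilities."""
    rng = random.Random(seed)
    n = len(preds)
    prob = [0.5] * n               # P(classifier j is in the committee)
    best_mask = [1] * n            # start from the full ensemble ("all")
    best_fit = vote_accuracy(best_mask, preds, y)
    for _ in range(generations):
        # Sample masks from the current marginals; keep the best-so-far (elitism).
        pop = [[int(rng.random() < p) for p in prob] for _ in range(pop_size - 1)]
        pop.append(best_mask)
        scored = sorted(((vote_accuracy(m, preds, y), m) for m in pop),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best_mask = scored[0]
        elite = [m for _, m in scored[:n_best]]
        # Re-estimate each marginal from the elite, clamped so no bit is frozen.
        prob = [min(0.95, max(0.05, sum(m[j] for m in elite) / len(elite)))
                for j in range(n)]
    return best_mask, best_fit


# Toy validation labels and predictions of five hypothetical base classifiers.
y = [0, 1, 0, 1, 0, 1]
preds = [
    [0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
]
print(vote_accuracy([1] * 5, preds, y))          # 0.5: the full committee
print(vote_accuracy([1, 1, 1, 0, 0], preds, y))  # 1.0: first three only
mask, acc = umda_select(preds, y)
print(mask, acc)
```

On these toy predictions the full five-member committee scores 0.5 while the first three classifiers alone score 1.0, which is the "many could be better than all" effect a selective ensemble exploits. Seeding the search with the full ensemble and carrying the best mask forward means the returned committee is never worse than the full ensemble on the validation set.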
Keywords/Search Tags: Gene Expression Data, Microarray, Selective Ensemble, Particle Swarm Optimization, Estimation of Distribution Algorithms, Multi-feature