Ensemble Learning Based Gene Selection And Sample Classification

Posted on: 2015-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Wei
Full Text: PDF
GTID: 2298330467485816
Subject: Computer software and theory
Abstract/Summary:
Gene chip technology provides a new approach to disease diagnosis, treatment, and drug development, and it has generated a large number of disease-associated gene expression datasets. Analyzing and processing these datasets helps researchers understand the pathogenesis of cancer at the molecular level, and this has attracted great interest.

Ensemble learning has been widely applied in many areas of machine learning, including gene expression data analysis. Compared with a single model, an ensemble uses multiple learners to solve the same problem and can provide more robust and accurate classification. Applying ensemble learning methods to disease-associated gene expression datasets can therefore classify test samples more accurately. The main content of this thesis is the analysis of cancer-associated gene expression datasets with ensemble learning methods.

Rank aggregation can provide a more robust and accurate gene subset, but it may ignore genes that score highly in a single ranking, and the resulting gene subset may contain redundant genes. To solve these problems, affinity propagation clustering is applied to select representative, unrelated genes from a primary gene subset that contains the genes with high scores in the single rankings. Experimental results on seven gene expression datasets show that the proposed method selects a more robust gene subset with stronger discriminative ability for samples and better classification performance.

Selecting only one gene subset for classification in gene expression data analysis may lose information. Drawing on the idea of ensemble feature selection, we propose a new ensemble learning method based on gene ranking, selection, and grouping. First, many gene subsets are produced by randomly selecting one gene from each gene group and combining the selected genes. Second, base classifiers are trained in the feature subspaces corresponding to these gene subsets. Finally, the predictions of the base classifiers on the test data are integrated by majority vote. Experimental results on seven gene expression datasets show that the proposed method has low classification error, stable performance, and excellent scalability.
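
The three steps of the ensemble method (one gene drawn at random from each gene group, one base classifier per resulting gene subset, majority vote over the base classifiers) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: the abstract does not say which base classifier is used, so a simple nearest-centroid classifier stands in for it, and all function names, the toy data, and the default of 11 subsets are assumptions made for the example. The gene groups are taken as given, since the abstract describes how they are obtained only in outline.

```python
import random
from collections import Counter


def build_gene_subsets(gene_groups, n_subsets, rng):
    """Step 1: form each subset by drawing one gene index from every group."""
    return [[rng.choice(group) for group in gene_groups] for _ in range(n_subsets)]


def fit_centroids(X, y, genes):
    """Step 2: train a base classifier on the subspace spanned by `genes`.

    Nearest-centroid is an assumed stand-in; the abstract does not name
    the base classifier.
    """
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(genes))
        for j, g in enumerate(genes):
            acc[j] += row[g]
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_one(centroids, genes, row):
    """Assign the label whose centroid is closest in the gene subspace."""
    def sq_dist(center):
        return sum((row[g] - c) ** 2 for g, c in zip(genes, center))
    return min(sorted(centroids), key=lambda label: sq_dist(centroids[label]))


def ensemble_predict(X_train, y_train, X_test, gene_groups, n_subsets=11, seed=0):
    """Step 3: combine the base classifiers' predictions by majority vote."""
    rng = random.Random(seed)
    subsets = build_gene_subsets(gene_groups, n_subsets, rng)
    models = [(genes, fit_centroids(X_train, y_train, genes)) for genes in subsets]
    predictions = []
    for row in X_test:
        votes = Counter(predict_one(c, genes, row) for genes, c in models)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy data (hypothetical): 4 samples x 4 genes, two gene groups.
X_train = [
    [0.1, 0.2, 5.0, 5.1],
    [0.0, 0.3, 4.8, 5.3],
    [3.0, 3.1, 0.2, 0.1],
    [3.2, 2.9, 0.0, 0.3],
]
y_train = ["normal", "normal", "tumor", "tumor"]
gene_groups = [[0, 1], [2, 3]]
X_test = [[0.2, 0.1, 5.2, 4.9], [3.1, 3.0, 0.1, 0.2]]

predictions = ensemble_predict(X_train, y_train, X_test, gene_groups)
```

Because every subset takes exactly one gene per group, each base classifier sees a small subspace with one representative per group, while the ensemble as a whole draws on many genes, which is the information that a single selected subset would discard.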
Keywords/Search Tags: Ensemble Learning, Classification, Gene Microarray, Affinity Propagation Clustering