Classification Of Cancer Subtypes Based On Gene Expression Data

Posted on: 2019-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Fan
Full Text: PDF
GTID: 2404330551961193
Subject: Computer Science and Technology
Abstract/Summary:
With the deepening of bioinformatics research, the classification of cancer subtypes based on gene expression data has become a research hotspot. At the level of molecular biology, analyzing gene expression data to guide the early diagnosis of cancer is of vital significance. However, the characteristics of gene expression data, such as high dimensionality, small sample size and unbalanced class distribution, make the classification of cancer subtypes challenging. Gene expression data contain many redundant genes and much noise, and only a small number of genes are related to the expression of cancer. How to select, from a massive set of genes, the feature subset most relevant to cancer classification is therefore one of the key research questions. In addition, researchers are working to find effective classification methods. Both lines of research aim to improve the classification accuracy of cancer subtypes and to provide more precise decision support for big data healthcare. In this thesis, we mainly use the Extreme Learning Machine (ELM) to construct classifier models that predict and classify gene expression data. We also study feature selection methods tailored to the characteristics of gene expression data in order to improve the classification accuracy of the classifiers. The main work is as follows.

(1) To address the high dimensionality of gene expression data, a feature selection method based on multi-dimensional mutual information (MMI) is proposed. The MMI feature selection algorithm screens out the subset of genes that best expresses the cancer classification. To evaluate the performance of the MMI method, the Leukemia and Colon cancer data sets were selected as case studies, and the MMI feature selection method was compared with the ReliefF method. The results show that the MMI method achieves higher classification accuracy, which supports the validity of the feature selection method proposed in this thesis.

(2) To address the imbalance of samples in gene expression data, the boosting method is introduced into cancer classification research. By combining the AdaBoost method with the extreme learning machine, a strong classifier, Adaboost-ELM, is obtained. Classification experiments on the leukemia data show that the Adaboost-ELM classifier classifies the leukemia data set better and reduces the effect of sample imbalance on classification performance.
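The idea behind item (1), scoring genes by how much information they carry about the class label and keeping the best ones, can be illustrated with a much simpler univariate stand-in. The sketch below is not the thesis's MMI algorithm: it assumes each gene is binarized at its median and ranked independently by plain mutual information with the label. The function names (`mutual_information`, `binarize`, `rank_genes`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2
from statistics import median


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def binarize(values):
    """Assumed discretization: 1 if the value is above the gene's median, else 0."""
    m = median(values)
    return [1 if v > m else 0 for v in values]


def rank_genes(expression, labels, k):
    """Return the indices of the k genes with the highest MI against the labels.

    expression: list of samples, each a list of per-gene expression values.
    Each gene is scored on its own, so redundancy between genes is ignored.
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append((mutual_information(binarize(column), labels), g))
    # Highest score first; ties are broken by the lower gene index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scores[:k]]


# Toy data invented for illustration: 6 samples x 3 genes, two classes.
expression = [
    [5.1, 0.3, 2.0],
    [4.8, 0.9, 2.1],
    [5.5, 0.2, 1.9],
    [1.2, 0.8, 2.2],
    [0.9, 0.1, 2.0],
    [1.1, 0.7, 1.8],
]
labels = [0, 0, 0, 1, 1, 1]
top = rank_genes(expression, labels, k=1)
print(top)
```

In this toy set only gene 0 separates the two classes, so it is ranked first. A multi-dimensional criterion, as the abstract describes, would additionally consider genes jointly rather than one at a time.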
Keywords/Search Tags: gene expression data, feature selection, information gain, extreme learning machine, AdaBoost, classification