Study On Classification Of Gene Expression Data Based On Extreme Learning Machine

Posted on: 2015-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: C L An
Full Text: PDF
GTID: 2298330431489259
Subject: Computer application technology
Abstract/Summary:
With the development of classification techniques, the extreme learning machine (ELM) has been increasingly used in gene expression data classification because of its fast classification speed and high accuracy. However, classification with a single ELM is not stable. Ensemble learning can overcome this instability and is becoming a powerful tool for obtaining biological information through the analysis of gene expression data. When the misclassification costs of the given samples are not equal, improving accuracy is not the only goal of gene expression data classification: the misclassification cost should also be minimized while classification accuracy is kept high. This thesis focuses on gene expression data classification and is organized around the following contributions:

(1) We propose the dissimilarity ensemble of ELM algorithm (D-ELM). Based on two dissimilarity measures, we propose two variants: the dissimilarity ensemble of ELM based on the disagreement measure (D-D-ELM) and the dissimilarity ensemble of ELM based on the double-fault measure (DF-D-ELM). First, we introduce the dissimilarity measures (the disagreement measure and the double-fault measure) and establish elimination rules. We then evaluate the dissimilarity of each ELM and remove the redundant ELMs according to the elimination rules. The remaining ELMs are combined into an ensemble classifier by majority voting. Finally, we use the ensemble to classify the gene expression data. Experimental results show that the proposed D-ELM algorithm improves classification accuracy, and that DF-D-ELM performs better than D-D-ELM.

(2) We propose the cost-sensitive ELM algorithm (CS-ELM). First, we introduce probability estimation into the classification process and combine the classification probabilities with the misclassification costs. We further embed a rejection cost to realize cost-sensitive classification with ELM. Experimental results show that the proposed CS-ELM algorithm achieves the goal of minimum misclassification cost by improving the classification accuracy on small-class samples, which carry a higher misclassification cost.

(3) We propose the cost-sensitive, dissimilarity-based ensemble of ELM for gene expression data classification (CS-D-ELM). First, this algorithm uses the D-ELM method to select ELMs and eliminate the redundant ones. It is then combined with the CS-ELM method to embed cost sensitivity, with the goal of minimizing the misclassification cost while maintaining high classification accuracy. Experimental results show that the proposed CS-D-ELM algorithm not only improves classification accuracy but also minimizes the misclassification cost.
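
The two pairwise dissimilarity measures and the majority-voting step named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions: disagreement is the fraction of samples on which exactly one of two classifiers is correct, and double-fault is the fraction on which both are wrong. The function names and toy predictions are assumptions made for illustration. The thesis's elimination rules are not given in the abstract, so they are not reproduced here.

```python
import numpy as np

def disagreement(pred_a, pred_b, y_true):
    """Fraction of samples where exactly one of the two classifiers is correct."""
    a_ok = np.asarray(pred_a) == np.asarray(y_true)
    b_ok = np.asarray(pred_b) == np.asarray(y_true)
    return float(np.mean(a_ok != b_ok))

def double_fault(pred_a, pred_b, y_true):
    """Fraction of samples that both classifiers misclassify."""
    a_ok = np.asarray(pred_a) == np.asarray(y_true)
    b_ok = np.asarray(pred_b) == np.asarray(y_true)
    return float(np.mean(~a_ok & ~b_ok))

def majority_vote(preds):
    """Combine predictions of shape (n_classifiers, n_samples) by majority vote.

    Labels are assumed to be non-negative integers; ties go to the lowest label.
    """
    preds = np.asarray(preds)
    n_classes = int(preds.max()) + 1
    # counts[c, n] = number of classifiers voting for class c on sample n
    counts = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

# Toy predictions standing in for three trained ELMs (hypothetical data).
y = np.array([0, 1, 1, 0, 1, 0])
p1 = np.array([0, 1, 0, 0, 1, 1])
p2 = np.array([0, 1, 1, 1, 1, 1])
p3 = np.array([1, 1, 0, 0, 1, 0])

print(disagreement(p1, p2, y))      # 2 of 6 samples
print(double_fault(p1, p2, y))      # 1 of 6 samples
print(majority_vote([p1, p2, p3]))  # ensemble labels
```

A higher disagreement value, or a lower double-fault value, indicates a more diverse pair of classifiers. A pruning rule built on these scores would drop members that add little diversity before the vote.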
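
The idea of combining class probabilities with misclassification costs and a rejection cost can be sketched as a minimum-expected-cost decision rule with a reject option. This is the generic Bayes-risk formulation and not necessarily the exact CS-ELM procedure, which the abstract does not spell out. The probability estimates are taken as given inputs, and the cost matrix, rejection cost, and function name are hypothetical.

```python
import numpy as np

REJECT = -1  # label used here for "refuse to classify" (an assumption)

def min_cost_decision(proba, cost, reject_cost):
    """Pick the class with the lowest expected cost, or reject.

    proba:       (n_samples, n_classes) estimated class probabilities
    cost:        (n_classes, n_classes), cost[i, j] = cost of predicting j
                 when the true class is i
    reject_cost: fixed cost paid when a sample is rejected
    """
    proba = np.asarray(proba, dtype=float)
    cost = np.asarray(cost, dtype=float)
    # risk[n, j] = sum_i proba[n, i] * cost[i, j]
    risk = proba @ cost
    best = risk.argmin(axis=1)
    best_risk = risk.min(axis=1)
    # Reject when even the cheapest prediction costs more than rejecting.
    return np.where(best_risk > reject_cost, REJECT, best)

# Class 1 plays the small, expensive-to-miss class (hypothetical costs).
cost = [[0.0, 1.0],
        [5.0, 0.0]]
proba = [[0.9, 0.1],
         [0.7, 0.3],
         [0.2, 0.8]]

print(min_cost_decision(proba, cost, reject_cost=0.6))
print(min_cost_decision(proba, cost, reject_cost=10.0))
```

In the second call rejection is too expensive to ever be chosen, and the middle sample is assigned to class 1 even though class 0 is more probable, because missing a class-1 sample costs five times as much.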
Keywords/Search Tags: extreme learning machine, ensemble algorithm, cost-sensitive learning, gene expression data, classification