
Cost Sensitive Learning Optimization Method For Gene Expression Data

Posted on: 2019-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
Full Text: PDF
GTID: 2404330551460010
Subject: Computer application technology
Abstract/Summary:
Using machine learning methods to classify gene expression data can support the effective diagnosis of diseases such as cancer, which is of great significance for improving human health. Decision tree algorithms and related ensemble algorithms are widely used because they are easy to understand and structurally simple. However, because the class distribution of gene expression data is imbalanced, decision tree algorithms need to be improved for classifying such data. Cost-sensitive algorithms can effectively compensate for the limitations of classification accuracy by taking cost factors into account. At the same time, cost-sensitive algorithms lack proper evaluation criteria and reasonable parameter determination methods, so they also need further improvement. To address these problems, this thesis carries out the following research:

(1) We propose a cost-sensitive rotation forest algorithm for gene expression data classification. Three classification costs, namely misclassification cost, test cost and rejection cost, are embedded into the rotation forest algorithm. Experimental results show that the cost-sensitive rotation forest algorithm effectively reduces the classification cost and makes the classification results more reliable.

(2) We propose a method of calculating classification accuracy for cost-sensitive algorithms. Balanced accuracy is used instead of overall accuracy to assess the performance of cost-sensitive algorithms. Compared with overall accuracy, balanced accuracy does not neglect the contribution of samples in small classes. In the experiment, we classify gene expression data with a cost-sensitive extreme learning machine, and the results show that balanced accuracy is a valid criterion for evaluating classification performance.

(3) Using balanced accuracy as the evaluation standard, we obtain the classification accuracy under different weight settings with an adaptive algorithm, and then obtain the optimal cost-weight function with the highest classification accuracy through 3D fitting. In the experiment, we classify gene expression data with cost-weight parameters obtained from the cost-weight function, and the results show that the proposed algorithm is applicable to a variety of imbalanced datasets.

Through the above research, the problem of parameter determination for cost-sensitive algorithms is addressed. According to the characteristics of the samples, the cost weight can be adjusted to improve classification performance effectively.
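To illustrate why overall accuracy can hide poor performance on a small class, the following is a minimal Python sketch. It assumes balanced accuracy means the mean of the per-class recalls, which is the common definition; the abstract does not give the thesis's exact formula. The function names and the toy labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally.

    Assumption: this is the usual definition of balanced accuracy,
    not necessarily the exact formula used in the thesis.
    """
    totals = Counter(y_true)  # number of samples in each true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced sample: 9 "normal" cases and 1 "tumor" case.
y_true = ["normal"] * 9 + ["tumor"]
# A classifier that always predicts the majority class.
y_pred = ["normal"] * 10

print(overall_accuracy(y_true, y_pred))   # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case the majority-class classifier looks strong under overall accuracy but scores only 0.5 under balanced accuracy, because the single minority sample contributes a full class recall of 0.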
Keywords/Search Tags: gene expression data, rotation forest, extreme learning machine, cost sensitive