Tumor Classification Based On Gene Expression Studies

Posted on: 2013-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: S M Wang
GTID: 2218330371460199
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In recent decades, tumors have remained a serious threat to human health, while bioinformatics technology has developed rapidly. Research on gene expression data can reveal the mechanisms of tumors, which helps tumor diagnosis and targeted treatment. Tumor classification with gene expression data is currently studied from two aspects. First, feature selection is crucial, because gene expression data have very high dimensionality and only a small number of samples. Second, classification performance must be improved before the methods can be applied to tumor diagnosis. The achievements of this dissertation are as follows:
(1) Feature selection: three general feature selection methods are studied, and the results show that SVM-RFE obtains the best feature subset on four data sets. A new method based on SVM-RFE is therefore developed; combined with a re-sampling technique, it addresses the problem of imbalanced class distribution. Compared with another method, FAST, the new method achieves better results.
(2) Classification: five classification methods are applied to the selected gene expression data, and SVM proves the most suitable for the four gene expression data sets.
(3) Ensemble learning: SVM is combined with Bagging and Boosting, but the results are not as good as expected. An improved method that perturbs the classifier parameters to increase the diversity among member classifiers is therefore studied, and it achieves the expected results.
(4) Cost-sensitive learning: because of the characteristics of tumor classification, two cost-sensitive learning methods, MetaCost and AdaCost, are introduced. Both use C4.5 as the member classifier without changing the internals of C4.5, and AdaCost performs better than MetaCost on the four data sets. Given the good performance of SVM, SVM is then proposed as the member classifier for MetaCost and AdaCost, and this is shown to be effective.
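The abstract does not give implementation details for the cost-sensitive step. As an illustration only, the following is a minimal sketch of the relabeling rule that MetaCost is generally described as using: each training sample is assigned the class with the lowest expected misclassification cost, given estimated class probabilities. The function names, the cost matrix, and the probability values are hypothetical and are not taken from the thesis; in MetaCost the probabilities would come from an ensemble of base classifiers, which is omitted here.

```python
def min_expected_cost_label(probs, cost):
    """Return the class index with the lowest expected misclassification cost.

    probs[j]   -- estimated probability that the sample's true class is j
    cost[i][j] -- cost of predicting class i when the true class is j
    """
    expected = [
        sum(cost[i][j] * probs[j] for j in range(len(probs)))
        for i in range(len(cost))
    ]
    # On ties, the lowest class index wins.
    return min(range(len(expected)), key=expected.__getitem__)


def relabel(prob_rows, cost):
    """Relabel every sample with its minimum-expected-cost class."""
    return [min_expected_cost_label(p, cost) for p in prob_rows]


# Hypothetical costs (assumption): class 0 = normal, class 1 = tumor.
# Missing a tumor (predict 0, true 1) is assumed to be five times as
# costly as a false alarm (predict 1, true 0).
COST = [
    [0.0, 5.0],
    [1.0, 0.0],
]

# Hypothetical probability estimates [P(normal), P(tumor)] for three samples.
PROBS = [
    [0.9, 0.1],
    [0.7, 0.3],
    [0.4, 0.6],
]

NEW_LABELS = relabel(PROBS, COST)
print(NEW_LABELS)
```

With these assumed costs, the second sample is relabeled as tumor even though its tumor probability is only 0.3, because the expected cost of predicting normal (1.5) exceeds that of predicting tumor (0.7). A base classifier such as C4.5 or SVM retrained on the relabeled data then becomes cost-sensitive without any change to its internals.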
Keywords/Search Tags: gene expression data, feature selection, ensemble learning, cost sensitive, support vector machine