Mining Method Based On Gene Expression Profiling Data

Posted on: 2008-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: L J Yi
Full Text: PDF
GTID: 2208360215486640
Subject: Computer application technology
Abstract/Summary:
Cancer research based on gene expression profiles is an important area of bioinformatics. This thesis applies data mining techniques to that area, concentrating on feature set selection and cancer subtype classification, and proposes several novel approaches.

First, a neural network method with regression is proposed for cancer subtype classification and is used to categorize acute leukemia data. The original data are processed with an updated signal-to-noise ratio (SNR) index presented in the thesis, and several candidate feature gene sets are selected using a regression method with wavelets. A neural network classifier is then built, the optimal feature set of five genes is determined, and classification reaches an accuracy of 91%. The proposed method and the feature set are verified with a decision tree method, which gives an empirical result of 86%.

Second, cancer subtype classification and feature set selection based on a GSNR index are proposed. Irrelevant genes are first eliminated by combining a data mining method with the SNR index. A classifier is then built using a neural network, and the feature genes are selected with search approaches and independent tests. Applied to the subtype classification of acute leukemia, the method yields a feature set of 8 genes with a classification accuracy of 97%. The empirical results indicate that the GSNR index is robust and extensible.

Finally, a combined index called the GB index, which joins the Gini index and the Bhattacharyya distance, is proposed to eliminate irrelevant genes. A classifier is constructed based on a support vector machine (SVM), and the optimal feature subset is selected from the feature genes with the Backward Selection Search Method algorithm and independent tests. Applied to the subtype classification of small round blue cell tumors (SRBCT), the method yields a more compact set of 7 feature genes, and the SVM classifier reaches a classification accuracy of 100%. The feature genes were also tested with other classifiers such as ANN and CBA, with good experimental results. Compared with typical approaches, this subset uses fewer feature genes and provides a valuable reference for the diagnosis and treatment of SRBCT.
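The abstract does not give the formula for the updated SNR index or for the GSNR index. As a rough illustration of the gene-filtering step only, the sketch below ranks genes by the classic two-class signal-to-noise ratio, (mean of class 0 - mean of class 1) / (std of class 0 + std of class 1). This is an assumed stand-in, not the thesis's index; the function names `snr_scores` and `top_genes` and the toy data are hypothetical.

```python
from statistics import mean, stdev

def snr_scores(expression, labels):
    """Classic two-class SNR per gene (assumed stand-in, not the thesis's index).

    expression: list of samples, each a list of gene expression values.
    labels: 0 or 1 for each sample; each class needs at least two samples.
    """
    class0 = [row for row, y in zip(expression, labels) if y == 0]
    class1 = [row for row, y in zip(expression, labels) if y == 1]
    scores = []
    for g in range(len(expression[0])):
        a = [row[g] for row in class0]
        b = [row[g] for row in class1]
        denom = stdev(a) + stdev(b)
        # A gene that is constant in both classes carries no usable signal here.
        scores.append((mean(a) - mean(b)) / denom if denom > 0 else 0.0)
    return scores

def top_genes(expression, labels, k):
    """Indices of the k genes with the largest absolute SNR."""
    scores = snr_scores(expression, labels)
    ranked = sorted(range(len(scores)), key=lambda g: abs(scores[g]), reverse=True)
    return ranked[:k]

# Toy data (hypothetical): 4 samples x 3 genes, two samples per class.
expression = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 9.0],
    [5.0, 5.0, 2.1],
    [5.2, 4.9, 9.1],
]
labels = [0, 0, 1, 1]

selected = top_genes(expression, labels, k=2)
```

In this toy data, gene 0 separates the two classes cleanly while gene 2 varies mostly within each class, so gene 0 ranks first and gene 2 last. The retained genes would then be passed to a classifier and a search procedure such as backward selection, as the abstract describes.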
Keywords/Search Tags: gene expression profile, feature selection, neural networks, support vector machine, Gini index