Classification Based On Wavelet Transform And CART Algorithm For Microarray Data

Posted on: 2012-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
GTID: 2178330335478376
Subject: Computer application technology
Abstract/Summary:
In this thesis, we study the classification of microarray data based on CART (Classification and Regression Tree). Gene chip technology is a revolutionary tool for the early diagnosis of cancer: it diagnoses cancer from the differences in gene expression between cancer samples and non-cancer samples. Microarray data are characterized by high dimensionality and small sample sizes, which pose a major challenge for classification and recognition methods. In pattern recognition problems, high-dimensional data requires feature selection or feature extraction to reduce the dimension and thereby improve the efficiency and accuracy of classification.

There are many approaches to reducing the dimension of microarray data. In this thesis, a wavelet-based feature extraction method is applied to the microarray data. To find the key gene information, we reconstruct the wavelet details in the original microarray data space from the kth-level detail coefficients. The Wilcoxon rank sum test is then used to select the best features, and good performance is achieved. CART is chosen as the classifier for the selected features, with Gini's diversity index as the error function. Under 10-fold cross-validation, the samples, represented by the selected features, are divided into training samples and test samples; the training samples are used to fit an extended classification tree, and the test samples are used to test it. The optimal tree size is found by pruning, so that the tree generalizes well to new samples.

Experiments are carried out on three datasets. The results show that the best accuracy of our method reaches 99.45%, 92.65%, and 98.61% on the lung cancer, prostate cancer, and leukemia data, respectively, and that the results are stable across ten runs of 10-fold cross-validation. In addition, the discovered rules are easy to understand, and the key gene information is significant for classification.
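
Two of the computational steps named in the abstract, reconstructing the kth-level wavelet detail in the original data space and scoring a set of samples with Gini's diversity index, can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not name the wavelet family, so the Haar wavelet is assumed here for simplicity, and the function names `reconstruct_detail` and `gini_index` are hypothetical rather than taken from the thesis.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One Haar analysis step: return (approximation, detail) coefficients."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """One Haar synthesis step: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def reconstruct_detail(x, k):
    """Reconstruct the level-k detail of x in the original data space.

    Assumption: Haar wavelet; len(x) must be divisible by 2**k.
    """
    if k < 1 or len(x) == 0 or len(x) % (2 ** k) != 0:
        raise ValueError("len(x) must be a positive multiple of 2**k, k >= 1")
    approx = [float(v) for v in x]
    detail = []
    for _ in range(k):
        approx, detail = haar_step(approx)
    # Keep only the level-k detail: zero the level-k approximation.
    signal = haar_inverse_step([0.0] * len(detail), detail)
    # Climb back to the original length with zero detail at finer levels.
    for _ in range(k - 1):
        signal = haar_inverse_step(signal, [0.0] * len(signal))
    return signal


def gini_index(labels):
    """Gini's diversity index 1 - sum(p_c ** 2) over the class proportions."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    profile = [4.0, 2.0, 6.0, 8.0]  # toy expression profile, not real data
    print(reconstruct_detail(profile, 1))
    print(reconstruct_detail(profile, 2))
    print(gini_index(["cancer", "cancer", "normal", "normal"]))
```

For the toy profile `[4, 2, 6, 8]`, the level-1 detail is `[1, -1, -1, 1]` and the level-2 detail is `[-2, -2, 2, 2]`, up to floating-point rounding. Each reconstructed detail has the same length as the input, which is what allows a per-gene test such as the Wilcoxon rank sum test to be applied afterwards in the original gene space. A perfectly mixed two-class node has a Gini index of 0.5 and a pure node has 0.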
Keywords/Search Tags: microarray data, wavelet analysis, Wilcoxon rank sum test, CART algorithm, significant gene information