
The Study Of Tumor Classification Methods Based On Microarray Data Analysis

Posted on: 2010-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: C G Xu
Full Text: PDF
GTID: 2144360302959640
Subject: Bioinformatics
Abstract/Summary:
DNA microarray technology is a relatively new technology that arose at the intersection of physics, electronics, and molecular biology, and it has been widely applied in biological and medical research. Among its applications, microarray-based cancer diagnosis makes it possible to study the pathological mechanisms of cancer in depth, including its occurrence and spread. To achieve reliable diagnosis and prediction of cancer types, much research focuses on identifying the key genes for different cancers and on classifying cancers. However, because microarray data combine small sample sizes with high dimensionality, traditional methods cannot achieve good performance.

This thesis first reviews some classic methods for tumor classification based on microarray data. It then describes the study of multiple classifier systems, which was my main research work during my master's studies. The last part covers the application of a Genetic Programming (GP) algorithm and multiple classifier systems to tumor microarray data. The main work of this thesis can be summarized as follows:

(1) A GP is proposed based on the idea of splitting a multiclass problem into multiple two-class problems. The characteristic of this GP is that each individual consists of a set of small-scale ensemble systems (called sub-ensembles here), each of which tackles its respective two-class problem. In this way, each individual can solve a multiclass problem directly, and the GP can perform feature selection and classification at the same time. A diversity measure is proposed based on the differences among the features used in each tree, and a greedy local improvement algorithm is used to maintain diversity among the sub-ensembles. These measures ensure the high efficiency of the GP.

(2) A microarray dataset produced by a single lab generally contains noise or bias, which affects the classification and generalization abilities of classifiers trained on it. However, if multiple datasets from different labs are collected and used to produce classifiers, the classifiers that fit these datasets well can be singled out, and such classifiers may reflect the essence of tumors more accurately. Here, the sub-ensemble-based GP algorithm is applied to multiple datasets from different labs in order to check the effect of the algorithm and to produce classifiers with higher generalization ability.
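
The decomposition and the diversity idea in item (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not give the exact diversity formula or the rule for combining the two-class decisions, so the mean pairwise Jaccard distance between feature sets and the majority vote used here are assumptions, and the function names `pairwise_problems`, `feature_diversity`, and `vote` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_problems(labels):
    """Split a multiclass problem into all two-class (one-vs-one) problems."""
    return list(combinations(sorted(set(labels)), 2))


def feature_diversity(feature_sets):
    """Mean pairwise Jaccard distance between the feature sets of the trees.

    Assumption: the abstract only says diversity is based on the difference
    among the features in each tree; Jaccard distance is one possible choice.
    """
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        if union:
            total += 1.0 - len(a & b) / len(union)
    return total / len(pairs)


def vote(pair_predictions):
    """Combine two-class decisions by majority vote (assumed combination rule).

    pair_predictions maps each class pair to the class its sub-ensemble chose.
    """
    counts = Counter(pair_predictions.values())
    return counts.most_common(1)[0][0]


# Hypothetical usage: three tumor classes, gene indices as features.
problems = pairwise_problems(["A", "B", "C", "A"])
diversity = feature_diversity([{1, 2}, {2, 3}, {7, 9}])
winner = vote({("A", "B"): "A", ("A", "C"): "A", ("B", "C"): "C"})
```

With k classes this decomposition yields k(k-1)/2 two-class problems, so an individual in such a scheme would carry one sub-ensemble per class pair.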
Keywords/Search Tags: Tumor Classification Problem, Multiple Classifier System, Microarray Datasets, Diversity, Genetic Programming, Base Classifier, Cross Datasets