
Sample Class Discovery And Sample Class Prediction Based On Gene Expression Profile

Posted on: 2004-12-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W J Li
GTID: 1100360092996782
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Based on gene chip technology, the expression levels of tens of thousands of genes can be observed simultaneously during a particular life process. This allows molecular biologists to investigate life phenomena and their nature at the genomic level in a systematic and global way. However, what a gene chip experiment yields directly is only a gene expression matrix; every application of such experiments is realized through bioinformatics mining of that matrix. The core methods in this mining are sample class discovery and sample class prediction. After studying the related algorithms in detail, we developed two systems, Samcluster and Tclass, for these two tasks respectively.

In the Samcluster system, three clustering methods (hierarchical cluster analysis, K-means, and self-organizing map (SOM)) were integrated with feature selection methods based on the coefficient of variation (CV) and a simple t-test. To evaluate its performance, Samcluster was applied to four expression datasets: COLON, LEUKEMIA72, LEUKEMIA38, and OVARIAN. The results show that only 5, 1, 0, and 0 samples were misclassified, respectively. We conclude that the proposed scheme, Samcluster, is an efficient method for automatic discovery of sample classes from gene expression profiles.

In the Tclass system, we combine Fisher linear discriminant analysis with several feature selection methods: all possible combinations of features, stepwise optimization, and Monte Carlo simulation. The concept of stability analysis was also introduced in this system. To evaluate its performance, Tclass was applied to the COLON dataset. The results demonstrate that a subset of only 3 to 10 genes can achieve high classification accuracy.

Furthermore, the LEUKEMIA16 dataset was analyzed using both the Samcluster and Tclass systems. The results indicate that the two types of samples were classified perfectly. Our work can therefore play an important role in applications such as prediction of gene function, tumor subtyping based on gene expression profiles, identification of new sample subtypes, and identification of drug targets.
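As a rough illustration of the building blocks named in the abstract (CV-based and t-test-based feature selection, and Fisher linear discriminant analysis), the following is a minimal NumPy sketch. It is not the Samcluster or Tclass implementation, which the abstract does not describe in detail. The function names, the synthetic genes-by-samples matrix, the Welch form of the t statistic, the small ridge term, and the choice of five selected genes are all assumptions made for illustration.

```python
import numpy as np

def coefficient_of_variation(expr):
    """Per-gene CV (std / |mean|) for a genes x samples matrix."""
    mean = expr.mean(axis=1)
    std = expr.std(axis=1, ddof=1)
    return std / (np.abs(mean) + 1e-12)

def t_statistic(expr, labels):
    """Per-gene two-sample t statistic (Welch form, an assumption)."""
    a, b = expr[:, labels == 0], expr[:, labels == 1]
    var_a = a.var(axis=1, ddof=1) / a.shape[1]
    var_b = b.var(axis=1, ddof=1) / b.shape[1]
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(var_a + var_b + 1e-12)

def fisher_lda(expr, labels):
    """Fisher discriminant direction w and a midpoint threshold (two classes)."""
    a, b = expr[:, labels == 0].T, expr[:, labels == 1].T  # samples x genes
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    # Within-class scatter matrix; the tiny ridge term is an assumption
    # added only to keep the linear solve numerically stable.
    sw = (a - mu_a).T @ (a - mu_a) + (b - mu_b).T @ (b - mu_b)
    sw += 1e-6 * np.eye(sw.shape[0])
    w = np.linalg.solve(sw, mu_b - mu_a)
    threshold = w @ (mu_a + mu_b) / 2.0
    return w, threshold

def predict(expr, w, threshold):
    """Assign class 1 to samples whose projection exceeds the threshold."""
    return (expr.T @ w > threshold).astype(int)

# Synthetic data (hypothetical): 200 genes x 40 samples, two classes,
# with genes 0..4 shifted upward in class 1.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
expr = rng.normal(8.0, 1.0, size=(200, 40))
expr[:5, labels == 1] += 3.0

cv = coefficient_of_variation(expr)       # unsupervised ranking criterion
t = t_statistic(expr, labels)             # supervised ranking criterion
selected = np.argsort(-np.abs(t))[:5]     # keep the 5 genes with largest |t|

w, threshold = fisher_lda(expr[selected], labels)
accuracy = (predict(expr[selected], w, threshold) == labels).mean()
print(sorted(selected.tolist()), accuracy)
```

Note that the t statistic here uses known class labels, which fits the class prediction setting; how the t-test is applied within the class discovery workflow is not stated in the abstract. The accuracy printed is measured on the training samples and so is optimistic; a held-out or cross-validated estimate would be needed for a fair comparison.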
Keywords/Search Tags: Gene chip, Gene Expression Profile, Bioinformatics, Sample Subtype, Sample Classification