Tumor Gene Chip Data Clustering Analysis Algorithm

Posted on: 2009-06-28
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Kong
Full Text: PDF
GTID: 2208360242999450
Subject: Control theory and control engineering
Abstract/Summary:
The exponential growth of accumulated biological data has drawn many scientists to bioinformatics, which has become a focus of worldwide attention. Tumor diagnosis based on gene expression profiles is expected to develop into a fast and effective clinical method in the near future. Although DNA microarray experiments provide a huge amount of gene expression data, only a few of the genes in an expression profile are related to tumors. Moreover, extracting features or selecting tumor-related informative genes from gene expression profiles is challenging because the data are high-dimensional, the sample sets are small, and the profiles contain considerable noise and redundancy. The molecular diagnosis of tumors has therefore been investigated broadly and in depth, and a large number of papers on this problem have been published.

However, accurately classifying tumors by selecting tumor-related genes from thousands of genes is a difficult task because of the large number of redundant genes, and it is usually impossible to search such a large gene space exhaustively for an informative gene subset. Choosing an appropriate clustering algorithm and classifier is therefore very important.

In this thesis, we propose a tumor informative gene selection method, introduce the techniques and methods used in the tumor clustering and classification process model, and describe the key procedures of that model. We then compare the clustering and classification accuracy of the proposed method with the results of other methods.

The main work of this thesis is as follows. First, we performed tumor clustering on the selected informative genes.
We employed independent component analysis (ICA) to select a subset of genes, and then applied the unsupervised method non-negative matrix factorization (NMF) and its extensions, sparse NMF (SNMF) and NMF with sparseness constraint (NMFSC), to cluster tumors on the subset of genes selected by ICA. We applied the proposed method to three DNA microarray data sets and showed that it is efficient and feasible. Second, based on the eigengenes extracted through ICA, the most discriminant eigengenes are selected using the sequential floating forward selection (SFFS) technique, and a support vector machine (SVM) is then used to classify the modeled data. Experimental results on three DNA microarray data sets show that this method is efficient.

Finally, the work in this thesis is briefly summarized and reviewed, and directions for further research are discussed.
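
To make the NMF clustering step more concrete, here is a minimal sketch, not the thesis's implementation. It assumes a non-negative genes-by-samples matrix, factorizes it with the standard multiplicative updates for the Frobenius objective, and assigns each sample to the factor with the largest coefficient. The ICA gene selection step and the sparse variants (SNMF, NMFSC) are omitted. The function names (`nmf`, `cluster_samples`, `make_toy_data`) and the synthetic two-group data are hypothetical and used only for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factorize non-negative V (genes x samples) as W @ H.

    Uses the standard multiplicative updates for the squared
    Frobenius error. Assumption: plain NMF, no sparseness terms.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Update coefficients, then basis; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_samples(V, rank, **kwargs):
    """Assign each sample (column of V) to its dominant NMF factor."""
    W, H = nmf(V, rank, **kwargs)
    # Fold the norm of each basis column into H so that the
    # coefficients of different factors are comparable.
    scale = np.linalg.norm(W, axis=0)
    H_scaled = H * scale[:, None]
    return H_scaled.argmax(axis=0)


def make_toy_data(seed=0):
    """Synthetic stand-in for an expression matrix (hypothetical data).

    40 genes x 12 samples: the first 20 genes are elevated in
    samples 0-5, the last 20 genes in samples 6-11.
    """
    rng = np.random.default_rng(seed)
    V = rng.uniform(0.0, 0.1, size=(40, 12))
    V[:20, :6] += 1.0
    V[20:, 6:] += 1.0
    return V


if __name__ == "__main__":
    labels = cluster_samples(make_toy_data(), rank=2)
    print(labels)  # one cluster label per sample
```

In a pipeline like the one described above, `V` would be restricted to the genes selected by ICA before factorization. The cluster labels are arbitrary indices, so only the grouping of samples is meaningful, not which group is called 0 or 1.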
Keywords: Gene selection, Clustering, Tumor classification, Independent component analysis, Non-negative matrix factorization, Support vector machine