The Research On Gene Classification Algorithms Based On MicroArray

Posted on: 2008-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: H J Wang
GTID: 2178360215479826
Subject: Computer software and theory
Abstract/Summary:
The rapid growth of biological data has raised the question of how to process it at scale, and advances in computing have made such processing feasible. Bioinformatics, which builds on computational biology and the computer processing of biological information, has therefore developed rapidly. Progress in computers and networks has been accompanied by fast growth in biological databases; bioinformatics aims to organize these data and to extract new biological knowledge from them.

MicroArray technology provides a powerful tool for bioinformatics. By analyzing MicroArray data with Emerging Patterns (EPs), a gene classification algorithm can identify the cancer type of a sample and discover hidden gene patterns related to that cancer, which helps reveal the underlying pathology. This thesis studies gene classification algorithms; its contributions are summarized as follows:

(1) The development of gene classification and the related MicroArray technology is reviewed. Experiments and analyses compare the performance of several typical classification algorithms, laying the theoretical and experimental groundwork for the following chapters.

(2) When EPs are discovered from small samples, estimating probabilities by frequency is unreliable. Bayes estimation is introduced into EP discovery to address this; by adding virtual samples, it relieves the instability of Entropy.

(3) To improve the PCL (Prediction by Collective Likelihood) classifier, a gene classification algorithm based on EPs is proposed. In addition to introducing Bayes estimation, the frequency of EPs in the training dataset is incorporated into the PCL likelihood calculation. Experiments on the human acute leukemia dataset show that the algorithm improves performance.

(4) RCP (Random Cut Point) is proposed to strengthen generalization to unknown test samples. An analysis of the criterion for selecting feature genes and their cut points, based on Entropy and the minimum description length principle (MDLP), shows that boundary points can be used to narrow the set of candidate cut points, which significantly reduces computation. However, the cut points in EPs are still the midpoints between successive pairs of samples in the sorted sequence, and these are not necessarily the best cut points for unknown test samples. RCP instead treats the cut point as a random variable following a uniform distribution. Combining RCP with Bayes estimation yields EPAs (Emerging Patterns Advanced), and an EPA-based classifier inspired by KNN is constructed. Experimental results show that RCP and the EPA-KNN gene classifier are feasible and effective.
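
The cut-point ideas in items (2) and (4) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and it rests on several assumptions: the Bayes estimate is taken to be add-alpha (Laplace-style) smoothing with `alpha` virtual samples per class, the random cut point is taken to be uniform on the interval between two adjacent expression values, and all function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def smoothed_entropy(labels, classes, alpha=1.0):
    """Class entropy with `alpha` virtual samples added per class.

    Assumption: add-alpha smoothing stands in for the Bayes estimate.
    """
    counts = Counter(labels)
    total = len(labels) + alpha * len(classes)
    entropy = 0.0
    for c in classes:
        p = (counts[c] + alpha) / total
        entropy -= p * math.log2(p)
    return entropy


def boundary_intervals(values, labels):
    """Return (lo, hi) pairs of adjacent distinct values whose samples
    do not all share one class; only these can hold a useful cut point."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(v, set()).add(label)
    distinct = sorted(groups)
    return [(lo, hi) for lo, hi in zip(distinct, distinct[1:])
            if len(groups[lo] | groups[hi]) > 1]


def split_entropy(values, labels, cut, classes, alpha=1.0):
    """Weighted smoothed entropy of the two partitions induced by `cut`."""
    left = [l for v, l in zip(values, labels) if v <= cut]
    right = [l for v, l in zip(values, labels) if v > cut]
    n = len(labels)
    return (len(left) / n) * smoothed_entropy(left, classes, alpha) \
        + (len(right) / n) * smoothed_entropy(right, classes, alpha)


def midpoint_cut(lo, hi):
    """Conventional cut point: midpoint of two adjacent sample values."""
    return (lo + hi) / 2.0


def random_cut(lo, hi, rng):
    """Random cut point drawn uniformly between two adjacent values
    (assumed reading of the uniform distribution mentioned above)."""
    return rng.uniform(lo, hi)


if __name__ == "__main__":
    # Toy expression values for one gene; not real leukemia data.
    values = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5]
    labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
    classes = ["ALL", "AML"]
    rng = random.Random(0)
    for lo, hi in boundary_intervals(values, labels):
        mid = midpoint_cut(lo, hi)
        rcp = random_cut(lo, hi, rng)
        print(lo, hi, mid, rcp, split_entropy(values, labels, mid, classes))
```

With smoothing, a pure partition no longer has entropy exactly zero, which damps the swings that raw frequencies produce on very small samples. Restricting candidates to boundary intervals means only one interval is evaluated in the toy data instead of five.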
Keywords/Search Tags: MicroArray, gene expression profiles, gene classification, emerging patterns, Bayes estimation, random cut point