Semi-lazy Learning With Applications To Bioinformatics

Posted on: 2013-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
GTID: 2248330395975582
Subject: Software engineering
Abstract:
With the rapid development of computer science in recent years, computer technology is increasingly applied to biological information: storing, retrieving, networking, processing, analyzing, navigating, and visualizing it. As a result, a new interdisciplinary field, bioinformatics, has come into being. With the Human Genome Project completed and the post-genome era under way, huge amounts of biological data await analysis. How to manage these information resources and how to mine useful knowledge from them are both an enormous challenge and a difficult open problem.

DNA microarray technology enables us to monitor the expression levels of thousands of genes simultaneously, which greatly increases the challenge of comprehending and interpreting the resulting mass of data. Applying machine learning methods to complex bioinformatics data is therefore an important research area. For gene expression microarray data, one important application is cancer classification, which is the focus of this thesis. Gene expression data collected for cancer classification share a characteristic property: the number of genes usually far exceeds the number of samples. It is therefore very important to pick out the meaningful genes that are helpful for cancer classification.

In this thesis, the Partial Least Squares (PLS) method is first used to reduce the high-dimensional predictor space to a lower-dimensional space. The CRN method is then used to classify the data sets. CRN (Classifying Categorical Data by Rule-based Neighbors) is a non-metric, parameter-free classifier that can be regarded as a hybrid of rule induction and instance-based learning. The proposed approach is tested on two benchmark cancer gene expression datasets, the leukemia and colon datasets. The experimental results show that the classification accuracy of the proposed method is competitive with that of other existing methods.
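The two-stage idea described in the abstract (reduce the gene space with PLS, then classify in the reduced space) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the thesis: the PLS variant is a plain single-response PLS1 with NIPALS-style deflation, the classifier is an ordinary 1-nearest-neighbor rule used only as a stand-in (it is not CRN, whose details the abstract does not give), and the tiny expression matrix is invented toy data, not the leukemia or colon dataset. All function names are hypothetical.

```python
import math


def center(rows, means=None):
    """Subtract column means (computed from `rows` unless supplied)."""
    if means is None:
        means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows], means


def fit_pls1(X, y, n_components):
    """Fit PLS1 on centered X and centered y; return (weight, loading) pairs."""
    X = [row[:] for row in X]  # work on a copy so the caller's data is untouched
    n_features = len(X[0])
    components = []
    for _ in range(n_components):
        # Weight vector: direction in gene space with maximal covariance with y.
        w = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(n_features)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        # Scores of the training samples on this component.
        t = [sum(a * b for a, b in zip(row, w)) for row in X]
        tt = sum(v * v for v in t)
        # Loadings, then deflation: remove what this component explains.
        p = [sum(row[j] * ti for row, ti in zip(X, t)) / tt for j in range(n_features)]
        X = [[row[j] - ti * p[j] for j in range(n_features)] for row, ti in zip(X, t)]
        components.append((w, p))
    return components


def project(x, components):
    """Map one centered sample to its PLS scores, deflating as in training."""
    x = list(x)
    scores = []
    for w, p in components:
        t = sum(a * b for a, b in zip(x, w))
        scores.append(t)
        x = [a - t * b for a, b in zip(x, p)]
    return scores


def predict_1nn(train_scores, train_labels, query):
    """Stand-in classifier (NOT CRN): label of the nearest training sample."""
    dists = [sum((a - b) ** 2 for a, b in zip(s, query)) for s in train_scores]
    return train_labels[dists.index(min(dists))]


def classify(train_X, train_y, test_X, n_components=2):
    """Reduce with PLS1, then classify test samples in the reduced space."""
    Xc, x_means = center(train_X)
    y_mean = sum(train_y) / len(train_y)
    yc = [v - y_mean for v in train_y]
    comps = fit_pls1(Xc, yc, n_components)
    train_scores = [project(row, comps) for row in Xc]
    test_c, _ = center(test_X, x_means)  # reuse the training means
    return [predict_1nn(train_scores, train_y, project(row, comps)) for row in test_c]


# Invented toy data: 6 samples x 5 "genes", class labels coded 0/1.
toy_X = [
    [5.0, 1.0, 0.2, 3.1, 0.9],
    [4.8, 1.2, 0.1, 2.9, 1.1],
    [5.2, 0.9, 0.3, 3.0, 1.0],
    [1.0, 1.1, 0.2, 3.0, 4.9],
    [1.2, 1.0, 0.1, 3.1, 5.1],
    [0.9, 0.8, 0.3, 2.9, 5.0],
]
toy_y = [0, 0, 0, 1, 1, 1]
toy_test = [[5.1, 1.0, 0.2, 3.0, 1.0], [1.1, 1.0, 0.2, 3.0, 5.0]]

print(classify(toy_X, toy_y, toy_test))
```

In this sketch the class label guides the reduction: each PLS weight vector points along the gene-space direction that covaries most with the label, so a handful of components can stand in for thousands of genes when samples are scarce. Any classifier could be applied to the resulting scores; a rule-based one such as CRN would first need them discretized into categorical values, a step not shown here.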
Keywords: DNA microarray, gene expression, cancer classification