Research On The Sample Classification Based On Gene Expression Data

Posted on: 2009-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: L B Jiang
Full Text: PDF
GTID: 2178360242490846
Subject: Computer application technology
Abstract/Summary:
Gene chip technology can measure the expression of thousands of genes simultaneously, and it has generated a large quantity of available data. Applying machine learning to analyse these massive, complex data sets is an important current research area. Within it, sample classification based on gene expression data plays a very important role; it generally has two pivotal steps: gene selection and classifier design. Based on a study of the sample classification process, this thesis improves several methods in light of the particular characteristics of gene expression data and the shortcomings of existing methods.
The main characteristic of a gene expression data matrix is that a small number of samples (generally no more than 100) correspond to a very large number of attributes (several thousand, up to ten thousand, genes), which poses a huge challenge for research. Gene selection is therefore a necessary step before sample classification: it discards genes that are irrelevant to the classification, which reduces redundancy and computational complexity and improves classification accuracy. The thesis first filters the genes using a correlation coefficient criterion, which reduces both the redundancy and the search space of the subsequent optimization algorithm. It then applies an ant colony optimization strategy to the filtered result to select a gene subset, taking the clustering effect as the optimization objective function, which preserves classification accuracy while greatly reducing the computational complexity of the method. Experimental results indicate that the proposed gene selection method can select a subset of relevant genes in a very short time.
Sample classification based on gene expression data is a classification problem in data mining, and its most important performance measure is accuracy.
This thesis uses probability and statistics theory to design a classification system founded on attribute recognition theory. To overcome the lower accuracy of a single classifier, it combines this classification system with the traditional KNN classifier to form a new sample classification method. Experimental results on cancer data demonstrate that the new method classifies well and has low time complexity.
Keywords/Search Tags: sample classification, gene selection, ant colony optimization algorithm, attribute recognition theory, accuracy
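The two steps named in the abstract, gene selection followed by classification, can be illustrated with a minimal Python sketch. It is not the thesis's method: it covers only a correlation-coefficient filter and a plain KNN vote, and it leaves out the ant colony optimization stage and the attribute-recognition classifier, whose details the abstract does not give. The function names (`pearson`, `select_genes`, `knn_predict`, `make_toy_data`), the use of Pearson correlation against 0/1 class labels, and the synthetic data are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_genes(samples, labels, k):
    """Filter step: keep the k genes most correlated (in absolute value) with the labels."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        column = [row[g] for row in samples]
        scores.append((abs(pearson(column, labels)), g))
    scores.sort(reverse=True)
    return sorted(g for _, g in scores[:k])


def knn_predict(train, train_labels, query, genes, k=3):
    """Majority vote of the k nearest training samples, measured on the selected genes only."""
    def dist(a, b):
        return math.sqrt(sum((a[g] - b[g]) ** 2 for g in genes))

    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def make_toy_data(n_samples=20, n_genes=200, informative=(0, 1, 2), seed=0):
    """Synthetic stand-in for expression data: few samples, many genes, few informative ones."""
    rng = random.Random(seed)
    samples, labels = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 6.0 * label  # shift the informative genes for class 1
        samples.append(row)
        labels.append(label)
    return samples, labels
```

Calling `make_toy_data()` and then `select_genes(X, y, k=3)` reduces the 200 synthetic genes to three before `knn_predict` measures any distances. This mirrors the setting the abstract describes, where far fewer samples than genes make a filtering step worthwhile ahead of classification.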