Research On Classification Of Gene Expression Data Based On Laplace Spectra Theory

Posted on: 2011-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Zhuang
GTID: 2120360305473157
Subject: Signal and Information Processing
Abstract/Summary:
Classification of gene expression data is an important way to find the relationships between different genes. Although pattern recognition algorithms have developed significantly in recent years, many problems remain to be solved in the clustering of gene expression data. Because gene expression data have two characteristics, high dimensionality and small sample size, traditional machine learning methods cannot achieve the desired results, and their high computational complexity greatly reduces the efficiency of data analysis.

This dissertation introduces the theory of graph spectra into the classification of gene expression data. We use this theory to extract features from gene expression data and propose several classification algorithms. The main research contents and achievements are as follows:

1. DNA microarray technology has had a far-reaching impact on the biomedical field, and analyzing tumor gene expression data with classification methods is of great significance. This dissertation proposes an algorithm that uses entropy as an indicator to select informative genes from tumor gene expression data. The tumor gene expression data are first stratified, and the entropy of each individual gene is calculated. Then, several genes with the highest entropy are selected and classified using a support vector machine (SVM). The effectiveness of this algorithm has been verified by the leave-one-out method and the group method.

2. We introduce a novel classification algorithm for gene expression data based on the Laplacian spectra of graphs. First, the class center is obtained by computing the average of each class in the training set, and the Laplacian matrices of complete graphs, called normal graphs, are constructed on the samples with the minimum Euclidean distance to the class center. Then, the sum of matched points is calculated by replacing points of the normal graph with test samples. Finally, the test sample is assigned to the class with the largest total number of matched points.

3. This dissertation proposes an algorithm for classification of gene expression data based on the Fiedler vector. First, the Laplacian matrix of a complete graph is constructed on the gene expression data of all the different types. Then, the Fiedler vector is obtained by singular value decomposition of this Laplacian matrix. Finally, the samples are divided into two classes according to the signs of the Fiedler vector components.
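The Fiedler-vector step in item 3 can be illustrated with a minimal sketch. Several details here are assumptions and are not taken from the thesis: the complete graph uses Gaussian similarity of Euclidean distance as edge weights, the function names (`gaussian_weights`, `laplacian`, `fiedler_vector`, `fiedler_split`) and the `sigma` parameter are hypothetical, and a shifted power iteration is swapped in for the singular value decomposition so that the example needs only the Python standard library. In practice a routine such as `numpy.linalg.eigh` or `numpy.linalg.svd` would likely be used instead.

```python
import math


def gaussian_weights(samples, sigma=1.0):
    """Edge weights of a complete graph (assumed: Gaussian similarity)."""
    n = len(samples)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def laplacian(W):
    """Graph Laplacian L = D - W, where D holds the weighted degrees."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def fiedler_vector(L, iters=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Power iteration on (shift*I - L), restricted to vectors orthogonal to
    the constant vector (the eigenvector of eigenvalue 0).
    """
    n = len(L)
    # Gershgorin bound: every eigenvalue of L is at most 2 * max degree.
    shift = 2.0 * max(L[i][i] for i in range(n))
    # Deterministic, non-constant starting vector.
    v = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        w = [shift * v[i] - sum(L[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def fiedler_split(samples, sigma=1.0):
    """Split samples into two groups by the sign of each Fiedler component."""
    L = laplacian(gaussian_weights(samples, sigma))
    return [x >= 0.0 for x in fiedler_vector(L)]


# Toy data: two well-separated groups of three "expression profiles" each.
toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
       [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
print(fiedler_split(toy))
```

On the toy data the first three samples should receive one label and the last three the other. Which group is labeled `True` is arbitrary, because an eigenvector is only defined up to sign. The power iteration converges slowly when the second and third eigenvalues are close, so this sketch is suitable only for small illustrative inputs.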
Keywords/Search Tags:Classification, Gene Expression Data, Entropy, Laplace Spectrum, Fiedler Vector