
Research On Classification Of Gene Expression Data Based On Adjacency Matrix Decomposition

Posted on: 2012-04-27  Degree: Master  Type: Thesis
Country: China  Candidate: L Chen  Full Text: PDF
GTID: 2218330338970469  Subject: Signal and Information Processing
Abstract/Summary:
In the 21st century, with science and technology highly developed and living standards greatly improved, more and more people are eager to understand their own origins and to explore the secrets of human life. With the continuous development of modern biotechnology, bioinformatics has made breakthroughs in recent years: gene chip technology is maturing, and gene expression data are becoming easier to acquire and more accurate. As DNA sequences continue to be published and analyzed, the veil over the gene is being lifted. In-depth study of tumor gene expression profiles can help people understand the mechanisms of tumor development, discover new disease subtypes, identify markers for early cancer diagnosis and therapeutic targets, improve diagnostic accuracy for complex diseases, and enhance the effectiveness of clinical treatment. However, because gene expression data are high-dimensional and have small sample sizes, they lie far beyond the scope of traditional analysis methods, and existing methods cannot meet practical needs. How to process, mine, analyze, and interpret gene expression data effectively has become a bottleneck in bioinformatics. Researchers have therefore gradually moved gene expression data analysis from traditional statistical methods to machine learning methods, and this has become a research hotspot in bioinformatics in recent years.

This thesis builds on bioinformatics and spectral graph theory and uses pattern recognition techniques and computer science. Features that reflect graph structure are introduced into the classification of gene expression data in order to study feature extraction and classification for tumor gene expression. The data and results are analyzed, and the performance of the algorithms is demonstrated. The main research contents are:

1. The basic knowledge of gene expression data is introduced, and the classification methods applied to gene expression data in recent years are summarized. Based on the characteristics of gene expression data analysis, the background, current status, significance, existing problems, and future directions of classification research are discussed.
2. A cancer subtype feature extraction and classification method based on adjacency matrix decomposition is proposed. An adjacency matrix is first constructed from the tumor gene expression data using Gaussian weights, and singular value decomposition is then applied to it. The eigenvectors obtained from the decomposition are fed into a support vector machine as classification features for classification and recognition. Experiments using leave-one-out cross-validation on two leukemia subtypes achieved good results.
3. A method combined with principal component analysis is proposed. An adjacency matrix with Gaussian weights is constructed over the sample points of the gene expression data, so that the sample points carry spatial structure information, and singular value decomposition is then applied. A feature scoring criterion is used to find the principal component that best distinguishes tumor samples from normal samples, and this component is fed into a KNN classifier as the sample feature. Experiments on leukemia and colon cancer expression data show that the method is feasible and effective.
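
The pipeline described in items 2 and 3 can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the kernel width, the number of retained components, or the data preprocessing, so `sigma`, `k`, the synthetic data, and all function names below are assumptions made for illustration. To keep the example dependent on NumPy only, a leave-one-out 1-nearest-neighbour classifier stands in for the SVM and KNN classifiers named in the abstract. Note also that the decomposition here is computed over all samples at once, without using any labels, before the leave-one-out evaluation.

```python
import numpy as np


def gaussian_adjacency(X, sigma):
    """Adjacency matrix with Gaussian weights:
    W[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def svd_features(W, k):
    """One k-dimensional feature vector per sample: the first k left
    singular vectors of W, scaled by their singular values."""
    U, s, _ = np.linalg.svd(W)
    return U[:, :k] * s[:k]


def loo_1nn_accuracy(F, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    dists = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample may not be its own neighbour
    nearest = np.argmin(dists, axis=1)
    return float(np.mean(y[nearest] == y))


# Synthetic stand-in for expression data (assumption): 20 samples, 50 genes,
# two classes whose mean expression differs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (10, 50)),
               rng.normal(3.0, 1.0, (10, 50))])
y = np.array([0] * 10 + [1] * 10)

W = gaussian_adjacency(X, sigma=10.0)  # sigma chosen by hand for this toy data
F = svd_features(W, k=2)               # k is likewise an illustrative choice
acc = loo_1nn_accuracy(F, y)
print(f"leave-one-out accuracy: {acc:.2f}")
```

In practice the kernel width and the number of components would need to be tuned for real expression data, and item 3's feature scoring step (selecting the single most discriminative component) is not shown here.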
Keywords/Search Tags:Bioinformatics, Gene Expression Data, Classification, Adjacency Graph, Principal Component Analysis