Research And Application Of Spectral Clustering In Analysis Of Gene Expression Data

Posted on:2011-03-30

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Deng

Full Text:PDF

GTID:2120360308958946

Subject:Computer application technology

Abstract/Summary:

Gene chip technology has developed rapidly and is widely applied across biology, but it generates large volumes of gene expression data. Analyzing these data has become a new problem for molecular biologists, and bioinformatics, a rapidly emerging discipline, has become a frontier area of research. Gene expression data from microarray experiments reflect the abundance of mRNA produced during transcription in cells. By analyzing these data, we can obtain functional and regulatory information about genes. Research on gene expression data has become an active interdisciplinary subject spanning the life sciences, mathematics and computer science, and one of the hotspots in bioinformatics.

Clustering is an important method for analyzing massive data. It groups genes with similar expression into the same cluster, so the function of an unknown gene can be inferred from the known functions of the genes in its cluster. This thesis studies clustering for the analysis of gene expression data. The work is as follows:

① The clustering algorithms usually adopted for gene expression data depend too heavily on the shape of the data distribution, and their results converge to local optima. This thesis therefore applies spectral clustering to gene expression data. Spectral clustering is a novel algorithm based on the eigenvectors of a matrix derived from the data; it can also be viewed as partitioning a graph according to the weights between its vertices. It does not depend on the shape of the data distribution and can converge to the global optimum.
② Because spectral clustering cannot automatically find the best number of clusters, it has to compute eigenvalues and eigenvectors iteratively, which is time-consuming. We design a method called VP that automatically finds the number of clusters in spectral clustering. The method reduces the time complexity, which makes it necessary for the analysis of large gene expression data sets.
③ Given the high dimensionality and small sample size of gene expression data, and drawing on biological knowledge, we propose raising the weight of certain samples to obtain more accurate clustering results.
④ Focusing on the purpose of gene expression clustering analysis, we propose a method called ARI to calculate the accuracy of a clustering result. We then adopt ARI as an external criterion and the classical adjusted FOM as an internal criterion to evaluate and analyze the results of different clustering algorithms.
⑤ We design a series of simulation experiments for the work above. The results show that: 1) spectral clustering gives better results for any shape of data distribution; 2) spectral clustering performs better on gene expression data than hierarchical clustering and K-means; 3) the VP method finds the best number of clusters automatically; 4) clustering results are more accurate after raising the weight of certain samples.
⑥ We find the relationship between the parameter θ and the parameter σ in each data set used in this thesis, and derive proper ranges for θ from that relationship.
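
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses a generic normalized spectral clustering on a Gaussian affinity graph, it assumes σ is the width of that Gaussian kernel, and it substitutes the common eigengap heuristic for the VP method, which the abstract does not describe. It also assumes that ARI refers to the standard adjusted Rand index. The parameter θ is left out because the abstract does not define it. All function names and the synthetic data are hypothetical.

```python
import numpy as np
from collections import Counter
from math import comb


def gaussian_affinity(X, sigma):
    """Edge weights w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def spectral_embedding(W, max_k=10):
    """Eigen-decompose the normalized Laplacian once and choose k by the
    largest eigengap (a stand-in for the thesis's VP method, not VP itself)."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    m = min(max_k, len(vals) - 1)
    gaps = np.diff(vals[: m + 1])             # gaps[i] = vals[i+1] - vals[i]
    k = int(np.argmax(gaps)) + 1
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize
    return k, U


def kmeans(U, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    centers = [U[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(d2))])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index of two labelings (1.0 means identical partitions)."""
    n = len(a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:                 # degenerate partitions
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Synthetic toy data (not gene expression data): three tight 2-D blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
truth = np.repeat(np.arange(3), 20)
X = blob_centers[truth] + 0.3 * rng.standard_normal((60, 2))

W = gaussian_affinity(X, sigma=1.0)
k, U = spectral_embedding(W)
labels = kmeans(U, k)
ari = adjusted_rand_index(truth.tolist(), labels.tolist())
print(k, round(ari, 3))
```

The eigendecomposition runs only once and the cluster count is read from the spectrum, which is the kind of saving the abstract attributes to choosing the number of clusters automatically. Real expression matrices would need a data-dependent choice of σ, which this sketch does not attempt.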

Keywords/Search Tags:

Gene expression data, Bioinformatics, Spectral clustering, Weights of the sample, Clustering accuracy

Related items

1	Clustering And Classification Techniques In Bioinformatics Applications
2	A Fast Clustering Method For Large Single-cell RNA-seq Data Based On Spectral Clustering
3	Analysis Of Gene Expression Data Based On Spectral Clustering
4	A Feature Selection Algorithm For Biological Data Based On Dynamic Iterative Spectral Clustering
5	Gene Microarray Data Analysis Based On Clustering Algorithms
6	Research On Robust Matrix Factorization Method And Its Application In Gene Expression Data
7	Research On Weighted Two - Way Clustering Algorithm Based On Gene Expression Microarray Datasets
8	Algorithms For Clustering Gene Expression Data Based On Spanning Tree
9	Analysis Of Gene Expression Data Clustering
10	Analysis And Application Of Time-Course Gene Expression Data