
Analysis Of Gene Expression Data Based On Spectral Clustering

Posted on: 2017-05-16  Degree: Master  Type: Thesis
Country: China  Candidate: J C Gu
GTID: 2370330488479879  Subject: Computer technology
Abstract/Summary:
The incidence and mortality rates of cancer are very high, and patients' survival times are short. Cancer seriously harms patients themselves, their families, and even society as a whole, so its prevention and treatment is a focus of life-science researchers worldwide. Clustering tumor samples makes it possible to study and predict samples of unknown class and to help doctors diagnose and treat tumors. It can also reveal related driver genes or genes with similar functional expression, expose regulatory relationships between genes, assess the value of individual genes, and support drug-target selection and targeted diagnosis. Cluster analysis of gene expression data is therefore of great importance for cancer prevention and treatment. Because gene expression data are high-dimensional, the data are often sparse, distances between samples become less distinct, and redundant features accumulate, so the effectiveness of traditional clustering algorithms drops sharply. Clustering gene expression data to improve the accuracy of cancer diagnosis has thus become a hot research topic in bioinformatics and medicine. This thesis focuses on spectral clustering of tumor gene expression data:

(1) Sparse representation based spectral clustering (SRSC) maps each high-dimensional sample to a low-dimensional vector of representation coefficients and builds a similarity matrix from these vectors for spectral clustering. However, this method is inefficient on high-dimensional gene expression data. To address this, a spectral clustering algorithm based on collaborative representation (CRSC) is proposed: first, collaborative representation reduces the dimensionality of the high-dimensional gene expression data while preserving the integrity of its information; then the cosine distance is used to construct the similarity matrix; finally, spectral clustering is applied to the similarity matrix. Comparisons on various evaluation criteria show that the algorithm is robust in both time complexity and clustering accuracy.

(2) SRSC is time-consuming when solving for the sparse coefficients if the number of samples is very large. To address this, a principal component analysis based spectral clustering algorithm (PCASC) is proposed, building on traditional principal component analysis: first, principal component analysis reduces the dimensionality of the gene expression data; then the cosine distance is used to construct the similarity matrix; finally, spectral clustering is applied to the similarity matrix. Comparative analysis shows that the algorithm outperforms SRSC in both accuracy and running speed and is better suited to analyzing large-scale gene expression profile data.
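The abstract describes CRSC and PCASC only as the same three steps (reduce dimensionality, build a cosine similarity matrix, run spectral clustering). The sketch below illustrates that shared pipeline in NumPy under explicit assumptions; it is not the thesis's implementation. Assumed, not taken from the source: collaborative representation is read as closed-form ridge-regression coding of each sample over all remaining samples; the regularization weight `lam` and `n_components` are hypothetical defaults; similarity is the cosine of the angle between coefficient (or score) vectors, with negative values clipped to zero; the spectral step follows the Ng-Jordan-Weiss normalized scheme with a small k-means. All function names are illustrative.

```python
import numpy as np


def cosine_similarity(F):
    """Pairwise cosine similarity between rows of F (one row per sample)."""
    Fn = F / np.maximum(np.linalg.norm(F, axis=1, keepdims=True), 1e-12)
    return Fn @ Fn.T


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd k-means with k-means++ seeding; one label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center.
        d2 = ((Z[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None  # None = uniform choice
        centers.append(Z[rng.choice(len(Z), p=p)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((Z[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(S, k):
    """Ng-Jordan-Weiss style spectral clustering of a similarity matrix S."""
    W = np.clip(S, 0.0, None)  # assumption: negative cosines mean "no edge"
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    M = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    U = vecs[:, -k:]             # k leading eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)


def collaborative_codes(X, lam=0.1):
    """Code each sample (row of X) over all *other* samples by ridge regression:
    alpha_i = (D^T D + lam*I)^-1 D^T x_i, where D holds the remaining samples."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm samples
    G = Xn @ Xn.T                                     # n x n Gram matrix
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        idx = np.delete(np.arange(n), i)
        A = G[np.ix_(idx, idx)] + lam * np.eye(n - 1)
        C[i, idx] = np.linalg.solve(A, G[idx, i])
    return C


def pca_scores(X, n_components=10):
    """Project centered samples onto their leading principal components."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def crsc(X, k, lam=0.1):
    """Hypothetical CRSC sketch: collaborative codes -> cosine -> spectral."""
    return spectral_clustering(cosine_similarity(collaborative_codes(X, lam)), k)


def pcasc(X, k, n_components=10):
    """Hypothetical PCASC sketch: PCA scores -> cosine -> spectral."""
    return spectral_clustering(cosine_similarity(pca_scores(X, n_components)), k)
```

Here samples are rows of `X` and genes are columns. Once the n×n Gram matrix is formed, each collaborative code needs only one (n-1)×(n-1) linear solve, which is the usual reason closed-form collaborative coding is cheaper than iterative l1 sparse coding; the thesis's own timing results are not reproduced by this sketch.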
Keywords/Search Tags: Gene expression profile data, Spectral clustering, Sparse representation, Collaborative representation, Principal Component Analysis