
Gene Clustering Algorithm Based On Data Dimensionality Reduction Framework

Posted on: 2022-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: H C Gao
Full Text: PDF
GTID: 2480306557464234
Subject: Information networks
Abstract/Summary:
With the rapid development of biological observation technology and gene chip technology, the volume of gene expression data obtained by measuring the genes of individual organisms has grown dramatically. As one of the main analysis techniques, clustering is widely used to study shared gene function, gene interaction, and co-regulation. However, as the data space keeps expanding, the "curse of dimensionality" has emerged, and processing high-dimensional data sets has become a new difficulty in data analysis. On the one hand, high-dimensional data readily contains redundant information, which reduces the efficiency of the algorithm and the accuracy of subsequent processing. On the other hand, the design of many classical algorithms has lagged far behind the growth in data dimensionality, so they cannot cope with an increasingly large and complex data space.

This thesis starts from the cluster analysis of gene expression data sets and aims to improve the efficiency and accuracy of clustering algorithms on high-dimensional data. By combining a data dimensionality reduction framework with clustering algorithms, it addresses the poor cluster quality, lack of diversity, low efficiency, and unclear biological significance of existing clustering algorithms when applied to high-dimensional data sets. The main work of this thesis includes:

(1) A gene clustering algorithm model based on principal component analysis is proposed. For gene expression data with linear structure, principal component analysis is used for feature extraction, which reduces the dimensionality of the data space and decorrelates its features. The self-organizing map clustering algorithm is then used to cluster the data, improving clustering efficiency in high-dimensional space.

(2) A gene clustering algorithm model based on artificial neural networks is proposed. For gene expression data with nonlinear structure, a neural network is used to learn and represent the features of the high-dimensional data space. By incorporating the variational autoencoder, dimensionality reduction is extended from the linear to the nonlinear case, enabling cluster analysis of high-dimensional gene expression data.

(3) An accurate gene clustering algorithm model based on artificial neural networks is proposed. To address the slow speed and low accuracy of the neural-network-based gene clustering algorithm, the K-means algorithm is introduced to perform a second, more accurate clustering step and improve the efficiency of the algorithm. The method is also compared with CC, QUBIC, FLOC, BIMAX, and other classical clustering algorithms.
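The general "reduce dimensionality, then cluster" idea behind these models can be illustrated with a minimal sketch. It is not the thesis's implementation: the data is synthetic, the function names `pca_project` and `kmeans` are hypothetical, PCA is computed by plain power iteration with deflation, K-means uses a simple farthest-point initialization, and the self-organizing map and variational autoencoder stages are omitted.

```python
import random

def pca_project(X, k, iters=200):
    """Project the rows of X onto the top-k principal components."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d) of the centered data.
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
         for i in range(d)]
    components = []
    for _ in range(k):
        # Arbitrary non-zero start vector, normalized to unit length.
        v = [1.0 / (j + 1) for j in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        # Power iteration converges toward the leading eigenvector of C.
        for _step in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the matching eigenvalue estimate.
        Cv = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(v[i] * Cv[i] for i in range(d))
        components.append(v)
        # Deflation: remove the found direction before the next pass.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)]
             for i in range(d)]
    return [[sum(r[j] * v[j] for j in range(d)) for v in components]
            for r in Xc]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Lloyd's K-means with a deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

# Synthetic stand-in for an expression matrix: 20 "genes" x 6 "conditions",
# drawn from two well-separated groups (assumed data, not from the thesis).
rng = random.Random(0)
genes = ([[rng.gauss(0.0, 0.3) for _ in range(6)] for _ in range(10)] +
         [[rng.gauss(5.0, 0.3) for _ in range(6)] for _ in range(10)])

reduced = pca_project(genes, k=2)   # 6 dimensions -> 2 dimensions
labels = kmeans(reduced, k=2)       # cluster in the reduced space
print(labels)
```

Clustering in the reduced space keeps the distance computations cheap and discards correlated, redundant features, which is the motivation the abstract gives for placing dimensionality reduction before clustering.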
Keywords/Search Tags: Gene expression data, Cluster analysis, Principal component analysis, Variational autoencoder, Self-organizing map, K-means