Gene Clustering Algorithm Based On Data Dimensionality Reduction Framework

Posted on:2022-10-10

Degree:Master

Type:Thesis

Country:China

Candidate:H C Gao

Full Text:PDF

GTID:2480306557464234

Subject:Information networks

Abstract/Summary:

With the rapid development of biological observation and gene chip technologies, the volume of gene expression data obtained by measuring the genes of individual organisms has grown dramatically. As one of the main analysis techniques, clustering is widely used to study shared gene function, gene interaction, and co-regulation. However, as the data space keeps expanding, the "curse of dimensionality" has emerged, and processing high-dimensional data sets has become a new difficulty in the field of data analysis. On the one hand, high-dimensional data readily contains redundant information, which reduces the efficiency of algorithms and the accuracy of subsequent processing. On the other hand, the design of many classical algorithms has lagged far behind the growth in data dimensionality and cannot cope with the increasingly complex problem of data space expansion.

This thesis starts from the cluster analysis of gene expression data sets and aims to improve the efficiency and accuracy of algorithms applied to high-dimensional data sets. By combining a data dimensionality reduction framework with clustering algorithms, it addresses the poor quality, lack of diversity, low efficiency, and unclear biological significance of existing clustering algorithms when they are used to analyze high-dimensional data sets. The main work of this thesis includes:

(1) A gene clustering algorithm model based on principal component analysis is proposed. For the linear case of gene expression data sets, principal component analysis is used for feature extraction, which reduces the dimensionality of the data space and decorrelates it. The self-organizing map clustering algorithm is then used to perform the clustering and to improve clustering efficiency in high-dimensional space.

(2) A gene clustering algorithm model based on an artificial neural network is proposed. For the nonlinear case of gene expression data sets, a neural network structure is used to learn and represent the features of the high-dimensional data space. Combined with the variational autoencoder algorithm, this extends linear dimensionality reduction to nonlinear dimensionality reduction and enables cluster analysis of high-dimensional gene expression data.

(3) A more accurate gene clustering algorithm model based on an artificial neural network is proposed. To address the slow speed and low accuracy of the gene clustering algorithm based on an artificial neural network, the K-means algorithm is introduced for a second, more precise clustering pass to improve the efficiency of the algorithm. The method is also compared with CC, QUBIC, FLOC, BIMAX, and other classical clustering algorithms.
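As a rough illustration of the general pipeline described in the abstract (reduce dimensionality first, then cluster), the following Python sketch applies principal component analysis via SVD and then a plain K-means pass to synthetic data. It is not the thesis's implementation: the function names, the synthetic matrix, the choice of two components, and the farthest-point initialization are all assumptions made for this example. K-means stands in here for the clustering stage only, and the self-organizing map and variational autoencoder stages are not reproduced.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # decorrelated low-dimensional scores

def init_centroids(Z, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    chosen = [0]                                  # start from the first row
    for _ in range(k - 1):
        diffs = Z[:, None, :] - Z[chosen][None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        chosen.append(int(nearest.argmax()))      # row farthest from chosen centroids
    return Z[chosen].copy()

def kmeans(Z, k, n_iter=100):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    centroids = init_centroids(Z, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)             # assign each row to nearest centroid
        new_centroids = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids): # stop once centroids settle
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic stand-in for an expression matrix (assumption): 60 genes x 50 conditions,
# drawn from two well-separated groups of 30 genes each.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=0.5, size=(30, 50))
group_b = rng.normal(loc=3.0, scale=0.5, size=(30, 50))
expression = np.vstack([group_a, group_b])

scores = pca_reduce(expression, n_components=2)   # 50 dimensions -> 2
labels, centroids = kmeans(scores, k=2)           # cluster in the reduced space
```

Clustering the two-dimensional scores instead of the original 50-dimensional rows is the point of the sketch: distance computations become cheaper, and the components are uncorrelated by construction. Real expression data would need its own choice of component count and cluster number.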

Keywords/Search Tags:

Gene expression data, Cluster analysis, Principal component analysis, Variational Autoencoder, Self-organizing Mapping, K-means

Related items

1	A Cluster Analysis Method For Gene Expression Data Based On Stable And Sparse Principal Components
2	Research On Principal Component Analysis Method And Its Application In Cancer Omics Data
3	A Two-stage Coverage Of The Clustering Algorithm And Its Application
4	The Research And Realizing Of IGA-FCM Clustering Algorithm In Gene Expression Data Analysis
5	Application Of Spatial Weighting And Higher-Order Principal Component Analysis In Multivariate Geoscience Information Synthesis
6	Weighted Principal Component Score Clustering Analysis Based On Functional Data And Its Application
7	Research And Visualization Of Dimension Reduction Method Of Gene Expression Data Based On Principal Component Analysis
8	Research On Robust Matrix Factorization Method And Its Application In Gene Expression Data
9	Robust Principal Component Analysis With Incomplete Data
10	Improved Principal Component Analysis In Chinese Universities Ranked In The Application Of Mathematics