Research And Application Of Double Clustering Algorithm For Single Cell Sequencing Data

Posted on:2024-09-29

Degree:Master

Type:Thesis

Country:China

Candidate:H M Chu

GTID:2530306923988649

Subject:Electronic information

Abstract/Summary:

Single-cell sequencing data carry a wealth of information of research significance. Extracting this information deepens our understanding of cell development and disease progression, and it has a substantial effect on research into, and treatment of, human health and disease. Finding sets of genes that respond similarly under certain conditions is a major challenge for conventional clustering algorithms, because a gene may be involved under multiple external conditions and respond differently in each. To address this issue, a special class of clustering algorithm that clusters rows and columns simultaneously has been proposed, known as the biclustering algorithm. However, with the emergence of massive datasets, new requirements have been placed on biclustering algorithms: 1) How can the generalization ability of biclustering be improved so that it is better suited to processing large datasets? 2) How can the robustness of biclustering be improved to reduce its sensitivity?
To resolve these two issues, this thesis proposes three biclustering algorithms that build on previous research and designs a corresponding biclustering toolbox. The proposed algorithms are compared with state-of-the-art biclustering algorithms in extensive experiments on simulated and real datasets. The results show that the performance, robustness, and generalization ability of the proposed biclustering algorithms are improved. The main contents of the research are as follows:
(1) The Joint CC Algorithm and Bimax Algorithm (JCB) is proposed. It mainly addresses the problem that the Bimax algorithm cannot retain the good biclusters originally present in a dataset. The JCB algorithm uses the Mean Square Residue (MSR) function proposed in the CC algorithm to modify the Bimax procedure for randomly selecting seeds. It then clusters genes with similar responses under the same conditions to obtain a maximal similar subset, which forms a bicluster.
(2) The Adjacency Difference Matrix Binary Biclustering Algorithm (AMBB) is proposed. The AMBB algorithm is specifically designed to handle binary data matrices. First, the data matrix is converted into a binary data matrix, and the differences between genes are calculated from the expression value of each gene under the different conditions. A difference matrix is then constructed, and sub-matrices are obtained iteratively based on the difference values in it. The genes and conditions corresponding to the row and column indices of a sub-matrix form a bicluster. The performance of the algorithm is improved by continuously iterating over the closely related gene-sample data of the sub-matrix.
(3) By introducing weights into the AMBB algorithm, a Weighted Adjacency Difference Matrix Binary Biclustering Algorithm (W-AMBB) is obtained. Traditional preprocessing methods convert the data matrix to a binary matrix, which can cause some information loss. Therefore, a new preprocessing method is proposed to transform the data matrix. This part of the study proposes a weighted formula for calculating the difference value and calculates the difference matrix between genes according to that formula. Comparison with other preprocessing methods verifies that the new preprocessing method preserves data information well. Meanwhile, the W-AMBB algorithm obtains excellent results by biclustering the significant information in the sub-matrix.
(4) A toolbox of biclustering algorithms called Bi SEAT is proposed. An in-depth investigation of biclustering algorithms shows that comparing these algorithms is difficult for people who are new to biclustering. In addition, the metric used to evaluate the performance of biclustering algorithms was found to be insufficiently rigorous for overlapping biclusters. A new evaluation method was therefore proposed and incorporated into Bi SEAT. The toolbox also contains seven classic biclustering algorithms, four methods for internal evaluation, and two methods for biological enrichment analysis and generating simulated datasets.
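
The abstract names two building blocks without giving formulas: the Mean Square Residue (MSR) score from the CC algorithm, and a difference matrix computed between genes after binarization. The sketch below is illustrative only and is not the thesis's implementation. The MSR function follows the standard Cheng and Church definition. The binarization threshold (the global median by default) and the use of Hamming distance as the "difference" between genes are assumptions made here for illustration, since the abstract does not specify either. All function names are hypothetical.

```python
import numpy as np


def mean_squared_residue(sub):
    """Cheng-Church MSR of a submatrix (rows = genes, columns = conditions).

    A value of 0 means the submatrix follows a perfectly additive pattern.
    """
    sub = np.asarray(sub, dtype=float)
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    overall_mean = sub.mean()
    residue = sub - row_means - col_means + overall_mean
    return float(np.mean(residue ** 2))


def binarize(data, threshold=None):
    """Convert an expression matrix to 0/1.

    ASSUMPTION: a single global threshold (median by default); the thesis
    may use a different preprocessing rule.
    """
    data = np.asarray(data, dtype=float)
    if threshold is None:
        threshold = np.median(data)
    return (data > threshold).astype(int)


def pairwise_gene_difference(binary):
    """Gene-by-gene difference matrix.

    ASSUMPTION: the difference between two genes is the Hamming distance,
    i.e. the number of conditions in which their binary values disagree.
    """
    b = np.asarray(binary, dtype=int)
    return (b[:, None, :] != b[None, :, :]).sum(axis=2)


if __name__ == "__main__":
    additive = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
    print("MSR of an additive block:", mean_squared_residue(additive))

    expression = [[1.0, 5.0], [6.0, 2.0]]
    binary = binarize(expression, threshold=3.0)
    print("Binary matrix:\n", binary)
    print("Difference matrix:\n", pairwise_gene_difference(binary))
```

In this sketch, a low MSR marks a coherent candidate bicluster, and small entries in the difference matrix mark genes whose binary profiles are close. How JCB uses MSR during seed selection, and how AMBB iterates over the difference matrix, are not described in enough detail in the abstract to reproduce here.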

Keywords/Search Tags:

Biclustering Algorithm, Single cell sequence data, Binary Matrix, Enrichment Analysis

Related items

1	Research On Biclustering Algorithm And Its Application In Gene Expression Data Analysis
2	Clustering Analysis Of Gene Expression Profile Data Based On Meta-heuristic Algorithms
3	Integration Analysis Of Single Cell And Spatial Transcriptome Data Based On G-S Mapping Algorithm
4	Research On Novel Biclustering Analysis Method For MiRNA-targeted Gene Data Based On Parallel Graph Autoencoder
5	Research On Biclustering Methods Based On Intelligent Optimization Algorithm
6	Research On Multi_Objective Optimization Algorithm For Biclustering In Microarry Gene Expression Data
7	Study On Key Gene Discovery Technology And Analysis Tools By Transcriptome Data
8	Single Cell RNA-seq Clustering Method Based On Self-renewal Of Cell Relationship Matrix
9	A Research On Biclustering Algorithm And Its Application In The Gene Expression Data
10	Single-cell RNA-seq Data Preprocessing Algorithm Based On LOESS Regression Weighting