Single-cell sequencing data contain a wealth of information of research value. Uncovering this information deepens our understanding of cell development and disease progression, and it also has a major impact on research into, and treatment of, human diseases. Finding sets of genes that respond similarly under certain conditions is a major challenge for conventional clustering algorithms, because a gene may be involved under multiple external conditions and respond differently under each. To address this issue, a special type of clustering algorithm that clusters rows and columns simultaneously has been proposed; it is known as biclustering. However, the emergence of massive datasets places new demands on biclustering algorithms: 1) How can their generalization ability be improved so that they are better suited to large datasets? 2) How can their robustness be improved to reduce their sensitivity?
To address these two issues, this paper builds on previous research to propose three biclustering algorithms and designs a corresponding biclustering toolbox. The proposed algorithms are compared with state-of-the-art biclustering algorithms in extensive experiments on simulated and real datasets, and the results show that they improve performance, robustness, and generalization ability. The main contributions are as follows:

(1) The Joint CC Algorithm and Bimax Algorithm (JCB) is proposed. It addresses the inability of the Bimax algorithm to retain good biclusters originally present in a dataset. JCB uses the Mean Square Residue (MSR) function introduced by the CC (Cheng and Church) algorithm to modify Bimax's random seed-selection procedure. It then groups genes that respond similarly under the same conditions to obtain a maximal subset of similar genes, which forms a bicluster.

(2) The Adjacency Difference Matrix Binary Biclustering Algorithm (AMBB) is proposed. AMBB is designed specifically for binary data matrices. First, the data matrix is converted into a binary matrix, and the differences between genes are calculated from each gene's expression values under the different conditions; these differences form a difference matrix. Sub-matrices are then obtained iteratively based on the values in the difference matrix, and the genes and conditions corresponding to the row and column indices of a sub-matrix form a bicluster. Repeatedly iterating over the closely related gene-sample data in the sub-matrix improves the algorithm's performance.

(3) Introducing weights into AMBB yields the Weighted Adjacency Difference Matrix Binary Biclustering Algorithm (W-AMBB). Traditional preprocessing methods convert the data matrix into a binary matrix, but this can result in some information
loss. Therefore, a new preprocessing method is proposed to transform the data matrix: a formula that computes the difference value using weights, from which the difference matrix between genes is calculated. Comparison with other preprocessing methods verifies that the new method performs well at preserving the information in the data. W-AMBB also achieves strong results by biclustering the significant information in the sub-matrices.

(4) A biclustering toolbox called Bi SEAT is proposed. An in-depth survey of biclustering algorithms revealed that comparing these algorithms is difficult for newcomers to biclustering. It also found that the metric used to evaluate biclustering performance is not rigorous enough for overlapping biclusters. Therefore, a new evaluation method is proposed and incorporated into Bi SEAT. The toolbox also contains seven classic biclustering algorithms, four internal evaluation methods, and two methods for biological enrichment analysis and for generating simulated datasets.
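For reference, the MSR score that JCB borrows from the CC algorithm (item (1)) has a standard definition due to Cheng and Church: for a sub-matrix with row set $I$ and column set $J$, $H(I,J) = \frac{1}{|I||J|}\sum_{i\in I, j\in J}(a_{ij} - a_{iJ} - a_{Ij} + a_{IJ})^2$, where $a_{iJ}$, $a_{Ij}$ and $a_{IJ}$ are the row, column and overall means of the sub-matrix. A low MSR means the rows follow a coherent additive pattern (Cheng and Church call a sub-matrix with $H \le \delta$ a $\delta$-bicluster). The minimal sketch below computes only this score with NumPy; how JCB uses it to guide Bimax's seed selection is not specified here, and the function name `mean_squared_residue` is illustrative.

```python
import numpy as np

def mean_squared_residue(data: np.ndarray, rows, cols) -> float:
    """Cheng-Church mean squared residue H(I, J) of data restricted to rows x cols."""
    sub = data[np.ix_(rows, cols)]                # the candidate bicluster
    row_means = sub.mean(axis=1, keepdims=True)   # a_iJ for each row i
    col_means = sub.mean(axis=0, keepdims=True)   # a_Ij for each column j
    overall_mean = sub.mean()                     # a_IJ
    residue = sub - row_means - col_means + overall_mean
    return float((residue ** 2).mean())           # average of squared residues

# Rows 0-2 x columns 0-2 follow an additive pattern (row offset + column offset),
# so their MSR is zero up to floating-point rounding; adding column 3 breaks it.
expr = np.array([[1.0, 2.0, 3.0, 9.0],
                 [2.0, 3.0, 4.0, 0.0],
                 [5.0, 6.0, 7.0, 4.0]])
print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2]))
print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2, 3]))
```

The first call returns a value at or near zero and the second a clearly positive value, which is the property that makes MSR usable as a coherence criterion when ranking candidate seeds or biclusters.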
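Item (2) describes the AMBB pipeline (binarize, build a gene difference matrix, extract sub-matrices) only at a high level. The sketch below is not the AMBB algorithm; it illustrates that general pipeline under explicitly assumed choices: each gene is binarized against its own mean expression, the difference between two genes is the Hamming distance between their binary profiles, and a bicluster is read off by taking the genes within a hypothetical `max_diff` of a seed gene together with the conditions where all of them are 1. All function names here are hypothetical.

```python
import numpy as np

def binarize_by_row_mean(data: np.ndarray) -> np.ndarray:
    """Assumed preprocessing: 1 where a gene's value exceeds that gene's mean, else 0."""
    return (data > data.mean(axis=1, keepdims=True)).astype(np.int8)

def gene_difference_matrix(binary: np.ndarray) -> np.ndarray:
    """Assumed difference: number of conditions where two genes' binary values disagree."""
    # Broadcast (genes, 1, conds) against (1, genes, conds) and count mismatches.
    return (binary[:, None, :] != binary[None, :, :]).sum(axis=2)

def extract_bicluster(binary: np.ndarray, diff: np.ndarray, seed: int, max_diff: int):
    """Hypothetical extraction step: genes close to the seed, conditions where all are 1."""
    genes = np.flatnonzero(diff[seed] <= max_diff)      # always includes the seed itself
    conds = np.flatnonzero(binary[genes].all(axis=0))   # columns shared by every selected gene
    return genes, conds

expr = np.array([[5.0, 6.0, 1.0, 0.0],
                 [4.0, 7.0, 0.0, 1.0],
                 [0.0, 1.0, 8.0, 9.0],
                 [6.0, 5.0, 1.0, 1.0]])
b = binarize_by_row_mean(expr)
d = gene_difference_matrix(b)
genes, conds = extract_bicluster(b, d, seed=0, max_diff=0)
print(genes, conds)  # genes [0 1 3], conditions [0 1]
```

A weighted variant in the spirit of W-AMBB would replace the plain mismatch count with a weighted sum over conditions; the actual weighting formula is the one proposed in item (3) and is not reproduced here.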