Subspace Clustering Based On Trace Group Lasso And Its Application To Single-cell RNA-Seq

Posted on: 2020-12-15, Degree: Master, Type: Thesis
Country: China, Candidate: X Fang, Full Text: PDF
GTID: 2370330575487853, Subject: Computer Science and Technology
Abstract/Summary:
In bioinformatics, clustering sequenced cell RNA can reveal similarities in cell function, which helps in studying the underlying biological mechanisms and can provide new reference points for drug development and disease treatment. In recent years, with the continuous influx of complex and redundant biological data, traditional clustering algorithms such as partitioning, hierarchical, density-based, model-based and grid-based clustering have often struggled or failed to achieve satisfactory clustering accuracy. In this context, subspace clustering algorithms have rapidly become a research hotspot because of their ability to handle large-scale, high-dimensional data sets. These methods offer high noise tolerance, strong robustness and good scalability, and have shown great promise in face clustering, motion segmentation and handwriting recognition. Nevertheless, when the basic subspace clustering model is applied directly to bioinformatics mining, it still suffers from weak interpretability and poor clustering performance, among other problems, because it ignores the structural characteristics of the intrinsic associations in biological data. To address this, this thesis embeds the Lasso method in the low rank representation subspace clustering framework and proposes a subspace clustering algorithm with a dual-level expression mechanism. In addition, we develop a corresponding fast solving algorithm and apply the model to the analysis of single-cell RNA-seq data from the mouse somatosensory cortex and hippocampus. The main contents and innovations of this thesis are summarized as follows:

(1) A novel Trace Group Lasso (TGL) method is proposed. It uses a hybrid strategy that combines the trace Lasso and the group Lasso to achieve dimensionality reduction with sparsity at the variable level, preset group sparsity and automatic group sparsity. Classification experiments on UCI classification data sets show that the proposed method is superior to the other two Lasso variants in classification accuracy and gene selection ability.

(2) A Subspace Clustering algorithm based on TGL (TGLSC) is put forward. It adopts a dual-level linear expression mechanism that combines samples and features, so that the clustering scheme is coordinated in both the sample subspace and the feature subspace. Clustering experiments on face clustering and motion segmentation data sets show that, compared with five other subspace clustering algorithms, the proposed algorithm has the best overall performance in aspects such as accuracy and stability.

(3) To optimize the objective function of the proposed TGLSC algorithm, we adopt the widely used Alternating Direction Method of Multipliers (ADMM), which solves it in a distributed manner. By analyzing the clustering results for single-cell RNA-seq data from the mouse somatosensory cortex and hippocampus, we explore and reveal some biological information and patterns.
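The abstract does not state how TGL combines its two ingredients, so the following is only a minimal sketch of the two standard penalties it names, not the thesis's formulation. It assumes the usual textbook definitions: the trace Lasso penalty as the nuclear norm of X diag(w), and the group Lasso penalty as the sum of size-weighted l2 norms over predefined groups. The function names, the toy matrices and the group layout are hypothetical and chosen for illustration only.

```python
import numpy as np


def trace_lasso_penalty(X, w):
    """Standard trace Lasso penalty: nuclear norm of X @ diag(w).

    X * w scales column j of X by w[j], which equals X @ diag(w).
    """
    return np.linalg.norm(X * w, ord="nuc")


def group_lasso_penalty(w, groups):
    """Standard group Lasso penalty: sum over groups of sqrt(size) * ||w_g||_2.

    `groups` is a list of index lists; the sqrt(size) weighting is the
    common convention and is an assumption here.
    """
    return sum(np.sqrt(len(g)) * np.linalg.norm(w[g]) for g in groups)


# Toy coefficient vector (hypothetical values).
w = np.array([3.0, -4.0])

# Orthonormal columns: the trace Lasso reduces to the l1 norm of w.
X_orthogonal = np.eye(2)
penalty_uncorrelated = trace_lasso_penalty(X_orthogonal, w)

# Identical unit-norm columns: the trace Lasso reduces to the l2 norm of w.
X_identical = np.array([[1.0, 1.0], [0.0, 0.0]])
penalty_correlated = trace_lasso_penalty(X_identical, w)

# One predefined group containing both variables.
penalty_group = group_lasso_penalty(w, [[0, 1]])
```

The two toy matrices illustrate why the trace Lasso is described as adaptive: with uncorrelated features it behaves like the l1 norm, and with perfectly correlated features it behaves like the l2 norm, whereas the group Lasso relies on groups fixed in advance.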
Keywords/Search Tags: Subspace clustering, Low rank representation, Trace group Lasso, Bioinformatics mining, Single-cell RNA-seq