
Research On Clustering Methods Based On Multi Omics Data

Posted on: 2021-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z B Gao
GTID: 2428330626960366
Subject: Computer Science and Technology
Abstract/Summary:
Since the completion of the Human Genome Project, advances in measurement technology have produced massive amounts of multi-omics data, which offer a new perspective for understanding life activities. Research on multi-omics data benefits from a larger number of evidence sources, and it also overcomes a limitation of single-omics analysis, which cannot precisely describe delicate and complex life activities.

Clustering is a key data mining technique. Cluster analysis of multi-omics data has important practical significance in disease classification, precision medicine, drug research, and related areas. A precise definition of the similarity between samples can greatly improve the performance of clustering algorithms. This thesis studies clustering based on multi-omics data from two perspectives: similarity measurement and similarity fusion. The main work is as follows.

A new ensemble clustering method based on metric learning (MMEC) is proposed. First, highly reliable clustering results are obtained by ensemble clustering on the multi-omics data. Then, based on these results, distance metric learning is carried out on each single-omics dataset to optimize the distance metric between samples. Finally, the optimized distance metrics are used to cluster the multi-omics data again, giving the final clustering results.

A similarity belief fusion method (SBF) based on evidence theory is proposed. First, the similarities obtained from the different omics data are transformed into degrees of belief in similarity. These beliefs from the different data sources are then fused by evidence theory to obtain a similarity matrix that takes the multi-omics information into account. Finally, a spectral clustering algorithm is applied to the fused similarity matrix to obtain the final clustering results.

Experimental results on public datasets show that the two proposed methods achieve more clinically significant clustering results than existing methods. The analysis of cancer cases shows that the clinical indexes of the subtypes identified by the two methods are clearly distinguished.

The two methods focus on the measurement and the fusion of similarity, respectively. The first can be classified as a late integration method, while the second is an intermediate integration method. MMEC achieves better results than existing methods, while SBF performs better on a variety of cancer datasets, and the computational complexity of SBF is far lower than that of MMEC. The experimental results demonstrate the effectiveness of clustering algorithms based on multi-omics data.
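The similarity belief fusion pipeline described above (similarity, then belief, then evidence-theoretic fusion, then spectral clustering) can be illustrated with a minimal Python sketch. The abstract does not give the formulas used in the thesis, so everything specific here is an assumption made for illustration: the Gaussian similarity, the mapping of a similarity s to masses on {similar, dissimilar, unknown} through a hypothetical reliability factor alpha, the use of Dempster's rule of combination as the evidence-theory step, and a two-way spectral split on the sign of the Fiedler vector. All function names are hypothetical.

```python
import numpy as np

def rbf_similarity(X, gamma=0.5):
    """Pairwise Gaussian similarity in (0, 1] for one omics layer (assumed choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def to_belief(S, alpha=0.9):
    """Turn similarities into masses (similar, dissimilar, unknown).

    alpha < 1 is a hypothetical reliability factor; 1 - alpha is left
    uncommitted, which keeps Dempster's normalisation well defined.
    """
    return alpha * S, alpha * (1.0 - S), np.full_like(S, 1.0 - alpha)

def dempster(m1, m2):
    """Dempster's rule on the frame {similar, dissimilar}, applied element-wise."""
    s1, d1, u1 = m1
    s2, d2, u2 = m2
    conflict = s1 * d2 + d1 * s2          # mass given to contradictory pairs
    norm = 1.0 - conflict                  # strictly positive when alpha < 1
    s = (s1 * s2 + s1 * u2 + u1 * s2) / norm
    d = (d1 * d2 + d1 * u2 + u1 * d2) / norm
    u = (u1 * u2) / norm
    return s, d, u

def spectral_bipartition(W):
    """Split samples into two groups by the sign of the Fiedler vector."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalised Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt      # second-smallest eigenvector, rescaled
    return (fiedler > 0).astype(int)

def fuse_and_cluster(views, alpha=0.9, gamma=0.5):
    """Fuse per-omics beliefs and cluster on the fused belief in similarity."""
    beliefs = [to_belief(rbf_similarity(X, gamma), alpha) for X in views]
    fused = beliefs[0]
    for m in beliefs[1:]:
        fused = dempster(fused, m)
    return spectral_bipartition(fused[0])

# Toy data: two "omics" views of the same 10 samples, 5 per group.
offsets = np.linspace(-0.2, 0.2, 5)
group_a = np.column_stack([offsets, offsets[::-1]])
view1 = np.vstack([group_a, group_a + 3.0])
view2 = 0.8 * view1
labels = fuse_and_cluster([view1, view2])
print(labels)
```

On this toy input the first five samples receive one label and the last five the other. For more than two clusters, the sign split would normally be replaced by k-means on the leading eigenvectors; that step is omitted to keep the sketch dependency-free.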
Keywords/Search Tags:Multi Omics Data, Clustering Algorithm, Evidence Theory, Metric Learning