Research On Clustering Methods Based On Multi Omics Data

Posted on:2021-02-09

Degree:Master

Type:Thesis

Country:China

Candidate:Z B Gao

GTID:2428330626960366

Subject:Computer Science and Technology

Abstract/Summary:

With the completion of the Human Genome Project, the massive multi-omics data generated by continually improving measurement technology offer a new perspective for understanding life processes. Research on multi-omics data benefits from the larger number of evidence sources and also overcomes the inability of single-omics analysis to describe delicate and complex life activities precisely. Clustering is a key data mining technique, and clustering analysis of multi-omics data has important practical significance for disease classification, precision medicine, drug research and related areas. A precise definition of the similarity between samples can greatly improve the performance of clustering algorithms. This thesis studies clustering based on multi-omics data from two perspectives: similarity measurement and similarity fusion. The main work is as follows:

(1) A new ensemble clustering method based on metric learning (MMEC) is proposed. First, highly reliable clustering results are obtained by ensemble clustering on the multi-omics data. Then, based on these results, distance metric learning is carried out on each single-omics dataset to optimize the distance metric between samples. Finally, the optimized distance metrics are used to cluster the multi-omics data again, giving the final clustering results.

(2) A similarity belief fusion method based on evidence theory (SBF) is proposed. First, the similarities obtained from the different omics data are transformed into degrees of belief in similarity. These beliefs from the different data sources are then fused by evidence theory into a single similarity matrix that takes the multi-omics information into account. Finally, a spectral clustering algorithm is applied to the fused similarity matrix to obtain the final clustering results.

Experimental results on public datasets show that the two proposed methods achieve more clinically significant clustering results than existing methods. The analysis of cancer cases shows that the clinical indexes of the subtypes identified by the two methods are clearly distinguished. The two methods focus on the measurement and the fusion of similarity, respectively: the first can be classified as a late integration method, while the second is an intermediate integration method. MMEC achieves better results than existing methods, while SBF performs better on a variety of cancer datasets, and the computational complexity of SBF is far lower than that of MMEC. The experimental results demonstrate the effectiveness of clustering algorithms based on multi-omics data.
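
The fusion step of the second method can be illustrated with a small sketch. The abstract does not say how similarities are turned into beliefs or which combination rule is used, so everything here is an assumption made for illustration: the two-element frame {similar, dissimilar}, the `reliability` discount that assigns the remaining mass to ignorance, the use of Dempster's rule of combination, and all function names. It should not be read as the thesis's actual SBF implementation.

```python
from itertools import product

# Frame of discernment for one sample pair (assumed, not from the thesis):
# "S" = the two samples are similar, "D" = they are dissimilar.
S = frozenset({"S"})
D = frozenset({"D"})
THETA = S | D  # ignorance: "similar or dissimilar"


def similarity_to_mass(sim, reliability=0.8):
    """Map a similarity in [0, 1] to a mass function (hypothetical mapping)."""
    if not 0.0 <= sim <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return {
        S: reliability * sim,
        D: reliability * (1.0 - sim),
        THETA: 1.0 - reliability,
    }


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence accumulates on the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass falling on the empty set is the conflict K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by 1 - K so the masses sum to one again.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def fuse_similarities(sims, reliability=0.8):
    """Fuse per-omics similarities of one sample pair into a belief in 'similar'."""
    masses = [similarity_to_mass(s, reliability) for s in sims]
    fused = masses[0]
    for m in masses[1:]:
        fused = dempster_combine(fused, m)
    return fused.get(S, 0.0)


def fuse_matrices(matrices, reliability=0.8):
    """Fuse several n x n similarity matrices (lists of lists) entry by entry."""
    n = len(matrices[0])
    return [
        [fuse_similarities([m[i][j] for m in matrices], reliability)
         for j in range(n)]
        for i in range(n)
    ]
```

The matrix returned by `fuse_matrices` plays the role of the fused similarity matrix described above; under these assumptions it could be handed to any spectral clustering routine that accepts a precomputed affinity matrix.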

Keywords/Search Tags:

Multi Omics Data, Clustering Algorithm, Evidence Theory, Metric Learning

Related items

1	Research Of Data Fusion Algorithm Based On Clustering D-S Evidence Theory
2	Research Multi-source Evidence-based Analysis Of Evidence Theory
3	Research On Face Recognition Algorithm Based On Metric Learning
4	Research On Conflict Evidence Analysis Based On DS Evidence Theory
5	Research On Clustering Algorithms Based On Metric Learning For Complex Data
6	Research On Prognostic Risk Assessment Algorithm Based On Multi-omics Data Analysis And Deep Learning
7	Determination Of Optimal Clustering Number Of Mixed Data And Its Application
8	Evidence Theory Based On Rough Set And Its Application In Network Evidence Fusion
9	The Data Fusion Method Research Based On D-S Evidence Theory
10	Research On Metric Learning Algorithm Based On Multi-label Data