
A Data Integration Method Based On NMF And Its Application In Cancer Module Mining

Posted on: 2021-10-01 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Wu
GTID: 2480306050965879 | Subject: Computer application technology
Abstract/Summary:
High-throughput sequencing technology has developed rapidly, and large volumes of omics data of many types are now generated continually. The life sciences have accumulated large amounts of interrelated multi-omics data, such as gene expression and microRNA expression data. Cancer is a complex and highly heterogeneous disease, and its subtypes arise and progress through different biological mechanisms. Different data types and omics layers give complementary views of cancer, which supports a comprehensive study of the complex biological processes behind its onset and progression. A current challenge in computational biology is therefore to integrate these varied cancer omics data, mine the subtype-related biomolecular modules and biological processes, and use them to guide precision medicine.

Non-negative matrix factorization (NMF) decomposes a nonnegative input matrix into the product of two matrices whose entries are all nonnegative, which reduces dimensionality. The data stay nonnegative and the factors combine only additively, so the method gives interpretable, parts-based representations of structured data. This makes it an effective way to extract local patterns from the data as a whole. NMF and its variants have been widely used in fields such as pattern recognition, signal processing, and bioinformatics. NMF has also been extended to decompose several input matrices simultaneously, which makes it an effective model for integrating biological data.

Cancer omics data are multi-type and highly noisy, and class labels are known for only some of the samples. This thesis aims to uncover the latent structure of omics data and reveal subtype-related biomolecular modules and biological processes. To do so, it extends the NMF model with two terms. A weight constraint term measures how much each input matrix contributes to the decomposition. A supervision constraint term preserves the known sample classification information. The result is a weighted semi-supervised joint non-negative matrix factorization model, and the thesis derives its iterative solution procedure. The model is first evaluated on simulated data by comparing decompositions with and without the constraints. The comparison shows that both the weight constraint and the supervision constraint are beneficial. The algorithm is then applied to two real multi-omics datasets: TCGA multi-omics data for breast cancer and single-cell multi-omics sequencing data for glioma. Subtype-specific molecular modules and biological processes are mined from each dataset. Functional enrichment analysis and literature validation show that the mined modules are biologically meaningful and help reveal how cancer subtypes arise and develop. In addition, the low-dimensional sample matrix from the decomposition is used to predict the labels of samples whose subtypes are unknown, and clinical information confirms that these predictions are valid. Using the module molecules as features for single-cell clustering also improves clustering performance. These experiments show that the proposed algorithm is a powerful tool for extracting biologically significant, subtype-specific patterns from cancer omics data.
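The abstract names the model's ingredients but gives neither its exact objective nor its update rules. The code below is therefore only a minimal sketch under stated assumptions, not the thesis's method.

The sketch makes these assumptions:

* Every omics matrix X_i has the same samples as rows and shares a sample factor W, so X_i ≈ W H_i.
* The weight term is modeled as learned weights alpha_i with an entropy regularizer. The assumed objective is sum_i alpha_i ||X_i - W H_i||_F^2 + gamma tr(W^T L W) + tau sum_i alpha_i log alpha_i.
* The supervision term is modeled as a graph penalty tr(W^T L W), in the style of graph-regularized NMF. Here L = D - A, and A links labeled samples that share a class.

The function names `weighted_semisupervised_jnmf` and `predict_unlabeled` are hypothetical. So are the parameters `gamma` and `tau`. The nearest-centroid rule used to label unknown samples from W is likewise an assumed stand-in.

```python
import numpy as np


def weighted_semisupervised_jnmf(Xs, labels, k, gamma=1.0, tau=0.1,
                                 n_iter=500, eps=1e-10, seed=0):
    """Hypothetical sketch of a weighted, semi-supervised joint NMF.

    Xs     : list of nonnegative arrays, each n_samples x m_i (same samples)
    labels : length-n integer array; -1 marks a sample with unknown class
    Returns the shared sample factor W (n x k), the list of H_i, and alpha.
    """
    rng = np.random.default_rng(seed)
    # Scale each matrix to unit Frobenius norm so reconstruction errors are comparable.
    Xs = [X / np.linalg.norm(X) for X in Xs]
    n = Xs[0].shape[0]
    W = rng.random((n, k))
    Hs = [rng.random((k, X.shape[1])) for X in Xs]
    alpha = np.full(len(Xs), 1.0 / len(Xs))

    # Assumed supervision graph: A[a, b] = 1 when samples a and b share a known label.
    labels = np.asarray(labels)
    known = labels >= 0
    A = ((labels[:, None] == labels[None, :])
         & known[:, None] & known[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)
    D = np.diag(A.sum(axis=1))

    for _ in range(n_iter):
        # H_i: standard multiplicative update (the weight alpha_i cancels out).
        for i, X in enumerate(Xs):
            Hs[i] *= (W.T @ X) / (W.T @ W @ Hs[i] + eps)
        # W: weighted reconstruction terms plus the penalty gamma * tr(W^T (D - A) W).
        num = sum(a * X @ H.T for a, X, H in zip(alpha, Xs, Hs)) + gamma * A @ W
        den = sum(a * W @ H @ H.T for a, H in zip(alpha, Hs)) + gamma * D @ W + eps
        W *= num / den
        # alpha: closed-form minimizer of sum(alpha_i * e_i) + tau * sum(alpha_i log alpha_i)
        # over the simplex, i.e. a softmax of -e_i / tau.
        errs = np.array([np.linalg.norm(X - W @ H) ** 2 for X, H in zip(Xs, Hs)])
        logits = -errs / tau
        alpha = np.exp(logits - logits.max())
        alpha /= alpha.sum()
    return W, Hs, alpha


def predict_unlabeled(W, labels):
    """Hypothetical helper: label unknown rows of W by the nearest labeled centroid."""
    labels = np.asarray(labels)
    classes = np.unique(labels[labels >= 0])
    centroids = np.stack([W[labels == c].mean(axis=0) for c in classes])
    # Use cosine similarity, since NMF factors are only defined up to scale.
    Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    Cn = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    pred = classes[np.argmax(Wn @ Cn.T, axis=1)]
    return np.where(labels >= 0, labels, pred)


# Toy usage on synthetic two-"omics" data (illustrative only, not thesis data).
rng = np.random.default_rng(1)
n, k = 60, 3
true = np.repeat(np.arange(k), n // k)
W_true = np.eye(k)[true] + 0.05 * rng.random((n, k))
X1 = W_true @ rng.random((k, 40))
X2 = W_true @ rng.random((k, 25))
labels = true.copy()
labels[rng.random(n) < 0.5] = -1          # hide about half of the labels
W, Hs, alpha = weighted_semisupervised_jnmf([X1, X2], labels, k)
pred = predict_unlabeled(W, labels)
print("matrix weights:", alpha, "accuracy:", (pred == true).mean())
```

The entropy-regularized weights are one common way to learn per-matrix importance automatically. A larger `tau` pushes the weights toward uniform, and a smaller one concentrates weight on the best-reconstructed matrix. The thesis may use a different weighting scheme.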
Keywords/Search Tags: Non-negative Matrix Factorization, Omics Data, Data Integration, Cancer Module