Research Of Isoform Function Prediction Algorithms Based On Multi-omics Data

Posted on: 2022-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Huang
GTID: 2480306530498154
Subject: Computer application technology
Abstract/Summary:
Alternative splicing allows a single gene to be spliced into different isoforms, which are translated into multiple protein variants. Predicting the individual functions of these isoforms helps decipher the functional diversity of proteins. So far, most effort in protein function prediction has been made at the gene level, attributing all functions of a gene's products to the gene itself. In practice, however, a gene can exhibit diverse functions mainly because of the isoforms alternatively spliced from it; these isoforms and the proteins translated from them are the actual performers of diverse biological functions. Compared with typical gene function prediction, far less effort has gone into computational isoform function prediction, mainly because of the lack of sufficient functional annotations of isoforms.

A growing number of studies point out that alternative splicing is closely related to a variety of developmental abnormalities. Isoform-disease association prediction built on isoform function prediction helps uncover the underlying pathology of various complex diseases and explore precise treatments and effective drugs. The main challenge of isoform-disease association prediction is that isoform-disease association data are scarce, and the available associations have mostly been identified by wet-lab methods, which remain limited by low coverage and high cost. With the rapid advance of high-throughput RNA-Seq technology, an unprecedented amount of transcript-level data can now be collected easily. Some researchers have proposed isoform function prediction methods that use RNA-Seq data, gene functional annotations and gene-isoform associations.

In summary, existing isoform function prediction and isoform-disease association prediction algorithms leave two issues to be solved: (1) the lack of sufficient functional annotation and disease association data at the isoform level, since existing databases record functional annotations and disease associations at the gene level; and (2) the lack of effective integration of multi-omics data, including genomic, transcriptomic and proteomic data, as data integration in current methods is still relatively limited. To address these difficulties and improve the accuracy of isoform function and isoform-disease association prediction, we fuse multi-omics data and adopt a multi-instance multi-label learning framework. We design joint matrix factorization models and corresponding solution methods for isoform function prediction and isoform-disease association prediction, and propose two effective algorithms. Overall, the major contributions of this thesis are as follows.

(1) To address the problem that the important tissue specificity of alternative splicing is ignored, we propose a tissue-specificity-based isoform function prediction method (TS-Isofun). TS-Isofun first constructs tissue-specific isoform functional association networks from multiple tissue-wise RNA-Seq datasets. Next, it models tissue specificity by selectively integrating these networks with adaptive weights. It then introduces a joint matrix factorization-based data fusion model that leverages the integrated network, gene-level data and gene functional annotations to predict the functions of isoforms. Experimental results on a human RNA-Seq dataset show that TS-Isofun significantly outperforms state-of-the-art methods, and that accounting for tissue specificity contributes to more accurate isoform function prediction.

(2) Building on the research on isoform function prediction, we propose a computational approach called IDAPred that fuses genomic, transcriptomic and proteomic data to predict isoform-disease associations. IDAPred treats a gene as a bag and its isoforms as instances, and maps gene-disease associations to isoform-disease associations within a multi-instance learning framework. IDAPred assumes that the available gene-disease association data are incomplete and introduces a regularization term to complete them. In addition, IDAPred induces a linear classifier to predict isoform-disease associations. IDAPred improves the prediction accuracy of isoform-disease associations and is significantly better than the compared methods. However, IDAPred integrates multiple isoform-isoform association networks with equal weights and does not consider the internal associations among diseases. To overcome these shortcomings, we further propose an isoform-disease association prediction method based on multi-omics data fusion (IsoDA). Compared with IDAPred, IsoDA first processes and adopts a larger human dataset with more genes, isoforms and diseases, and fuses more omics data. Second, IsoDA integrates multiple isoform-isoform association networks, built from isoform expression and sequence data, with adaptive weights. Third, IsoDA constructs a disease-disease association network that is updated during optimization. These changes improve the precision of IsoDA. Finally, case studies on the isoforms of APOE and VEGFA validate the isoform-disease associations predicted by IsoDA, which achieves better prediction performance.
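Both TS-Isofun (across tissues) and IsoDA (across expression- and sequence-based networks) integrate several isoform-isoform networks with adaptive weights. One common way to do this, shown here only as an illustrative assumption rather than the thesis's actual objective, is to score each network by how smoothly the current predictions vary over it, c_m = tr(F^T L_m F), and to take the weights that minimise sum_m w_m c_m + gamma * sum_m w_m log w_m on the probability simplex, whose closed form is a softmax of -c_m / gamma. The function names `smoothness`, `adaptive_weights` and `fuse`, the parameter `gamma` and the toy matrices are all hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def smoothness(adj: Matrix, preds: Matrix) -> float:
    """tr(F^T L F) for a symmetric adjacency A: 0.5 * sum_ij A_ij * ||f_i - f_j||^2.

    A low value means isoforms linked in this network receive similar predictions.
    """
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                diff = sum((a - b) ** 2 for a, b in zip(preds[i], preds[j]))
                total += adj[i][j] * diff
    return 0.5 * total


def adaptive_weights(adjs: List[Matrix], preds: Matrix, gamma: float = 1.0) -> List[float]:
    """Hypothetical weighting: closed-form minimiser of
    sum_m w_m * c_m + gamma * sum_m w_m * log(w_m) subject to sum_m w_m = 1."""
    costs = [smoothness(a, preds) for a in adjs]
    lowest = min(costs)  # shifting all costs does not change the softmax, only avoids overflow
    exps = [math.exp(-(c - lowest) / gamma) for c in costs]
    z = sum(exps)
    return [e / z for e in exps]


def fuse(adjs: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of the individual networks into one integrated network."""
    n = len(adjs[0])
    return [[sum(w * a[i][j] for w, a in zip(weights, adjs)) for j in range(n)]
            for i in range(n)]
```

For example, with toy predictions `F = [[1.0], [1.0], [0.0]]`, a network linking isoforms 0 and 1 (which agree) has cost 0, while one linking isoforms 0 and 2 (which disagree) has cost 1, so with `gamma=1.0` the first network receives weight 1/(1+e^-1), about 0.731. A larger `gamma` pushes the weights toward the equal weighting that IDAPred uses; a smaller one concentrates weight on the most consistent network.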
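The bag/instance view used by IDAPred can be made concrete with a small sketch. It assumes the textbook multi-instance rule (a bag is positive if at least one of its instances is positive), with max-aggregation of isoform scores into a gene score; this is a generic illustration, not the exact IDAPred formulation. The gene and isoform identifiers, disease labels, scores, threshold and the helper names `candidate_isoform_labels`, `bag_scores` and `mil_consistent` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy data: each gene (bag) maps to its isoforms (instances).
GENE_TO_ISOFORMS: Dict[str, List[str]] = {
    "geneA": ["A-001", "A-002"],
    "geneB": ["B-001"],
}
# Hypothetical gene-level disease labels, the only supervision assumed available.
GENE_DISEASES: Dict[str, Set[str]] = {"geneA": {"d1"}, "geneB": set()}


def candidate_isoform_labels(gene_to_isoforms: Dict[str, List[str]],
                             gene_diseases: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Each isoform inherits its gene's diseases as *candidate* labels;
    under the MIL assumption only some of them are expected to be true."""
    return {iso: set(gene_diseases.get(g, set()))
            for g, isos in gene_to_isoforms.items() for iso in isos}


def bag_scores(gene_to_isoforms: Dict[str, List[str]],
               iso_scores: Dict[str, Dict[str, float]],
               disease: str) -> Dict[str, float]:
    """Standard MIL aggregation: a gene's score is the maximum score of its isoforms."""
    return {g: max(iso_scores[iso].get(disease, 0.0) for iso in isos)
            for g, isos in gene_to_isoforms.items()}


def mil_consistent(gene_to_isoforms: Dict[str, List[str]],
                   gene_diseases: Dict[str, Set[str]],
                   iso_scores: Dict[str, Dict[str, float]],
                   disease: str, thr: float = 0.5) -> bool:
    """True if every gene is predicted positive exactly when it is labelled positive."""
    scores = bag_scores(gene_to_isoforms, iso_scores, disease)
    return all((scores[g] >= thr) == (disease in gene_diseases.get(g, set()))
               for g in gene_to_isoforms)
```

With isoform scores `{"A-001": {"d1": 0.9}, "A-002": {"d1": 0.1}, "B-001": {"d1": 0.2}}`, only `A-001` is predicted to carry `d1`, yet the gene-level labels are still reproduced (geneA scores 0.9, geneB scores 0.2), so `mil_consistent` returns `True`. This is the sense in which gene-level annotations can supervise isoform-level predictions without isoform-level labels.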
Keywords: Alternative splicing, Function prediction, Disease association prediction, Multi-omics data, Joint matrix factorization