Machine Learning-Based Subcellular Localization of Long Non-coding RNA and lncRNA-Disease Association Identification

Posted on: 2022-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: M J Chen
GTID: 2504306554471054
Subject: Computer technology
Abstract/Summary:
Long non-coding RNAs (lncRNAs) play important roles in many biological regulatory processes. On the one hand, the different subcellular localization patterns of lncRNAs enable them to perform different functions, so identifying the subcellular location of a lncRNA helps to determine its function. On the other hand, mutations and dysregulation of lncRNAs affect the development of many human diseases, so inferring lncRNA-disease associations helps to reveal the molecular mechanisms of diseases and to explore treatment strategies. However, biological experiments to determine the subcellular location of lncRNAs and their associations with diseases are costly and time-consuming, and the known information is far from meeting the needs of modern medical research. Efficient computational methods based on machine learning are therefore an important complement to experimental studies of lncRNA subcellular localization and lncRNA-disease associations. This thesis aims to propose effective methods for predicting the subcellular localization of lncRNAs and their associations with diseases. The specific work is as follows:

(1) To predict the subcellular location of lncRNAs, this thesis developed a method based on machine learning and a variety of sequence feature information, called lncLocPred. lncLocPred first extracts sequence features from each sample, including k-mer nucleotide composition, pseudo-dinucleotide composition, and local structure-sequence triplet elements. Variance threshold filtering, the binomial distribution, and the F-score are then used to select representative features. Finally, a logistic regression model predicts the subcellular location of each lncRNA. The experimental results showed that the top-ranked k-mers have a higher G and C content, in the form of short repeats. Across four subcellular localizations, lncLocPred achieved an overall accuracy of 92.37% on the benchmark dataset under leave-one-out cross-validation, higher than the existing state-of-the-art predictors. lncLocPred also achieved higher overall accuracy than other predictors on the independent test set collected in this thesis.

(2) For the prediction of lncRNA-disease associations, this thesis proposed a novel prediction model, GCRFLDA, based on graph convolutional matrix completion. To learn more effective node embeddings, GCRFLDA adds a conditional random field and an attention mechanism to the encoder layer. GCRFLDA uses Gaussian interaction profile kernel similarity and cosine similarity as the side information of lncRNA nodes and disease nodes. Because GCRFLDA requires only lncRNA-disease association information, it is more generally applicable. Cross-validation on four datasets showed that GCRFLDA achieved better AUC values than other existing methods. In addition, in case studies on six diseases, 70 out of 80 lncRNA-disease associations were confirmed by recent biomedical literature. These results suggest that GCRFLDA can be used as an effective tool for predicting potential lncRNA-disease associations.

Both machine learning-based methods proposed in this thesis, for predicting the subcellular localization of lncRNAs and for predicting their associations with diseases, achieved good performance in the experiments and can serve as a useful supplement to bioinformatics research.
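As a rough illustration of the first two stages described in part (1), the sketch below computes a k-mer nucleotide composition vector and applies a simple variance threshold. It is a minimal sketch, not the thesis implementation: the function names, the choice of k = 2, the toy sequences, and the threshold of 0 are all assumptions made for illustration, and the other feature types (pseudo-dinucleotide composition, structure-sequence triplets), the binomial and F-score ranking, and the logistic regression step are omitted.

```python
from itertools import product
from statistics import pvariance


def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies of seq, in lexicographic k-mer order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def variance_filter(rows, threshold=0.0):
    """Return the column indices whose population variance exceeds threshold."""
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > threshold]


# Hypothetical toy sequences, not taken from the thesis datasets.
toy_sequences = ["GCGCGCGC", "ATATATAT", "GGCCGGCC"]
features = [kmer_composition(s, k=2) for s in toy_sequences]
kept = variance_filter(features)  # indices of 2-mers that vary across samples
```

With k = 2 each sequence maps to a 16-dimensional vector; columns that are constant across all samples carry no discriminative information and are dropped before any ranking or classification step.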
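The two similarity measures named in part (2) can both be derived from a binary lncRNA-disease association matrix alone, which is consistent with the statement that GCRFLDA needs only association information. The sketch below shows one common formulation, under stated assumptions: the Gaussian kernel bandwidth is normalized by the mean squared norm of the interaction profiles with a scaling parameter `gamma_prime` of 1, the toy matrix is hypothetical, and which similarity is attached to which node type is chosen here only for illustration. The abstract does not specify these details.

```python
import math


def gip_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows of profiles."""
    n = len(profiles)
    # Bandwidth normalized by the mean squared norm of the profiles (an assumption).
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical association matrix: rows are lncRNAs, columns are diseases.
associations = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
lnc_gip = gip_similarity(associations)
disease_profiles = [list(col) for col in zip(*associations)]
disease_cos = [
    [cosine_similarity(u, v) for v in disease_profiles] for u in disease_profiles
]
```

Each row of the association matrix is the interaction profile of one lncRNA and each column is the profile of one disease, so either measure can be applied to either node type by transposing the matrix.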
Keywords/Search Tags:lncRNA subcellular localization, lncRNA-disease association, machine learning, graph convolution matrix completion model