Prediction Of Long Noncoding RNA Based On Deep Learning

Posted on: 2024-09-20 | Degree: Master | Type: Thesis
Country: China | Candidate: B Q Cao
GTID: 2530307148956829 | Subject: Probability theory and mathematical statistics
Abstract/Summary:
Long non-coding RNAs (lncRNAs) are transcripts that are at least 200 nucleotides long and do not encode proteins. lncRNAs play important roles in cellular mechanisms and in the development of diseases. With the rapid development of high-throughput sequencing technology, more and more transcripts have been sequenced, and efficient identification of lncRNAs is fundamental to the study of their function. Although several methods for identifying lncRNAs exist, they suffer from inefficient processing of large data sets, heavy reliance on reference databases for sequence comparison, and poor recognition of transcripts with short open reading frames. In this study, a new method for identifying lncRNAs, called CPCNN, is proposed; it is based on a dual-channel convolutional neural network. We first extract the MLCDS from each transcript, use one-hot vectors to encode the base sequence of the transcript and the codon sequence of the MLCDS, and feed both encodings into CPCNN. After passing them through convolutional, pooling, and fully connected layers, CPCNN outputs two scores, reflecting the probability that the transcript is a coding RNA and the probability that it is an lncRNA, and assigns the transcript to the class with the larger score. Unlike previous methods based on convolutional neural networks, CPCNN has two paths, which allows it to integrate base information and codon information to improve prediction performance. A comparison of CPCNN with CPPred, DeepCPP, CPC2, FEELnc, and CNCI on Human-Model and Integrated-Model shows that CPCNN outperforms the other methods, especially in identifying transcripts with short open reading frames. In addition, we tested the identification performance of CPCNN on incomplete transcripts, as well as the robustness of the method to data quantity and quality.
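The two input encodings and the final decision rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names (`one_hot_bases`, `one_hot_codons`, `predict_label`), the 4- and 64-dimensional vector sizes, the all-zero handling of ambiguous symbols, and the tie-breaking rule are all assumptions. The MLCDS extraction step and the network itself are not shown; `one_hot_codons` simply takes an already extracted candidate coding region.

```python
from itertools import product

BASES = "ACGT"
# All 64 codons in lexicographic order (AAA, AAC, ..., TTT).
CODONS = ["".join(p) for p in product(BASES, repeat=3)]
BASE_INDEX = {b: i for i, b in enumerate(BASES)}
CODON_INDEX = {c: i for i, c in enumerate(CODONS)}


def one_hot_bases(seq):
    """Encode each base as a 4-dim one-hot row.

    Assumption: U is treated as T, and unknown symbols (e.g. N)
    become all-zero rows.
    """
    rows = []
    for base in seq.upper().replace("U", "T"):
        row = [0] * len(BASES)
        idx = BASE_INDEX.get(base)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def one_hot_codons(cds):
    """Encode non-overlapping triplets as 64-dim one-hot rows.

    Assumption: a trailing incomplete codon is dropped, and codons
    containing unknown symbols become all-zero rows.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    rows = []
    for start in range(0, usable, 3):
        row = [0] * len(CODONS)
        idx = CODON_INDEX.get(cds[start:start + 3])
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def predict_label(coding_score, lncrna_score):
    """Pick the class with the larger score.

    Assumption: ties go to "coding"; the abstract does not say.
    """
    return "coding" if coding_score >= lncrna_score else "lncRNA"
```

For example, `one_hot_bases("ACGU")` yields four rows of length 4, and `one_hot_codons("ATGAAAT")` yields two rows of length 64 (for `ATG` and `AAA`), with the trailing `T` ignored. In a dual-channel setup, each matrix would feed its own convolutional path before the two paths are merged.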
Keywords/Search Tags: lncRNA, Convolutional Neural Network, Base coding, Codon coding