Prediction Of Plant LncRNA-encoded Short Peptides Combined Logical Reasoning With Capsule Network

Posted on: 2022-06-02  Degree: Master  Type: Thesis
Country: China  Candidate: H H Hu  Full Text: PDF
GTID: 2480306776963889  Subject: Automation Technology
Abstract/Summary:
Long non-coding RNA (lncRNA) is a type of non-coding RNA longer than 200 nt that was long thought to have no protein-coding ability. However, recent studies have shown that some lncRNAs contain short open reading frames (sORFs) of no more than 300 nt, which can encode short peptides. Plant lncRNA-encoded short peptides play an irreplaceable role in plant growth and development, and they have important application value for improving plant quality and yield in agriculture and forestry. Research on short peptides encoded by plant lncRNA has therefore attracted growing attention.

Current methods for identifying short peptides fall into two types: biological experimental methods and computational methods. Biological experimental methods are not suitable for large-scale identification because of their high cost and long experimental period. Most computational methods are machine learning models trained on human and animal data. On the one hand, less plant data is available than human and animal data. On the other hand, animal and plant short peptides differ, which makes it difficult for existing tools to predict plant short peptides directly. Adaptive mining of plant short peptides therefore faces great challenges.

Because few plant lncRNA-encoded short peptides have been verified by experiments, bioinformatics software was used to mine sORF sequences from plant lncRNA. To improve the reliability of the data, the dataset was further screened based on the idea of logical reasoning. To address the difficulty of applying existing tools directly to plant short peptides, a prediction model for plant lncRNA-encoded short peptides based on feature engineering was constructed. Because such machine learning methods involve considerable manual intervention, and model performance is hard to improve when new features are difficult to obtain, a model combining a multi-scale convolutional neural network (CNN) and a capsule network (CapsNet) for predicting plant lncRNA-encoded short peptides (MConv MCaps) was proposed. It uses the multi-scale CNN to extract various types of primary features and so enrich feature diversity, and uses a multi-scale capsule network to extract advanced features and automatically perform feature clustering, in order to obtain the key features needed for accurate classification and prediction. First, the sequences were encoded with the p-nts method so as to preserve the correspondence between codons and amino acids. Then, multi-scale convolution kernels were used instead of a single convolution kernel to extract motif features of different lengths. At the same time, a multi-scale capsule network was used in place of a single capsule for better feature integration.

To verify the advantages of MConv MCaps, it was compared with traditional machine learning models, single deep learning models and simple fusion deep learning models on the Physcomitrella patens datasets, where it achieved good classification results, supporting the rationality and efficiency of the model. In addition, datasets of Arabidopsis thaliana and Glycine max were used for independent testing to verify the generalization ability of the model. Finally, the model was compared with existing tools on validated lncRNA-sORF datasets to verify its superiority.
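
The abstract does not name the bioinformatics software or the exact criteria used to mine sORFs, so the following is only a minimal sketch of what such a scan could look like, not the thesis's pipeline. It assumes an ATG start codon, the three canonical stop codons, forward-strand frames only, and a length limit of 300 nt that counts the stop codon. The function name `find_sorfs` and all parameters are hypothetical.

```python
# Illustrative sORF scan (an assumption, not the thesis's actual tool).
STOP_CODONS = {"TAA", "TAG", "TGA"}  # canonical stop codons
START_CODON = "ATG"                  # assumed start codon


def find_sorfs(rna, max_len=300):
    """Return (start, end, sequence) for ORFs no longer than max_len nt.

    Scans the three forward reading frames. For each stop codon, the most
    upstream in-frame ATG since the previous stop is taken as the start.
    Coordinates are 0-based, end-exclusive, and include the stop codon.
    """
    seq = rna.upper().replace("U", "T")  # accept RNA or DNA alphabets
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start <= max_len:
                    hits.append((start, end, seq[start:end]))
                start = None  # close the ORF and look for the next one
    return sorted(hits)


if __name__ == "__main__":
    for start, end, orf in find_sorfs("CCAUGAAAUUUUAGGG"):
        print(start, end, orf)
```

A real pipeline would typically also scan the reverse strand and apply further filters, such as the logical-reasoning screening the abstract mentions; those steps are not shown here because the abstract gives no details about them.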
Keywords/Search Tags:lncRNA, sORFs, short peptides, logical reasoning, capsule network, prediction