
Research and Implementation of a piRNA Recognition Algorithm Based on Deep Learning

Posted on: 2021-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: F Z Wei
GTID: 2370330602499776
Subject: Agricultural engineering and information technology
Abstract/Summary:
Noncoding RNA (ncRNA) is a class of functional RNA that is transcribed from DNA but does not encode proteins, and it represents one of the most exciting fields in biomedical research. Based on transcriptome and bioinformatics research, thousands of ncRNAs have been classified into categories according to their functions and lengths, including tRNA, rRNA, miRNA, siRNA, piRNA and lncRNA. piRNAs are a large class of small noncoding RNAs found widely across species. Compared with miRNA and lncRNA, which have been studied extensively, the amount of piRNA data is limited and research on piRNA is still in its infancy. That research has focused mainly on the transcriptional and post-transcriptional levels, with little work on piRNA function at the post-translational level. Accurate recognition of piRNA sequences among noncoding RNAs is therefore an important prerequisite for follow-up functional research on piRNA.

At present, most methods extract thousands of features manually or with tools and then apply machine learning methods for classification and recognition. Because of the large number of features, these methods are hard to reproduce and apply only to a small class of piRNAs that have the corresponding features, and the overall accuracy and reliability of piRNA recognition need to be improved. This thesis therefore designs a deep learning network model named DeepiRNA, which combines a convolutional neural network (CNN) with bi-directional long short-term memory (BiLSTM). The model reduces human intervention in feature extraction and improves the accuracy and reliability of piRNA recognition.

In the experiments, piRNA and non-piRNA data for human and mouse were prepared. After data analysis and preprocessing, the training data sets were constructed, and a model was trained separately for each species. Training used 5-fold cross-validation, and the parameters and weights of the best model for human and for mouse were saved. The experiments show that DeepiRNA achieves high accuracy and generalization ability when recognizing piRNA sequences. On the human test data, the accuracy is about 92.86% and the AUC is 0.9805; on the mouse test data, the accuracy is about 92.22% and the AUC is 0.9751. DeepiRNA performs well in both accuracy and generalization ability, which suggests that the model has potential application in piRNA recognition. In addition, a set of Aedes aegypti piRNA data published in the journal Nature in April 2020 was predicted and analyzed with DeepiRNA. The predicted values were high, reaching more than 0.99, which indicates that the DeepiRNA model also has potential advantages in cross-species piRNA recognition.

Furthermore, DeepiRNA was compared with other models. Three machine learning algorithms, support vector machine (SVM), random forest (RF) and XGBoost, were trained on the same human and mouse training data sets. The results show that DeepiRNA is superior to these three methods in terms of accuracy, F1 score and AUC.

Finally, to make the method available online to biological researchers, we designed and developed a website that hosts the DeepiRNA algorithm, based on the best saved models for human and mouse. The user first selects the species to predict and submits a single sequence or a file of multiple sequences to the server, following the sample data format. The system then calls the prediction model that matches the selected species, runs it on the submitted data, and displays the prediction results on the page in real time. The website is available at Http://www.deepbiology.cn/DeepiRNA.
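
The abstract does not state how sequences are encoded before they reach the CNN and BiLSTM layers, or how the 5-fold split is produced. The following is only a minimal sketch of one common approach, not the thesis's implementation. The one-hot encoding, the padding length MAX_LEN, the mapping of T to U, and the helper names one_hot and five_fold_splits are all assumptions made for illustration.

```python
import random

ALPHABET = "ACGU"  # assumed RNA alphabet; DNA-style T is mapped to U below
MAX_LEN = 35       # hypothetical padding length, not taken from the thesis


def one_hot(seq, max_len=MAX_LEN):
    """Encode a sequence as max_len rows of 4 values (A, C, G, U)."""
    seq = seq.upper().replace("T", "U")[:max_len]
    # Unknown symbols such as N become an all-zero row.
    rows = [[1 if base == b else 0 for b in ALPHABET] for base in seq]
    # Zero-pad short sequences so every sample has the same shape.
    pad = max_len - len(rows)
    rows.extend([0, 0, 0, 0] for _ in range(pad))
    return rows


def five_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


if __name__ == "__main__":
    encoded = one_hot("UGACGT")
    print(len(encoded), len(encoded[0]))
    for fold_no, (train, val) in enumerate(five_fold_splits(10), start=1):
        print(fold_no, len(train), len(val))
```

In a setup like this, each fold's validation indices would be held out in turn while a model is trained on the rest, and the best-scoring model's weights would be kept, which matches the procedure the abstract describes only at a high level.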
Keywords/Search Tags:piRNA, CNN, BiLSTM, Sequence Recognition