
Research On RNA Molecular Secondary Structure Prediction Based On Machine Learning

Posted on: 2022-06-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K K Mao
GTID: 1520306818454644
Subject: Theoretical Physics
Abstract/Summary:
A growing number of studies show that RNA (ribonucleic acid) not only transmits genetic information and participates in protein synthesis, but is also involved in many other important biological processes. Although the functions of some RNA molecules are now understood, a large number of RNA molecules still have unknown functions. The function of an RNA molecule depends not only on its sequence but also on its three-dimensional structure, so understanding RNA function in depth requires determining that structure. Experimentally, RNA structures are determined mainly by X-ray crystallography, nuclear magnetic resonance (NMR) and cryo-electron microscopy. However, because RNA is unstable, the number of RNA three-dimensional structures determined directly by experiment is very limited. To make full use of the massive amount of sequence information and bridge the huge gap between the number of known sequences and the number of known structures, more and more researchers have adopted computer simulation and computational methods to assist structure prediction. However, the accuracy of both secondary and tertiary structure prediction still leaves considerable room for improvement, and prediction methods need further development.

This thesis focuses on methods for RNA secondary structure prediction. Existing methods for this problem follow two main directions: (1) methods based on minimum free energy, and (2) comparative sequence analysis. Some methods also combine the two. However, the prediction accuracy of these traditional methods is not very high for long RNA chains, and they perform even worse on molecules whose structures contain pseudoknots.

In recent years, with the development of machine learning and deep neural networks, more and more researchers have applied these methods to various fields with good results. Deep learning is good at discovering complex structures in high-dimensional data. It is widely used in image and speech recognition, predicting the activity of potential drug molecules, analyzing particle accelerator data, predicting the effects of non-coding DNA mutations on gene expression and disease, cancer detection and other areas. This thesis studies the application of deep learning to RNA secondary structure prediction. The main work is as follows:

(1) A deep learning method, 2dRNA, which couples an LSTM model with a U-net model, is proposed to predict the secondary structure of RNA molecules. The method takes only the sequence of the target RNA as input and uses no other feature information and no multiple sequence alignment of a homologous family. Test results on the ArchiveII test set of 234 RNAs show that the prediction accuracy of 2dRNA is significantly higher than that of currently widely used methods: the average precision (PPV) and sensitivity (STY) are both above 0.9, and the method performs well on long RNA chains. In pseudoknot prediction, our method also correctly predicts more pseudoknots than the other methods.

(2) Building on the coupled deep learning method 2dRNA and combining it with transfer learning, we propose a length-dependent deep learning method, 2dRNA-LD, for RNA secondary structure prediction, which further improves prediction accuracy. We first trained the model on the larger bpRNA data set, whose training data comprise 10,814 RNAs of various types. To learn better from these data, we used grid search to tune the model's hyperparameters: we trained a total of 320 models covering all combinations of the number of neural network layers, the learning rate and other parameters, and selected the best 5. We then divided the training data into length intervals and performed transfer learning to obtain the length-dependent model 2dRNA-LD. Tests on the bpRNA test set show that 2dRNA-LD further improves on the accuracy of 2dRNA, reaching an average MCC of 0.687.

(3) Although there are many methods for RNA secondary structure prediction, none of them explores the folding path of RNA secondary structure. From this perspective, we designed a set of deep reinforcement learning algorithms, 2dRNA-fold, to study this problem. Given an RNA sequence, 2dRNA-fold selects residue pairs step by step until the final secondary structure is formed. For learning and training, we also built a reinforcement learning gym environment, RNAWorld, which covers both secondary and tertiary structure. We have built a website for the above methods: http://biophy.hust.edu.cn/new/2dRNA.

In addition, we apply direct coupling analysis, an algorithm used in protein and RNA structure prediction, to neural networks. By calculating the correlation between nodes in different network layers, we add cross-layer connections between highly correlated nodes, which significantly speeds up the otherwise slow training process.
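The minimum-free-energy direction named in the abstract relies on dynamic programming over subsequences. As a much simplified illustration of that idea (not the thesis's method, and not a real free-energy model with nearest-neighbour parameters), the sketch below implements the classic Nussinov algorithm, which maximizes the number of nested base pairs. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop `min_loop=3` are assumptions chosen for illustration. Because every pair splits the sequence into independent inner and outer parts, this recursion cannot represent pseudoknots.

```python
from functools import lru_cache

# Allowed pairs: Watson-Crick plus G-U wobble (an illustrative assumption).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> list[tuple[int, int]]:
    """Return a maximal set of nested base pairs as sorted 0-based (i, j) tuples."""
    n = len(seq)

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        # Maximum number of pairs in seq[i..j]; spans too short for a hairpin score 0.
        if j - i <= min_loop:
            return 0
        score = best(i, j - 1)  # case 1: position j stays unpaired
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in CANONICAL:
                # case 2: j pairs with k, splitting the span into [i, k-1] and [k+1, j-1]
                left = best(i, k - 1) if k > i else 0
                score = max(score, left + 1 + best(k + 1, j - 1))
        return score

    def traceback(i: int, j: int, pairs: list) -> None:
        # Re-derive which case produced best(i, j) and collect the chosen pairs.
        if j - i <= min_loop:
            return
        if best(i, j) == best(i, j - 1):
            traceback(i, j - 1, pairs)
            return
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in CANONICAL:
                left = best(i, k - 1) if k > i else 0
                if left + 1 + best(k + 1, j - 1) == best(i, j):
                    pairs.append((k, j))
                    if k > i:
                        traceback(i, k - 1, pairs)
                    traceback(k + 1, j - 1, pairs)
                    return

    pairs: list[tuple[int, int]] = []
    if n:
        traceback(0, n - 1, pairs)
    return sorted(pairs)
```

For example, `nussinov("GGGAAAUCC")` pairs the three Gs with U6, C7 and C8 to form a short hairpin.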
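The abstract notes that traditional methods struggle with pseudoknots and that 2dRNA predicts more of them correctly. Two base pairs (i, j) and (k, l) form a pseudoknot when they cross, that is, i < k < j < l. A minimal sketch for finding crossing pairs in a predicted structure, assuming pairs are given as 0-based index tuples (the function names are hypothetical):

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every pair of base pairs that cross (i < k < j < l)."""
    # Normalise each pair so the 5' index comes first, then sort by it.
    ordered = sorted((min(p), max(p)) for p in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(ordered, 2):
        # After sorting, i <= k; the two pairs cross if k lies strictly
        # inside (i, j) while l lies beyond j.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def has_pseudoknot(pairs) -> bool:
    """True if the structure contains at least one crossing (pseudoknotted) pair."""
    return bool(crossing_pairs(pairs))
```

Nested pairs such as (0, 10) and (2, 8) and sequential pairs such as (0, 4) and (5, 9) are not reported; only crossing ones are.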
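2dRNA is described as taking only the RNA sequence as input. The abstract does not say how the sequence is encoded; a common choice for sequence-only models, assumed here and not confirmed by the source, is a one-hot L x 4 matrix, turned into an L x L feature map for a 2D network such as a U-net by concatenating the encodings of residues i and j. The helper names below are hypothetical.

```python
BASES = "ACGU"  # channel order is an assumption


def one_hot(seq):
    """L x 4 one-hot matrix; T is read as U and unknown symbols such as N become all zeros."""
    seq = seq.upper().replace("T", "U")
    return [[1 if base == b else 0 for b in BASES] for base in seq]


def pairwise_features(seq):
    """L x L x 8 nested list: the one-hot vector of residue i concatenated with that of residue j."""
    enc = one_hot(seq)
    return [[enc[i] + enc[j] for j in range(len(enc))] for i in range(len(enc))]
```

Each cell (i, j) of the resulting map then describes one candidate base pair, which matches the 2D pairing-matrix view of secondary structure.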
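The results above are reported as precision (PPV), sensitivity (STY) and MCC over base pairs. The sketch below computes all three for one RNA under assumed conventions: only exact pair matches count as true positives, and true negatives are all remaining candidate pairs i < j. The thesis may use a different convention (for example the common approximation MCC ≈ sqrt(PPV · STY), or tolerance for shifted pairs); averaging such per-RNA values over a test set gives averages like the ones quoted.

```python
import math


def pair_metrics(predicted, reference, length):
    """Return (PPV, STY, MCC) for one predicted structure against a reference.

    Pairs are (i, j) index tuples. Exact matches only count as true positives
    (an assumed convention); true negatives are all other candidate pairs
    i < j of a sequence of the given length.
    """
    pred = {(min(p), max(p)) for p in predicted}
    ref = {(min(p), max(p)) for p in reference}
    tp = len(pred & ref)
    fp = len(pred - ref)
    fn = len(ref - pred)
    tn = length * (length - 1) // 2 - tp - fp - fn

    ppv = tp / (tp + fp) if pred else 0.0  # precision: correct / predicted
    sty = tp / (tp + fn) if ref else 0.0   # sensitivity: correct / true pairs
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return ppv, sty, mcc
```

For a 9-nt hairpin with reference pairs (0, 8), (1, 7), (2, 6) and a prediction that gets the first two right but proposes (3, 6) instead of (2, 6), PPV and STY are both 2/3 and MCC is 63/99.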
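The hyperparameter search in (2) trains one model per combination of settings and keeps the best five. A minimal sketch of that selection loop follows. The grid values are hypothetical (the abstract does not list the grid behind the 320 models), and `toy_score` is a hypothetical stand-in for training a network and measuring its validation accuracy.

```python
import heapq
import itertools

# Hypothetical search space; the actual grid used in the thesis is not given.
GRID = {
    "num_layers": [2, 3, 4, 5],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "hidden_size": [64, 128],
}


def toy_score(config):
    """Hypothetical stand-in for 'train a model with config and return its validation score'."""
    bonus = 0.05 if config["hidden_size"] == 128 else 0.0
    return 0.1 * config["num_layers"] - 10 * config["learning_rate"] + bonus


def grid_search(grid, score_fn, top_k=5):
    """Score every combination in the grid; return (number of configs, top_k best)."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        results.append((score_fn(config), config))
    # nlargest with a key behaves like sorted(..., reverse=True)[:top_k].
    return len(results), heapq.nlargest(top_k, results, key=lambda item: item[0])
```

With this toy grid, `grid_search(GRID, toy_score)` evaluates 24 configurations; in practice each evaluation is a full training run, which is why exhaustive grids become expensive quickly.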
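For 2dRNA-LD, the training data are split into length intervals before transfer learning. A minimal sketch of that split, assuming hypothetical interval boundaries (the abstract does not state them). One plausible reading, also an assumption, is that each interval fine-tunes its own copy of the pretrained model and a new sequence is routed to the model for its interval, as `pick_model` does.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical interval bounds: [0, 100), [100, 200), [200, 400), [400, inf)
BOUNDARIES = [100, 200, 400]


def length_bucket(length, boundaries=BOUNDARIES):
    """Index of the length interval a sequence of this length falls into."""
    return bisect_right(boundaries, length)


def split_by_length(sequences, boundaries=BOUNDARIES):
    """Group training sequences by length interval, e.g. one group per fine-tuned model."""
    buckets = defaultdict(list)
    for seq in sequences:
        buckets[length_bucket(len(seq), boundaries)].append(seq)
    return dict(buckets)


def pick_model(seq, models_by_bucket, boundaries=BOUNDARIES):
    """Route a sequence to the (hypothetical) model trained for its length interval."""
    return models_by_bucket[length_bucket(len(seq), boundaries)]
```

`bisect_right` places a length equal to a boundary in the higher interval, so each interval is closed on the left and open on the right.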
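2dRNA-fold builds a structure by choosing one residue pair per step, trained in the RNAWorld environment. RNAWorld's interface is not described in the abstract, so the following is a hypothetical, dependency-free sketch of a gym-style `reset`/`step` loop for secondary-structure folding. An action is a pair (i, j) or `None` to stop; invalid actions (non-canonical pair, hairpin shorter than `min_loop`, residue already paired) earn a small penalty; the episode ends when the agent stops or no valid pair remains, with the F1 score against a reference structure as the final reward. All of these rules are assumptions, and `step` returns a simplified 3-tuple rather than the five values returned by Gymnasium's `Env.step`.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


class FoldingEnv:
    """Hypothetical environment: pair residues one step at a time (not the RNAWorld API)."""

    def __init__(self, seq, reference_pairs, min_loop=3, invalid_penalty=-0.1):
        self.seq = seq
        self.reference = {(min(p), max(p)) for p in reference_pairs}
        self.min_loop = min_loop
        self.invalid_penalty = invalid_penalty
        self.reset()

    def reset(self):
        self.pairs = set()
        self.partner = [None] * len(self.seq)  # partner[i] = j once i is paired
        return frozenset(self.pairs)

    def _is_valid(self, i, j):
        return (
            0 <= i < j < len(self.seq)
            and j - i > self.min_loop
            and self.partner[i] is None
            and self.partner[j] is None
            and (self.seq[i], self.seq[j]) in CANONICAL
        )

    def valid_actions(self):
        n = len(self.seq)
        return [(i, j) for i in range(n) for j in range(i + 1, n) if self._is_valid(i, j)]

    def _f1(self):
        # F1 = 2 TP / (|predicted| + |reference|)
        tp = len(self.pairs & self.reference)
        return 2 * tp / (len(self.pairs) + len(self.reference)) if tp else 0.0

    def step(self, action):
        """Return (observation, reward, done); action None means 'stop folding'."""
        if action is None:
            return frozenset(self.pairs), self._f1(), True
        i, j = min(action), max(action)
        if not self._is_valid(i, j):
            return frozenset(self.pairs), self.invalid_penalty, False
        self.pairs.add((i, j))
        self.partner[i], self.partner[j] = j, i
        done = not self.valid_actions()
        return frozenset(self.pairs), (self._f1() if done else 0.0), done
```

Crossing pairs are not forbidden here, so an agent could in principle build pseudoknotted structures; a learned policy would replace the hand-picked actions used in a quick check.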
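The last contribution computes correlations between nodes in different network layers and adds cross-layer connections between highly correlated nodes. The abstract does not specify how direct coupling analysis is adapted, so this sketch uses plain Pearson correlation of recorded activations as a simple stand-in for coupling scores, and returns the `top_k` most strongly correlated (layer A node, layer B node) pairs as candidate skip connections. Unlike true DCA, Pearson correlation does not separate direct from indirect dependencies. The function names, activation format and `top_k` cut-off are assumptions.

```python
import math
from heapq import nlargest


def pearson(x, y):
    """Pearson correlation of two equal-length activation traces (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cross_layer_candidates(acts_a, acts_b, top_k=2):
    """Rank (node i in layer A, node j in layer B) pairs by |correlation|.

    acts_a[i] and acts_b[j] hold the activations of each node recorded over
    the same batch of inputs.
    """
    scores = [
        (abs(pearson(xa, xb)), i, j)
        for i, xa in enumerate(acts_a)
        for j, xb in enumerate(acts_b)
    ]
    return [(i, j) for _, i, j in nlargest(top_k, scores)]
```

Strong negative correlations are ranked as highly as positive ones because the score uses the absolute value; whether the thesis does the same is not stated.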
Keywords/Search Tags: Non-coding RNA, RNA secondary structure prediction, RNA folding path, Machine learning, Neural networks, Deep learning, Reinforcement learning, Direct coupling analysis