
Research On The Prediction Method Of RNA Secondary Structure With Pseudoknot Based On Deep Learning

Posted on: 2022-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Bai
Full Text: PDF
GTID: 2480306761459754
Subject: Automation Technology
Abstract/Summary:
RNA is key to gene expression and is responsible for a variety of catalytic and regulatory mechanisms in the cell. An important step in understanding the function of an RNA is to determine its structure, so secondary structure analysis is significant: it helps determine the function of RNA molecules and sheds light on many regulatory, catalytic and constitutive processes in the cell. Beyond function, the secondary structure of an RNA provides information about the structural domains of the molecule and the location of important sites within the structure. RNA structure prediction has two important roles. First, it helps to explain experiments related to RNA function; second, it helps to design new experiments to explore function.

Because traditional techniques for measuring RNA secondary structure are limited, computational methods have for many years been the primary source of knowledge about the structure, and the corresponding potential function, of most RNAs, and they are now the dominant approach. One class of computational methods is based on thermodynamic models that calculate the free energy of an RNA secondary structure. Another is based on comparative sequence analysis, which predicts RNA secondary structure by borrowing information from homologous RNA sequences. In addition, as large amounts of RNA data have been identified, researchers have introduced machine learning, especially deep learning, for secondary structure prediction. Hybrid methods avoid the limitations of any single method, greatly improve computational efficiency, and can analyse the structural information of multiple classes of RNAs more comprehensively to obtain more accurate results.

The TCMfold method proposed in this thesis is also a hybrid method: it combines a deep learning-based model with the structural hard constraints of RNA secondary structure. The method is divided into two parts, a prediction unit and a correction unit. The prediction unit uses the encoder part of the transformer model and a convolutional neural network to learn sequence features, and then decodes these features using a convolutional neural network and a multilayer perceptron. The network of the prediction unit outputs a two-dimensional matrix of base-pairing scores, which the correction unit then adjusts to satisfy the constraints. The correction process converts the hard-constrained RNA secondary structure problem into a mathematically unconstrained problem, and solving it yields a symmetric base-pairing matrix for the sequence.

The model is trained and evaluated on the RNAStralign dataset and compared with four other commonly used algorithms on the same test set. According to the final scores, on two types of RNA, 5S rRNA and tRNA, TCMfold scored more than 9 percentage points higher than the next highest scoring algorithm. For tmRNA and telomerase RNA, TCMfold scored more than 21 percentage points higher than the next highest scoring algorithm. Although the predictions for these two classes of long-sequence RNAs are not as good as those for short sequences, the model is still highly competitive.

At present, the number of sequences in each known RNA family and the length distribution of those sequences vary widely, and most prediction scores for long sequences do not reach 90%. As more RNA molecules from each family are collected and the volume of data increases substantially, the model in this thesis is expected to learn richer sequence features and to produce better predictions.
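
To make the role of the hard constraints more concrete, the following is a minimal Python sketch of how a matrix of base-pairing scores could be turned into a valid symmetric pairing matrix. It is not the TCMfold correction unit, which the abstract describes as solving a mathematically unconstrained problem; a simple greedy selection is swapped in here purely for illustration. The function names (constraint_mask, correct_scores), the 0.5 threshold, the minimum loop length of 3, and the choice of constraints (canonical and wobble pairs only, no sharp hairpin loops, at most one partner per base) are assumptions rather than details taken from the thesis.

```python
import numpy as np

# Assumed set of allowed pairs: Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def constraint_mask(seq, min_loop=3):
    """Return an n x n 0/1 mask of positions allowed to pair.

    A pair (i, j) is allowed if the two bases form an allowed pair and
    at least `min_loop` unpaired bases lie between them.
    """
    n = len(seq)
    mask = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if abs(i - j) > min_loop and (seq[i], seq[j]) in CANONICAL:
                mask[i, j] = 1.0
    return mask


def correct_scores(scores, seq, threshold=0.5):
    """Greedy stand-in for a correction step (illustrative only).

    Symmetrizes the scores, masks out disallowed pairs, then accepts
    pairs from highest to lowest score so that each base has at most
    one partner. Crossing pairs (pseudoknots) are not excluded.
    """
    n = len(seq)
    sym = 0.5 * (scores + scores.T) * constraint_mask(seq)
    pairing = np.zeros((n, n), dtype=int)
    used = np.zeros(n, dtype=bool)
    for flat in np.argsort(-sym, axis=None):  # descending score order
        i, j = divmod(int(flat), n)
        if sym[i, j] <= threshold:
            break  # every remaining score is at or below the threshold
        if i < j and not used[i] and not used[j]:
            pairing[i, j] = pairing[j, i] = 1
            used[i] = used[j] = True
    return pairing
```

Calling correct_scores(scores, seq) with an n x n NumPy array of scores in [0, 1] returns a symmetric 0/1 matrix in which every row contains at most a single 1. The greedy choice was made only because it is short and easy to check by hand; it does not reproduce the optimization described in the thesis.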
Keywords/Search Tags: RNA secondary structure prediction, pseudoknot, CNN, transformer, hard constraints