
Research on Prediction Method of RNA Secondary Structure with Pseudoknots Based on Attention Mechanism

Posted on: 2022-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Wang
GTID: 2480306332465434
Subject: Software engineering
Abstract/Summary:
Accurate RNA secondary structure information is the cornerstone of gene function research and RNA tertiary structure prediction. RNA is a fundamental biological molecule that plays an important role in regulating and expressing genes, and its biological function depends mainly on its tertiary structure. However, the tertiary structure of RNA is very complex, there is no effective method for describing it, and it is very difficult to predict directly from the primary structure of an RNA molecule. Predicting the secondary structure from the primary structure has therefore become the main route in RNA structure research. Within this field, secondary structures with pseudoknots have long been a difficult problem. Although structures can be determined through physical and chemical experiments, the obvious shortcomings of those methods mean that computational prediction of RNA secondary structure is still needed.

Most traditional RNA secondary structure prediction algorithms are based on dynamic programming (DP) and the minimum free energy theory, with both hard and soft constraints. Their accuracy depends heavily on the accuracy of the soft constraints, which come from experimental data such as chemical and enzymatic probing. As the RNA sequence lengthens, the time complexity of DP-based algorithms rises steeply, so they cope poorly with relatively long sequences. Furthermore, because of the complexity of pseudoknot structures, traditional prediction methods have serious shortcomings and cannot predict secondary structures with pseudoknots well. As a result, few algorithms have been available for pseudoknot prediction.

Deep learning is a representation learning approach that has emerged in recent years. It can mine effective hidden features by training on large amounts of data. The ATTfold algorithm proposed in this thesis is a deep learning algorithm based on an attention mechanism. It uses the attention mechanism to analyze the global information of the RNA sequence, focuses on the correlation between paired bases, and thereby addresses the problem of long-sequence prediction. The algorithm also extracts effective multi-dimensional features from a large number of RNA sequences and their structure information, and combines them with the hard constraints specific to RNA secondary structure. In this way it determines the pairing position of each base and obtains a valid RNA secondary structure. Because of the special processing applied to the RNA sequence data and structure data in this thesis, ATTfold can also predict the secondary structure of RNA containing pseudoknots.

Finally, after the ATTfold model was trained on tens of thousands of RNA sequences and their real secondary structures, it was compared with four classic RNA secondary structure prediction algorithms. Our algorithm shows significant improvement in sensitivity, specificity and F1-score. Compared with the other four algorithms, the highest F1-score of our method is 22.8% higher in the short RNA sequence family and 23.9% higher in the long RNA sequence family. In addition, the way our evaluation metrics are calculated reflects the actual RNA secondary structure rather than only an improvement in accuracy. As the data in RNA sequence databases grow, our deep learning-based algorithm is expected to perform even better, and algorithms of this kind will become increasingly indispensable.
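The hard constraints and the pair-level metrics mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used constraints (only A-U, G-C and G-U pairs, and at least three unpaired bases inside any hairpin) and treats "specificity" as precision over predicted pairs, as is common in RNA structure evaluation; the thesis may define these differently. The names `hard_constraint_mask`, `pair_scores`, `ALLOWED_PAIRS` and `MIN_LOOP` are hypothetical and are not taken from ATTfold.

```python
# Assumed canonical pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}
# Assumed minimum number of unpaired bases enclosed by any pair.
MIN_LOOP = 3


def hard_constraint_mask(seq):
    """Return an n x n 0/1 matrix; entry [i][j] is 1 if bases i and j may pair."""
    n = len(seq)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        # j starts far enough from i to leave MIN_LOOP bases between them.
        for j in range(i + MIN_LOOP + 1, n):
            if (seq[i], seq[j]) in ALLOWED_PAIRS:
                mask[i][j] = 1
                mask[j][i] = 1  # pairing is symmetric
    return mask


def pair_scores(true_pairs, pred_pairs):
    """Sensitivity, precision and F1 over sets of (i, j) base pairs with i < j."""
    true_pairs, pred_pairs = set(true_pairs), set(pred_pairs)
    tp = len(true_pairs & pred_pairs)  # correctly predicted pairs
    sensitivity = tp / len(true_pairs) if true_pairs else 0.0
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    denom = sensitivity + precision
    f1 = 2 * sensitivity * precision / denom if denom else 0.0
    return sensitivity, precision, f1


if __name__ == "__main__":
    mask = hard_constraint_mask("GGGAAAUCCC")
    print(mask[0][9], mask[3][6])  # G-C far apart vs. A-U too close together
    print(pair_scores({(0, 9), (1, 8), (2, 7)}, {(0, 9), (1, 8), (3, 6)}))
```

Because a structure is represented here as a set of index pairs, crossing pairs such as (0, 5) and (3, 8), which form a pseudoknot, are scored in the same way as nested ones.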
Keywords/Search Tags: RNA secondary structure prediction, pseudoknots, attention mechanism, deep learning, hard constraints