| In recent years, RNA secondary structure prediction has been an important and difficult issue in RNA research. Although some RNA secondary structures can be determined experimentally, in most cases efficient and accurate computational methods are still needed to predict them. Current RNA secondary structure prediction methods are mainly based on the minimum free energy algorithm, which iteratively searches for the optimal folding state of RNA in vivo subject to minimum energy or other constraints. However, because of the complexity of the biological environment, a real RNA structure settles into a state of balanced biological potential energy rather than the optimal folding state of minimum energy. For a short RNA sequence, this equilibrium energy state is close to the minimum free energy state, so the minimum free energy algorithm predicts its secondary structure with relatively high accuracy. For a longer RNA sequence, however, continued folding drives the biopotential energy balance far from the minimum free energy state. This deviation stems from the complexity of the structure and causes a serious decline in the accuracy of minimum free energy prediction. Deep learning is a widely used representation learning method that can automatically mine hidden features useful for classification from data. Based on deep learning and existing real RNA secondary structure data, this paper proposes CDPfold, a novel RNA secondary structure prediction method that combines a convolutional neural network model with a dynamic programming algorithm. We analyze current experimental RNA sequence and structure data to construct a deep convolutional network model, and then extract implicit features useful for classification from large-scale data to predict the pairing probability of each base in an RNA sequence. An enhanced dynamic programming method is then applied to the predicted base-pairing probabilities to obtain the optimal RNA secondary structure. Concretely, the known RNA structures are encoded, the convolutional neural network predicts the pairing of each base in the RNA sequence, and the dynamic programming method combines these predictions into the optimal RNA secondary structure. The experimental results show that CDPfold’s prediction accuracy on three RNA families (5S rRNA, tRNA, and SRP RNA) is about 30% higher than that of other common RNA secondary structure prediction algorithms. In addition, the performance of a deep learning method is directly related to the amount of data. It can therefore be inferred that, as the amount of real RNA structural data verified by biological experiments continues to grow, the prediction accuracy of CDPfold on various RNA families will continue to improve. |
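
The final step described above, turning per-base pairing scores into a single consistent structure, can be illustrated with a small sketch. This is not CDPfold's enhanced dynamic programming method, which the abstract does not specify. It is a plain Nussinov-style recursion that maximizes the summed pairing score over non-crossing pairs. The function name `fold_from_pair_scores`, the `score[i][j]` matrix layout, the allowed pair set (Watson-Crick plus G-U wobble), and the `min_loop` parameter are all assumptions made for illustration.

```python
from typing import List

# Assumption: canonical Watson-Crick pairs plus G-U wobble pairs are allowed.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold_from_pair_scores(seq: str, score: List[List[float]], min_loop: int = 3) -> str:
    """Return a dot-bracket structure maximizing the total pairing score.

    score[i][j] (i < j) is a hypothetical model output: how strongly base i
    is predicted to pair with base j. min_loop is the minimum number of
    unpaired bases enclosed by any pair.
    """
    n = len(seq)
    best = [[0.0] * n for _ in range(n)]  # best[i][j]: optimum for seq[i..j]

    # Fill the table for increasing subsequence lengths.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            value = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in ALLOWED:
                    left = best[i][k - 1] if k > i else 0.0
                    cand = left + best[k + 1][j - 1] + score[k][j]
                    if cand > value:
                        value = cand
            best[i][j] = value

    # Trace back one optimal solution with an explicit stack.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in ALLOWED:
                left = best[i][k - 1] if k > i else 0.0
                if best[i][j] == left + best[k + 1][j - 1] + score[k][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)
```

With a uniform score matrix the sketch reduces to maximizing the number of base pairs. With scores produced by a trained network, it prefers the pairs the network is most confident about while still keeping the structure nested and free of crossing pairs.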