
A Study of siRNA Silencing Activity Prediction Based on Machine Learning Methods

Posted on: 2018-12-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Han
GTID: 1310330515976115
Subject: Computer Science and Technology
Abstract/Summary:
RNA interference (RNAi) is a cellular process in which double-stranded RNA (dsRNA) triggers post-transcriptional gene silencing through base-pairing interactions. It is found in many eukaryotic systems, including plants, fungi, invertebrates and mammals. In mammalian cells, long dsRNA is processed into short 21–23 nucleotide (nt) dsRNAs known as small interfering RNAs (siRNAs), which induce rapid knockdown of the target mRNA. In recent years, RNAi has been widely applied to the study of gene function, gene therapy and drug development, and siRNA, which plays a critical role in RNAi, has attracted increasing attention from researchers. siRNAs that target different positions of a single mRNA produce different silencing efficiencies, and most of these efficiencies are far from ideal. How to design active siRNAs that achieve the highest silencing efficiency has therefore become the most important issue in RNAi. siRNA design is an important prerequisite for applying RNAi to gene function studies and drug development, and it has become a hotspot in RNAi research.

At present, siRNA design methods fall into two categories: methods based on statistical rules and methods based on machine learning algorithms. Existing results show that machine-learning-based methods predict siRNA silencing efficiency more accurately. However, although many machine-learning-based siRNA design algorithms have been proposed, their predictive performance is still unsatisfactory. More potential features in the siRNA sequence that are associated with silencing efficiency need to be explored, and newer high-performance machine learning algorithms can be applied to siRNA efficiency prediction. This dissertation identifies potential features associated with silencing efficiency from the siRNA sequence and develops a silencing efficiency prediction model based on the Random Forest algorithm. Then, to examine the effect of different siRNA motifs on silencing efficiency, a convolutional neural network model for predicting siRNA silencing efficiency is proposed. The main contents of this dissertation are as follows:

1. New features are extracted from 2-mer and 3-mer motifs based on position encoding, and a Random Forest model is developed for silencing efficiency prediction. Because the siRNA sequence is an important factor in the RNAi process, mining more potential features from the siRNA sequence has always been a research focus. Studies have shown that RNAi activity changes when 2-3 bp of RNA at any position of a siRNA sequence is substituted with DNA. Thus, not only the position and composition of single nucleotides in the siRNA sequence are related to RNAi efficiency; the 2-mer and 3-mer motifs at specific positions of the siRNA sequence are also associated with it. We first demonstrated that the 2-mer and 3-mer motifs at different positions of the siRNA sequence differ significantly between active and inactive siRNAs. The 2-mer and 3-mer motifs based on position encoding were then extracted as new features. A feature selection algorithm based on Random Forest variable importance was used to select the feature subset most relevant to siRNA silencing efficiency, and a Random Forest model for siRNA silencing efficiency prediction, siRNApred, was constructed. Validation experiments on the Huesken dataset showed that siRNApred achieves a Pearson correlation coefficient (PCC) of 0.722, which is 9.39%, 10.39%, 9.56% and 7.76% higher than Biopredsi, i-score, ThermoComposition-21 and DSIR, respectively. In addition, prediction experiments were performed on multiple independent datasets to examine the generalization of siRNApred, and the model showed more stable performance than the other methods. The siRNApred tool is available online (http://www.jlucomputer.com:8080/RNA/).

2. A siRNA silencing efficiency prediction method based on a convolutional neural network is proposed. The effect of the siRNA sequence on RNAi efficiency is related not only to 2-mer and 3-mer motifs; multimode motifs may also be closely related to siRNA silencing efficiency. However, existing siRNA feature extraction methods do not reflect the contribution of multimode motifs to silencing efficiency. To explore the effect of multimode motifs, this dissertation proposes a siRNA efficiency prediction model based on a convolutional neural network. In the convolution layer, convolution kernels of suitable sizes serve as motif detectors that automatically learn the potential feature patterns of multimode motifs, and multiple motifs are combined to build the siRNA silencing efficacy prediction model. The model's hyperparameters were calibrated experimentally, and the model consists of a convolution layer, a pooling layer and an output layer. The convolution layer uses 14 convolution kernel sizes, from 5 × 4 to 18 × 4, to detect potential motif feature patterns. The pooling layer uses both max pooling and mean pooling to select the most representative neurons and form the feature representation. The output layer uses logistic regression to compute the prediction. The results showed that the PCC and AUC values of the method were 0.717 and 0.894, which were higher than those of Biopredsi, DSIR and siRNApred. The method can extract in depth the contribution of multimode motifs in the siRNA sequence to silencing efficiency, and its feature patterns capture more fully the valuable traits of the siRNA sequence, such as local characteristics, base and motif composition, and positional arrangement. This data-driven feature learning approach is superior to feature extraction that relies on preset expert knowledge.

The main innovations of this dissertation are: (1) extracting new features from 2-mer and 3-mer motifs based on position encoding, proposing a feature selection algorithm based on z-score, proposing a siRNA silencing efficacy prediction model that combines single-nucleotide representation, nucleotide composition, position-encoded 2-mer and 3-mer motif features and thermodynamic features, and developing an online platform for siRNA silencing efficiency prediction; (2) designing suitable convolution kernels to detect motif feature patterns, and developing and validating a siRNA efficacy prediction model based on a convolutional neural network.

In summary, this dissertation explores more features associated with siRNA silencing efficacy, and brings together various siRNA features and a feature selection method to build an optimal feature set according to biological properties and to improve prediction performance with a Random Forest predictor. At the same time, a suitable convolutional neural network structure is designed to learn the potential feature patterns of multimode motifs, in order to design more active siRNAs for the targeted mRNA. Two siRNA efficiency prediction models were developed. Both models are explained in detail, and their prediction accuracy was verified by comparative experiments. The results showed that the proposed methods perform better than existing siRNA silencing efficiency prediction methods.
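The position-encoded 2-mer and 3-mer motif features described in item 1 can be illustrated with a minimal sketch. The abstract does not give the exact encoding, so the code below assumes one plausible reading: a binary indicator for every (position, motif) pair. The function names and the example sequence are hypothetical and are not taken from siRNApred.

```python
from itertools import product

ALPHABET = "ACGU"

def kmer_position_features(seq, k):
    """Return a binary vector with one slot per (position, k-mer) pair.

    Assumption: each position contributes a one-hot block over all 4**k
    possible k-mers. This is an illustration, not the thesis's exact scheme.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    n_positions = len(seq) - k + 1
    vec = [0] * (n_positions * len(kmers))
    for pos in range(n_positions):
        motif = seq[pos:pos + k]
        if motif not in index:
            raise ValueError(f"unexpected base in {motif!r} at position {pos}")
        vec[pos * len(kmers) + index[motif]] = 1
    return vec

def motif_features(seq):
    """Concatenate the position-encoded 2-mer and 3-mer features."""
    return kmer_position_features(seq, 2) + kmer_position_features(seq, 3)

# Hypothetical 21-nt sequence: 20 * 16 + 19 * 64 = 1536 features.
features = motif_features("UUCUCCGAACGUGUCACGUUU")
print(len(features), sum(features))
```

Under this assumption, a vector of this kind would be concatenated with the other feature groups named in the abstract (single-nucleotide representation, nucleotide composition, thermodynamic features) before feature selection.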
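The PCC used above to compare predictors is the Pearson correlation coefficient between predicted and measured silencing efficiencies. The sketch below shows how such a score could be computed. The helper `relative_gain` assumes that the percentage improvements quoted in the abstract are relative gains over each baseline's PCC, which the abstract does not state explicitly.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def relative_gain(new_score, baseline_score):
    """Percentage improvement of new_score over baseline_score.

    Assumption: gains are measured relative to the baseline score.
    """
    return (new_score - baseline_score) / baseline_score * 100.0
```

In use, `pearson(predicted, measured)` would be evaluated on a held-out dataset, with `predicted` and `measured` being lists of silencing efficiencies for the same siRNAs.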
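The network structure described in item 2 (kernels from 5 × 4 to 18 × 4 acting as motif detectors, max and mean pooling, a logistic regression output) can be sketched as a forward pass in plain Python. This is an illustration under stated assumptions rather than the author's implementation: it uses one kernel per size, a ReLU activation and random untrained weights, none of which are specified in the abstract, and all names are hypothetical.

```python
import math
import random

ALPHABET = "ACGU"

def one_hot(seq):
    """Encode a sequence as a length x 4 matrix (one row per position)."""
    seq = seq.upper().replace("T", "U")
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq]

def convolve(matrix, kernel):
    """Slide a k x 4 kernel along the sequence; apply ReLU (an assumption)."""
    k = len(kernel)
    if k > len(matrix):
        raise ValueError("kernel is longer than the sequence")
    out = []
    for start in range(len(matrix) - k + 1):
        score = sum(kernel[i][j] * matrix[start + i][j]
                    for i in range(k) for j in range(4))
        out.append(max(0.0, score))
    return out

def forward(seq, kernels, weights, bias):
    """Convolution -> max and mean pooling -> logistic regression output."""
    x = one_hot(seq)
    pooled = []
    for kernel in kernels:
        activations = convolve(x, kernel)
        pooled.append(max(activations))                     # max pooling
        pooled.append(sum(activations) / len(activations))  # mean pooling
    z = bias + sum(w * p for w, p in zip(weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))

def random_model(seed=0, sizes=range(5, 19)):
    """Untrained model with one kernel per size 5..18 (14 sizes in total)."""
    rng = random.Random(seed)
    kernels = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(k)]
               for k in sizes]
    weights = [rng.uniform(-1, 1) for _ in range(2 * len(kernels))]
    return kernels, weights, 0.0

kernels, weights, bias = random_model()
# Hypothetical 21-nt input; the output is meaningless until the model is trained.
print(forward("UUCUCCGAACGUGUCACGUUU", kernels, weights, bias))
```

Each kernel row scores one nucleotide position, so a k × 4 kernel responds to a motif of length k wherever it occurs. Pooling over positions then summarizes the strongest match and the average match for each motif length.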
Keywords/Search Tags: siRNA design, RNA interference, Random Forest, Feature Selection, Convolutional Neural Networks