Non-coding RNAs are RNA molecules that are not translated into proteins. They are involved in a variety of physiological processes and play important regulatory roles in organisms, so predicting them is an important task. In this paper, we develop a computational method that predicts various non-coding RNAs from sequence-derived features. To construct the prediction model, we first investigate a variety of non-coding RNA sequence-derived features and evaluate how useful each is for non-coding RNA prediction. We then train random forest, support vector machine, and logistic regression classifiers on these features to obtain individual feature-based predictors. Further, we develop the sequence learning ensemble method, which predicts non-coding RNAs from a linear weighted sum of the outputs of the individual feature-based predictors; a genetic algorithm is adopted to optimize the weights of the ensemble. In the computational experiments, we carry out comprehensive prediction experiments on bacterial small non-coding RNAs and on human microRNA precursors, and evaluate our method using 5-fold cross-validation. The sequence learning ensemble method achieves AUC scores greater than 0.94 and 0.97 in the two experiments, and outperforms existing state-of-the-art non-coding RNA prediction methods. In conclusion, the proposed method effectively combines multiple sequence-derived features, produces high-accuracy predictions, and is robust across non-coding RNA types. It therefore has great potential for non-coding RNA prediction.
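The ensemble step can be illustrated with a minimal sketch, written here in plain Python. It is not the implementation used in this work: the genetic operators (elitist selection, arithmetic crossover, Gaussian mutation), the population size, the number of generations, and the toy predictor outputs are all assumptions chosen for illustration. It shows only the general idea of scoring each sample by a linear weighted sum of the individual predictors' outputs and searching for weights that maximize AUC.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ensemble_scores(weights, predictor_outputs):
    """Linear weighted sum of the individual predictors' outputs, per sample."""
    n_samples = len(predictor_outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, predictor_outputs))
            for i in range(n_samples)]

def normalize(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def optimize_weights(predictor_outputs, labels, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm over weight vectors (all settings are assumptions)."""
    rng = random.Random(seed)
    k = len(predictor_outputs)

    def fitness(w):
        return auc(ensemble_scores(w, predictor_outputs), labels)

    # Seed the population with each single predictor (one-hot weights),
    # then fill the remainder with random weight vectors.
    population = [[1.0 if i == j else 0.0 for i in range(k)] for j in range(k)]
    while len(population) < pop_size:
        population.append(normalize([rng.random() + 1e-9 for _ in range(k)]))

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]            # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # arithmetic crossover
            child = [max(1e-9, c + rng.gauss(0, 0.1)) for c in child]  # mutation
            children.append(normalize(child))
        population = parents + children
    return max(population, key=fitness)

# Hypothetical outputs of three feature-based predictors on eight samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
outputs = [
    [0.9, 0.8, 0.4, 0.6, 0.5, 0.3, 0.2, 0.1],
    [0.7, 0.3, 0.8, 0.6, 0.4, 0.5, 0.2, 0.3],
    [0.6, 0.7, 0.5, 0.2, 0.3, 0.1, 0.4, 0.6],
]
weights = optimize_weights(outputs, labels)
ensemble_auc = auc(ensemble_scores(weights, outputs), labels)
individual_aucs = [auc(out, labels) for out in outputs]
```

Because the initial population contains each single predictor and the best half of every generation is carried over unchanged, the ensemble's AUC on the data used for the search cannot fall below that of the best individual predictor. In practice the weights would be fitted on training folds only and then assessed on the held-out fold, in line with the cross-validation protocol described above.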