| Long noncoding RNAs are RNA molecules longer than 200 nucleotides that take part in various regulatory processes in cells, and their abnormal expression is associated with human diseases. A promoter is a DNA sequence that RNA polymerase recognizes and binds in order to begin transcription. Mutations in a gene promoter disrupt the regulation of gene expression, which can lead to human diseases such as malignant tumors. Enhancers are DNA fragments that can bind proteins and enhance the transcription of genes; genetic variation in enhancers is also related to human diseases. Consequently, it is of great significance to correctly identify the subcellular localization of long noncoding RNAs and to correctly identify promoters and enhancers. With the explosive growth of biological data, traditional biological experiments are time-consuming, labor-intensive and expensive. Therefore, this paper applies machine learning to the design of features and classification algorithms for long noncoding RNAs, promoters and enhancers. The main results are summarized as follows: (1) For the subcellular localization of long noncoding RNAs, the KD-KLNMF model is established. The model uses k-mer and Geary-based spatial autocorrelation to extract the local and global sequence-order information, respectively, from the original sequence, and then applies SMOTE to handle the imbalanced dataset. The optimal features are selected by Kullback-Leibler divergence-based nonnegative matrix factorization, and an SVM combined with 10-fold cross-validation is used as the classifier. Finally, the model is tested and verified by the jackknife test. In particular, this paper constructs a new, independent dataset to test the model. On the training set and the test set, the overall prediction accuracy is 97.24% and 92.86%, respectively, surpassing the existing models. (2) For the identification of promoters and their strength, the iPro-GAN model is built. The model uses Moran-based spatial autocorrelation to extract features, designs a deep convolutional generative adversarial network for classification, and uses 10-fold cross-validation to evaluate the model. On the benchmark dataset, the accuracy of the two layers of the model is 93.15% and 92.30%, respectively, and on the independent dataset, the accuracy of the two layers is 86.77% and 91.66%, respectively, which is much higher than that of the existing models, especially for the second layer. (3) For the identification of enhancers and their strength, the iEnhancer-WDEST model is proposed. The model uses four feature extraction methods: mismatch, dinucleotide-based autocorrelation, dinucleotide-based cross covariance and Geary-based spatial autocorrelation. An SVM is used as the base classifier, and weighted Dempster-Shafer (DS) evidence theory is then used to fuse the outputs of the four base classifiers into the final classification. Finally, 10-fold cross-validation is used to evaluate the model. On the benchmark dataset, the accuracy of the two layers of the model is 79.62% and 69.61%, respectively, and on the independent dataset, the accuracy of the two layers is 77.50% and 68.00%, respectively, which exceeds most of the existing models, indicating that the model has reference value and is competitive. |
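
The k-mer features mentioned for the KD-KLNMF model can be illustrated with a minimal Python sketch. It assumes that a k-mer feature is the normalized frequency of each length-k substring over the alphabet A, C, G, T; the function name `kmer_frequencies`, the default `k=3`, and the handling of ambiguous bases are illustrative assumptions, not details taken from the paper.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Hypothetical helper for illustration; the paper's exact k and
    normalization are not stated in the abstract.
    """
    seq = seq.upper()
    # Enumerate all len(alphabet)**k possible k-mers in a fixed order.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k, or no valid window: all-zero vector.
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]
```

A sequence of any length is mapped to a fixed-length vector (4^k entries), which is what allows sequences of different lengths to be passed to a classifier such as an SVM.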