Study On Feature Design And Classification Algorithms Of Long Noncoding RNA And Promoter-Enhancer

Posted on:2023-07-13

Degree:Master

Type:Thesis

Country:China

Candidate:H J Qiao

Full Text:PDF

GTID:2530306905997489

Subject:Applied Mathematics

Abstract/Summary:


Long noncoding RNAs are RNA molecules longer than 200 nucleotides that take part in various regulatory processes in cells, and their abnormal expression is associated with human diseases. A promoter is a DNA sequence that RNA polymerase recognizes and binds in order to begin transcription. Mutations in gene promoters disrupt the regulation of gene expression, which can lead to human diseases such as malignant tumors. Enhancers are DNA fragments that can bind proteins and enhance the transcription of genes, and genetic variation in enhancers is also related to human diseases. Consequently, it is of great significance to correctly identify the subcellular localization of long noncoding RNAs and to identify promoters and enhancers. With the explosive growth of biological data, traditional biological experiments are time-consuming, labor-intensive, and expensive. Therefore, this thesis applies machine learning to feature design and classification algorithms for long noncoding RNAs, promoters, and enhancers. The main results are summarized as follows:

(1) For the localization of long noncoding RNAs, the KD-KLNMF model is established. The model uses k-mer and Geary-based spatial autocorrelation to extract local and global sequence-order information, respectively, from the original sequence, and SMOTE is then used to balance the imbalanced dataset. The optimal features are selected by nonnegative matrix factorization based on Kullback-Leibler divergence, and an SVM combined with 10-fold cross-validation is used as the classifier. Finally, the model is tested and verified by the jackknife test. In particular, this thesis constructs a new independent dataset to test the model. On the training set and the test set, the overall prediction accuracy is 97.24% and 92.86%, respectively, surpassing existing models.

(2) For the identification of promoters and their strength, the iPro-GAN model is built. The model uses Moran-based spatial autocorrelation to extract features, designs a generative adversarial network based on deep convolution for classification, and uses 10-fold cross-validation to evaluate the model. On the benchmark dataset, the accuracy of the two layers of the model is 93.15% and 92.30%, respectively, and on the independent dataset it is 86.77% and 91.66%, respectively, which is much higher than existing models, especially for the second layer.

(3) For the identification of enhancers and their strength, the iEnhancer-WDEST model is proposed. The model uses four feature extraction methods: mismatch, dinucleotide-based autocorrelation, dinucleotide-based cross covariance, and Geary-based spatial autocorrelation. SVM is used as the base classifier, and weighted Dempster-Shafer (DS) evidence theory is then used to fuse the outputs of the four base classifiers to obtain the final classification result. Finally, 10-fold cross-validation is used to evaluate the model. On the benchmark dataset, the accuracy of the two layers of the model is 79.62% and 69.61%, respectively, and on the independent dataset it is 77.50% and 68.00%, respectively, which exceeds most existing models, indicating that the model has reference value and is competitive.
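As a rough illustration of two building blocks named in the abstract, the sketch below computes normalized k-mer frequencies (the local features in (1)) and fuses two classifiers' class probabilities with the standard, unweighted Dempster rule of combination (the basis of the fusion in (3)). It is not the thesis's implementation: the function names `kmer_frequencies` and `dempster_combine` are hypothetical, the weighting scheme of the weighted DS method is not given in the abstract and is omitted here, and each classifier is assumed to assign mass only to single classes.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies in fixed lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Slide a window of width k over the sequence and count each k-mer.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing symbols outside the alphabet are ignored.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def dempster_combine(m1, m2):
    """Fuse two mass functions over singleton classes with Dempster's rule.

    Assumption: all mass sits on single classes, so two sources agree only
    when they support the same class. The combined mass is the product of
    the two masses, renormalized by 1 - K, where K is the conflicting mass.
    """
    classes = set(m1) | set(m2)
    agreement = {c: m1.get(c, 0.0) * m2.get(c, 0.0) for c in classes}
    norm = sum(agreement.values())  # equals 1 - K
    if norm == 0:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {c: v / norm for c, v in agreement.items()}


if __name__ == "__main__":
    features = kmer_frequencies("ACGTACGT", k=2)
    print(len(features))  # 16 possible dinucleotides

    # Hypothetical outputs of two base classifiers for one sequence.
    fused = dempster_combine(
        {"enhancer": 0.6, "non-enhancer": 0.4},
        {"enhancer": 0.7, "non-enhancer": 0.3},
    )
    print(max(fused, key=fused.get))
```

When both sources lean toward the same class, the fused mass for that class is higher than either input, which is why this kind of evidence combination is attractive for merging several base classifiers.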

Keywords/Search Tags:

feature extraction, support vector machine, generative adversarial learning, DS evidence theory


Related items

1	Support Vector Machines Classifier Based On Margin Vectors
2	Support Vector Machine Data Classification
3	Study Of Algorithms For Support Vector Machine
4	Possible Unconventional Superconductivity In Hexagonal CaFe₂As₂ And Generative Adversarial Quantum Circuits
5	Method Development For Predicting Protein Subcellular Localization Based On Deep Learning
6	The Study And Application Of Support Vector Machines
7	Support Vector Machine Theory, Algorithm And Implementation
8	Landslide Susceptibility Evaluation Based On Support Vector Machine Model And Evidence Theory
9	Research On Road Extraction Method Of Trajectory Data Based On Machine Learning
10	Support Vector Machine Based On Artificial Error