
Research On Long Non-coding RNA Identification Based On Machine Learning

Posted on: 2021-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Li
GTID: 2370330614965836
Subject: Computer technology
Abstract/Summary:
With the rapid development of high-throughput sequencing technology, massive amounts of gene sequencing data have been deposited in gene databases. How to mine and analyze these data is an important problem in bioinformatics. Long non-coding RNA (lncRNA) is an important class of RNA molecules that plays a significant role in human life activities, and accurately identifying lncRNA is an active topic in current bioinformatics research. Cancer seriously threatens human health, and lncRNA is widely involved in its occurrence and development. Identifying cancer-related lncRNAs is therefore of great significance for studying the relationship between genes and cancer, and for finding effective biomarkers and targets for cancer diagnosis and treatment. Using machine learning and deep learning algorithms, this thesis studies both general lncRNA identification and cancer-related lncRNA identification. The work and its contributions fall into three parts:

(1) Based on the AdaBoost ensemble learning method and the decision tree algorithm, this thesis proposes an AdaBoost-DT ensemble classification model for identifying lncRNA. The model integrates three types of features (transcript sequence length, GC content, and k-mer subsequence frequency) and combines the AdaBoost method with the decision tree algorithm to classify transcripts. It achieves an accuracy of 87.28% on the test set, which is higher than that of other identification methods based on traditional machine learning algorithms.

(2) Based on a convolutional neural network (CNN) classification model, this thesis proposes an effective lncRNA identification method, Lnc-CNN. Lnc-CNN one-hot encodes the RNA sequence data set and feeds it to a convolutional neural network to train a CNN classifier that identifies RNA sequences. The method requires no manual feature extraction, which reduces the complexity of the experimental process, and it improves the accuracy of the identification results. It achieves an identification accuracy of 92.27% on the test set derived from the GENCODE gene database, an improvement of 7.87% over the AdaBoost ensemble learning identification method, which relies on traditional machine learning algorithms. The method works very well for RNA sequences that are not too short.

(3) Based on the AdaBoost-DT ensemble classification model, this thesis proposes a cancer-related lncRNA identification method named "CanLnc-ADT". CanLnc-ADT integrates four types of transcript features: expression, epigenetic, genomic, and network characteristics. Using the AdaBoost-DT ensemble classification model, it identifies cancer-related lncRNAs on the test data set with an accuracy of 94.32%. Compared with recent cancer-related lncRNA identification methods such as CRlncRC and CRlncRC2, the identification accuracy increases by 3.75% and 2.71%, respectively.

The AdaBoost-DT ensemble classification model and the Lnc-CNN method proposed in this thesis can effectively identify lncRNA and lay a foundation for further research on lncRNA function. The CanLnc-ADT method can accurately identify cancer-related lncRNAs and is of great significance for further studying the role of lncRNAs in the occurrence and development of cancer.
Keywords/Search Tags: LncRNA Identification, Ensemble Learning, CNN, Deep Learning, Cancer
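
The sequence inputs named in the abstract (transcript length, GC content, and k-mer frequency for AdaBoost-DT, and one-hot coding for Lnc-CNN) can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the choice of k = 3, the handling of ambiguous bases, and the fixed-length zero padding are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import product

BASES = "ACGU"  # assumed alphabet; DNA-style T is mapped to U below


def normalize(seq):
    """Upper-case the sequence and map T to U."""
    return seq.upper().replace("T", "U")


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = normalize(seq)
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def kmer_frequencies(seq, k=3):
    """Relative frequency of every possible k-mer over A, C, G, U."""
    seq = normalize(seq)
    counts = {"".join(p): 0 for p in product(BASES, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
            total += 1
    return {kmer: (n / total if total else 0.0) for kmer, n in counts.items()}


def feature_vector(seq, k=3):
    """Concatenate length, GC content, and k-mer frequencies (sorted by k-mer)."""
    freqs = kmer_frequencies(seq, k)
    return [float(len(seq)), gc_content(seq)] + [freqs[m] for m in sorted(freqs)]


def one_hot(seq, length):
    """One-hot encode a sequence, truncated or zero-padded to `length` rows."""
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in normalize(seq)[:length]:
        row = [0] * len(BASES)
        if base in index:  # unknown symbols stay as an all-zero row
            row[index[base]] = 1
        rows.append(row)
    while len(rows) < length:  # pad short sequences with all-zero rows
        rows.append([0] * len(BASES))
    return rows
```

A vector from `feature_vector` could then be passed to any AdaBoost implementation that uses decision trees as weak learners (for example, scikit-learn's `AdaBoostClassifier`), while the matrix from `one_hot` is the kind of input a 1D convolutional network would consume. Which library, tree depth, k, and input length the thesis actually used is not stated in the abstract.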