Identification Of Plant Long Non-coding RNA Based On Sequence Energy Score Difference Method And SVM Algorithm

Posted on: 2022-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: L J Yan
Full Text: PDF
GTID: 2480306509461374
Subject: Physics
Abstract/Summary:
In recent years, studies have shown that long non-coding RNA (lncRNA) has rich and powerful biological functions and plays an important role in regulating gene expression in eukaryotes. Compared with lncRNA research in mammals, plant lncRNA research started relatively late, and accurately identifying lncRNA among a large number of transcripts remains one of the important problems in the field. In this paper, two new datasets are built: one of plant lncRNA and mRNA, and one of lncRNA from monocotyledons and dicotyledons. Both datasets are used to study plant lncRNA. The k-mer frequency information, open reading frame information, secondary structure information and geometric flexibility information of RNA are extracted, and all of these feature types are fused. Using the Support Vector Machine (SVM) algorithm and the Jackknife test, the classification accuracy reached 96.14% for plant lncRNA versus mRNA and 82.42% for monocotyledon versus dicotyledon lncRNA. Based on the idea of the sequence energy score difference method, a position correlation scoring function was constructed from the position weight matrix and six kinds of RNA flexibility information, and plant lncRNA were identified by the energy score difference. Feeding the energy score difference into the SVM algorithm as the feature vector improved the prediction results. Finally, the prediction results for plant lncRNA were analyzed: the open reading frame was found to be an important feature of plant lncRNA, and the extracted features differed in some respects between monocotyledonous and dicotyledonous plants. These results will provide some help for the accurate identification of plant long non-coding RNA.
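As a rough illustration of two of the feature types named in the abstract, the sketch below computes k-mer frequencies and the length of the longest open reading frame for a single transcript. It is not the thesis's implementation: the function names, the choice of k = 3, the forward-strand-only ORF scan, and the AUG-to-stop ORF definition are all assumptions made for this example. The secondary structure and flexibility features are not covered.

```python
from itertools import product

ALPHABET = "ACGU"                      # RNA alphabet (assumption: T is mapped to U)
STOP_CODONS = {"UAA", "UAG", "UGA"}


def kmer_frequencies(seq, k=3):
    """Return the 4**k relative k-mer frequencies in lexicographic order."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:           # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def longest_orf_length(seq):
    """Length in nucleotides of the longest AUG...stop ORF on the forward strand."""
    seq = seq.upper().replace("T", "U")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "AUG":
                start = i              # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)   # include the stop codon
                start = None
    return best


def feature_vector(seq, k=3):
    """Fuse k-mer frequencies with ORF length and ORF coverage (hypothetical layout)."""
    n = len(seq)
    orf = longest_orf_length(seq)
    return kmer_frequencies(seq, k) + [orf, orf / n if n else 0.0]
```

Vectors built this way for each transcript could then be passed to any SVM implementation. The Jackknife test mentioned in the abstract corresponds to leave-one-out evaluation, in which each sequence is held out once while the classifier is trained on all the others.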
Keywords/Search Tags: plant, long non-coding RNA, Support Vector Machine, position correlation scoring function, sequence energy score difference