Subcellular Localization Prediction Of LncRNAs Based On Sequence And Structural Information

Posted on: 2021-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Q You
Full Text: PDF
GTID: 2370330620476578
Subject: Physics
Abstract/Summary:
Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that are not translated into protein. Protein-coding genes were long thought to be the main functional players, but many recent studies suggest that the non-coding portion of the human genome, about 98 percent of it, also plays an important role rather than being mere “transcriptional noise”. Studies of lncRNA function have shown that lncRNAs play an important role in regulating transcription initiation, transcription, and post-transcriptional processes, and thereby affect a variety of biological processes. The sequences of many lncRNAs are known, but their functions remain poorly understood. Knowing the subcellular localization of an lncRNA is an important step toward understanding its function. Experiments can determine the subcellular localization of lncRNAs, but they are time-consuming and expensive, and the volume of lncRNA data has grown rapidly in recent years. A fast and effective sequence-based algorithm for predicting lncRNA subcellular localization is therefore needed.

In this work, a data set of lncRNA subcellular localization was established covering five locations: nucleus, cytoplasm, cytosol, ribosome, and exosome. Several types of information parameters were computed for the lncRNAs, including k-mer frequency information, sequence reduction information, three-reading-frame information, conserved motif information, secondary structure information, and six geometric flexibility parameters, among others, and feature fusion was performed. After the data set was balanced using the SMOTE (Synthetic Minority Oversampling Technique) method, a support vector machine (SVM) algorithm was applied to predict lncRNA subcellular localization. The overall prediction success rate reached 86.86% in the jackknife test. The results indicate that the sequence reduction information, the six geometric flexibility parameters, and the three-reading-frame information are highly effective for predicting lncRNA subcellular localization and can help in understanding the biological function of lncRNAs.
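
As an illustration of the simplest feature type named in the abstract, the following minimal sketch computes normalized k-mer frequencies for an RNA sequence and fuses two such vectors. It is not the thesis implementation: the function names `kmer_frequencies` and `fuse_features`, the example sequence, and the choice of plain concatenation as the fusion step are all assumptions made here for illustration.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA-style T is mapped to U below


def kmer_frequencies(sequence, k=2):
    """Return the normalized k-mer frequency vector of an RNA sequence.

    The vector has 4**k entries, ordered lexicographically over ACGU.
    """
    seq = sequence.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def fuse_features(*vectors):
    """Fuse feature vectors by concatenation (an assumed fusion scheme)."""
    fused = []
    for vector in vectors:
        fused.extend(vector)
    return fused


# Hypothetical example sequence, not taken from the thesis data set.
example = "AUGGCUAGCUAAUGC"
features = fuse_features(kmer_frequencies(example, 1),
                         kmer_frequencies(example, 2))
print(len(features))
```

A vector of this kind, combined with the other feature types listed above, would be the input to the balancing and classification steps; those steps are not sketched here.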
Keywords/Search Tags: long non-coding RNAs, sequence characteristics, subcellular localization, SMOTE, feature fusion