
IDDlncLOC: Subcellular Localization Of LncRNAs Based On A Framework For Imbalanced Data Distributions

Posted on: 2022-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: X P Zhu
GTID: 2480306761960049
Subject: Computer Software and Application of Computer
Abstract/Summary:
LncRNA is RNA that is longer than 200 nucleotides and is not translated into protein. Extensive evidence shows that lncRNAs play indispensable roles throughout the life of a cell, including cell-cycle regulation, chromatin modification, genetic markers, transcription, splicing, genome rearrangement, mRNA decay, and translation. Nuclear lncRNAs regulate the expression of pluripotency genes through chromatin. Several examples illustrate this. Morrbid, for instance, is a nuclear-localized lncRNA. LncRNAs located in the cytosol mostly act as post-transcriptional regulators. Likewise, lncRNA-HGBC, a cytoplasm-localized lncRNA, is considered a target in GBC. LncRNA RP11-732M18.3 can facilitate the treatment of glioma by interacting with 14-3-3 proteins. These examples show that the subcellular localization of a lncRNA is closely related to its function.

Many biological methods exist for identifying the subcellular localization of lncRNAs. One approach separates nuclear and cytoplasmic RNA and uses RT-PCR to locate a given lncRNA. Fluorescence in situ hybridization (FISH) can mark lncRNAs at their functional sites in the genome. Advances in computing have also produced many computational methods for this problem. However, these biological and computational methods are either cumbersome to perform or impractical, and they do not solve the problem of lncRNA subcellular localization well.

In this thesis, we propose a new model, IDDLncLoc, that predicts the subcellular localization of lncRNAs with an ensemble model. IDDLncLoc introduces a new framework fed by sequence-composition features. The framework has five parts: oversampling in mini-batch, random sampling in maximum batch, reorganization into new subsets, training on the new subsets, and integration of the outputs. First, we collected lncRNA samples from RNALocate and curated a benchmark dataset. Next, we used three kinds of features to describe lncRNA sequences. These are octamer features, dinucleotide-based auto-cross covariance (DACC), and composition, transition, and distribution (CTD) features. We then performed systematic feature selection with a binomial-distribution method and recursive feature elimination (RFE) to find the optimal features. We also propose a new CNN, called AFCNN, whose network structure includes a spatial attention mechanism. Finally, we used random oversampling and undersampling to generate 41 subsets of the benchmark dataset. These subsets train an ensemble of 21 CNNs and 20 SVMs, whose outputs are combined by voting. The final accuracy on the benchmark dataset reached 94.96%, 2.59% higher than lncLocPred.
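The octamer features can be illustrated with a minimal sketch. It assumes that "octamer" means normalized frequencies of overlapping 8-mers (k = 8, giving 4^8 = 65,536 dimensions), with U treated the same as T. The helper name `kmer_frequencies` is hypothetical, and IDDLncLoc's exact normalization is not specified in the abstract.

```python
# Hypothetical helper: normalized k-mer ("octamer" when k=8) frequency vector.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}  # U and T share an index


def kmer_frequencies(seq: str, k: int = 8) -> list[float]:
    """Return the 4**k normalized counts of overlapping k-mers in seq.

    Windows that contain a base outside A/C/G/T/U (e.g. N) are skipped.
    """
    counts = [0] * (4 ** k)
    total = 0
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if not all(base in BASE_INDEX for base in window):
            continue
        # Encode the window as a base-4 integer, e.g. "AC" -> 0*4 + 1 = 1.
        idx = 0
        for base in window:
            idx = idx * 4 + BASE_INDEX[base]
        counts[idx] += 1
        total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = kmer_frequencies("ACGUACGU", k=2)
    print(len(vec), vec[1])  # 16 dimensions; index 1 is the dinucleotide "AC"
```

Encoding each window as a base-4 integer avoids building a 65,536-entry lookup table on every call. Each vector still has one slot per possible 8-mer, which is why a feature-selection step is needed afterwards.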
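The abstract does not give the form of the binomial-distribution feature selection. A form often used in sequence-based localization work scores feature i against class j by the tail probability P(X >= n_ij), where X ~ Binomial(N_i, q_j). Here n_ij is the count of feature i in class j, N_i is its count over all classes, and q_j is class j's share of all feature occurrences. The confidence level is CL = 1 - P, and a feature's score is its maximum CL over classes. Assuming IDDLncLoc uses this form, which the source does not confirm, a sketch looks like this. The function names are hypothetical.

```python
from math import exp, lgamma, log


def binomial_confidence(n_ij: int, N_i: int, q_j: float) -> float:
    """CL = 1 - P(X >= n_ij) for X ~ Binomial(N_i, q_j); assumes 0 < q_j < 1.

    Terms are computed in log space so large counts do not overflow a float.
    """
    log_terms = [
        lgamma(N_i + 1) - lgamma(m + 1) - lgamma(N_i - m + 1)
        + m * log(q_j) + (N_i - m) * log(1 - q_j)
        for m in range(n_ij, N_i + 1)
    ]
    tail = sum(exp(t) for t in log_terms)
    return 1.0 - tail


def feature_scores(counts: list[list[int]], class_totals: list[int]) -> list[float]:
    """Score each feature by its maximum CL over classes.

    counts[i][j]: occurrences of feature i in class j.
    class_totals[j]: total feature occurrences in class j; at least two classes
    must have a positive total so that every q_j lies strictly between 0 and 1.
    """
    grand_total = sum(class_totals)
    q = [t / grand_total for t in class_totals]  # share of each class
    scores = []
    for row in counts:
        N_i = sum(row)
        if N_i == 0:
            scores.append(0.0)  # feature never observed
            continue
        scores.append(max(binomial_confidence(n, N_i, qj) for n, qj in zip(row, q)))
    return scores
```

Features with a high score occur in one class far more often than that class's share would predict. The highest-ranked features could then go through recursive feature elimination, for example with scikit-learn's `RFE`.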
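The abstract does not describe AFCNN's spatial attention layer. For orientation, here is a sketch of the widely used CBAM-style spatial attention, adapted to a 1-D (channels, length) feature map. It pools across channels by mean and max, convolves the two pooled maps, and applies a sigmoid to get one weight per position. Whether AFCNN follows this design is an assumption, and `spatial_attention_1d` is a hypothetical name.

```python
import numpy as np


def spatial_attention_1d(x: np.ndarray, kernel: np.ndarray, bias: float = 0.0):
    """CBAM-style spatial attention over a (channels, length) feature map.

    kernel has shape (2, k) with k odd: row 0 weights the channel-average map,
    row 1 weights the channel-max map. Returns (reweighted x, weights).
    """
    avg_map = x.mean(axis=0)               # (length,)
    max_map = x.max(axis=0)                # (length,)
    pooled = np.stack([avg_map, max_map])  # (2, length)
    k = kernel.shape[1]
    padded = np.pad(pooled, ((0, 0), (k // 2, k // 2)))  # zero "same" padding
    length = x.shape[1]
    # Sliding-window cross-correlation, as in a conv layer.
    logits = np.array(
        [np.sum(padded[:, t:t + k] * kernel) for t in range(length)]
    ) + bias
    weights = 1.0 / (1.0 + np.exp(-logits))  # sigmoid: one weight per position
    return x * weights[np.newaxis, :], weights
```

In a trained network, `kernel` and `bias` would be learned parameters. Fixing them by hand here only shows how the pooled maps decide which sequence positions get emphasized.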
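The resample-train-vote idea behind the framework can be sketched as follows. We read "oversampling in mini-batch" and "random sampling in maximum batch" as oversampling the minority classes and randomly undersampling the majority classes. That reading, the fixed per-class target size, and the helper names are assumptions rather than the thesis's specification. Each generated subset would train one base model, and the base models' predictions would be merged by majority vote.

```python
import random
from collections import Counter, defaultdict


def make_balanced_subsets(labels, n_subsets, per_class, seed=0):
    """Return n_subsets index lists, each with exactly per_class samples per class.

    Classes larger than per_class are randomly undersampled without replacement;
    smaller classes keep all samples and are topped up by sampling with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    subsets = []
    for _ in range(n_subsets):
        idx = []
        for members in by_class.values():
            if len(members) >= per_class:
                idx.extend(rng.sample(members, per_class))  # random undersampling
            else:
                idx.extend(members)
                idx.extend(rng.choices(members, k=per_class - len(members)))  # oversampling
        rng.shuffle(idx)
        subsets.append(idx)
    return subsets


def majority_vote(per_model_predictions):
    """per_model_predictions[m][s] is model m's label for sample s.

    Ties go to the label encountered first (Counter.most_common ordering).
    """
    n_samples = len(per_model_predictions[0])
    voted = []
    for s in range(n_samples):
        votes = Counter(preds[s] for preds in per_model_predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted
```

With 41 base models, an odd number of voters rules out ties when there are two classes. With more localization classes ties are still possible, and this sketch resolves them by first occurrence.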
Keywords/Search Tags: Ensemble model, imbalanced learning, subcellular localization of lncRNA, sequence feature