Research On Prediction Of LncRNA Subcellular Localization

Posted on: 2020-06-11
Degree: Master
Type: Thesis
Country: China
Candidate: X F Yang
Full Text: PDF
GTID: 2480306131961939
Subject: Computer technology
Abstract/Summary:
Long non-coding RNAs (lncRNAs) are a class of RNA molecules longer than 200 nucleotides. They play important roles in cell metabolism and development, including intracellular trafficking, mRNA splicing, cellular differentiation, chromatin modification, and transcriptional and post-transcriptional regulation. Increasing evidence has revealed that the biological functions of lncRNAs are closely related to their subcellular localizations, so knowing where an lncRNA localizes is essential. In this thesis, we propose two novel computational methods for predicting lncRNA subcellular localization.

The first method is based on unbalanced pseudo-k nucleotide compositions. We concatenated the k-mer nucleotide composition and the sequence order correlated factors to construct the feature vector, which comprehensively uses the sequence information of the lncRNA. We then applied a feature selection technique based on analysis of variance to obtain the optimal feature subset. Finally, we trained the model with a support vector machine. The proposed method reaches a maximum overall accuracy of 90.37% in leave-one-out cross-validation, which outperforms the existing state-of-the-art method and indicates that the predictor is effective. For the convenience of subsequent genetic sequence studies, the source code is available at https://github.com/NicoleYXF/lncRNA.

The second method is based on the fusion of k-mer nucleotide composition and triplet structure-sequence elements. First, the method fuses the k-mer nucleotide composition and the triplet structure-sequence elements to construct the feature vector, which comprehensively uses both the primary sequence and the secondary structure information of the lncRNA. Next, a feature selection technique based on analysis of variance is applied to filter out noisy or redundant information. Finally, a support vector machine performs the prediction. The second method achieves a maximum overall accuracy of 92.38% in leave-one-out cross-validation, which is better than the first method. This demonstrates that the proposed predictor is a powerful tool for determining lncRNA subcellular localization.
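
Two of the building blocks named in the abstract, k-mer nucleotide composition and variance-analysis-based feature ranking, can be illustrated with a minimal Python sketch. This is not the thesis code: the function names `kmer_composition`, `anova_f_score`, and `rank_features`, the toy sequences, and the class labels are all hypothetical. It also assumes a plain one-way ANOVA F statistic as the ranking score, and it leaves out the sequence order correlated factors, the triplet structure-sequence elements, and the support vector machine.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k):
    """Normalized frequency of every k-mer over ACGU, in lexicographic order."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across the classes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, c = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    between = sum(len(g) * (means[label] - grand_mean) ** 2
                  for label, g in groups.items())
    within = sum((v - means[label]) ** 2
                 for label, g in groups.items() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (c - 1)) / (within / (n - c))


def rank_features(X, labels):
    """Feature indices sorted from most to least discriminative."""
    scores = [anova_f_score([row[j] for row in X], labels)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data: sequences and labels are made up for illustration only.
sequences = ["ACGUACGUAC", "AAAACCCCAA", "GGGUUUGGGU", "UUGGUUGGUU"]
labels = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]
X = [kmer_composition(s, 2) for s in sequences]
top_features = rank_features(X, labels)[:5]  # keep the 5 highest-scoring 2-mers
```

In a full pipeline, the selected columns would then be passed to a classifier and evaluated with leave-one-out cross-validation, as the abstract describes.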
Keywords/Search Tags: Long non-coding RNA, Subcellular localization, K-mer nucleotide composition, Sequence order correlated factors, Triplet structure-sequence elements, Feature selection, Support vector machine