
A Multi-feature Fusion Algorithm For LncRNA Subcellular Localization Prediction Problem

Posted on: 2022-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: X H Sun
Full Text: PDF
GTID: 2480306761459934
Subject: Automation Technology
Abstract/Summary:
Of the more than three billion base pairs in the human genome, about two-thirds can be transcribed, while less than two percent encode proteins. The genome therefore contains a large number of sequences that do not express proteins; their transcripts are called non-coding RNAs (ncRNAs). The proportion of non-coding sequence in an organism's genome is closely related to species differentiation, and these sequences play a regulatory role in development, growth, and the expression of species-specific genes. A considerable number of studies have found that long non-coding RNAs (lncRNAs) play an increasingly important role in epigenetic regulation.

Predicting the subcellular localization of lncRNAs faces several problems. First, localization can currently be validated only by biological experiments, which are time-consuming and expensive, so the problem is not solved efficiently. Second, even where machine learning is used, the features extracted from lncRNA sequences tend to be of a single type. Third, previous studies often treat lncRNA subcellular localization as a single-label classification problem, although in practice it is a multi-label classification problem. Based on these points, this thesis explores a multi-feature fusion algorithm for predicting lncRNA subcellular localization. The main work is as follows:

(1) A multi-feature fusion algorithm for lncRNA subcellular localization prediction is proposed. Nucleotide sequence features are extracted from multiple perspectives. A differential evolution-based feature fusion algorithm assigns a weight to each feature and fuses the weighted features. The fused features are then reduced in dimensionality to obtain the best feature subset, and finally the best-performing classifier is selected from several multi-label classifiers.

(2) Four kinds of features are considered: traditional k-mer features, subsequence-based coding features that take contextual information into account, biological structural properties of nucleotide sequences, and the pseudo-dinucleotide composition of nucleotide sequences (Pse-DNC). First, each of the four features extracted from the nucleotide sequences is combined with machine learning algorithms for multi-label classification. The experimental results show that each feature captures, from a different perspective, some intrinsic pattern linking nucleotide sequences to subcellular localization. The four features are then fused into a single representation that describes nucleotide sequences from multiple perspectives, and dimensionality reduction yields the optimal feature set used as input to the machine learning classifier.

(3) Compared with each individual feature and with other fusion schemes, the proposed method increases average precision (AP) and accuracy on average and decreases Hamming loss, one-error, and ranking loss. Under 10-fold cross-validation combined with validation on a 20% independent hold-out set, AP and the other metrics improve over existing tools that use the same dataset. Applied to an extended human lncRNA dataset, the algorithm increases AP by 4.1% relative to the existing tool. An online website and an open-source tool have been developed to make the method easier for researchers to use.
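To make the k-mer and weighted-fusion steps concrete, the following is a minimal illustrative sketch, not the thesis's implementation. It computes normalized k-mer frequencies for an RNA sequence and concatenates several feature blocks after scaling each by a weight. The function names `kmer_frequencies` and `fuse_features`, the example sequence, and the fixed weights are all hypothetical; in the thesis the weights are assigned by differential evolution, which is not reproduced here.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return normalized k-mer frequencies in fixed lexicographic order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def fuse_features(blocks, weights):
    """Scale each feature block by its weight and concatenate the results."""
    if len(blocks) != len(weights):
        raise ValueError("one weight per feature block is required")
    fused = []
    for block, w in zip(blocks, weights):
        fused.extend(w * x for x in block)
    return fused


# Hypothetical example: two feature blocks (1-mer and 2-mer frequencies).
seq = "AUGGCUAUGC"
f1 = kmer_frequencies(seq, k=1)   # 4 values, one per base
f2 = kmer_frequencies(seq, k=2)   # 16 values, one per dinucleotide
# Placeholder weights; the thesis would search for these with
# differential evolution rather than fixing them by hand.
fused = fuse_features([f1, f2], weights=[0.7, 0.3])
print(len(fused))
```

In a pipeline of the kind described above, a vector like `fused` would then go through dimensionality reduction before being passed to a multi-label classifier.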
Keywords/Search Tags: Long non-coding RNA, multi-label classification, feature fusion, differential evolution, subcellular localization