
Subcellular Localization Prediction Of Long Non-coding RNA

Posted on: 2020-12-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Z D Su | Full Text: PDF
GTID: 2370330596475257 | Subject: Biophysics
Abstract/Summary:
Long non-coding RNAs (LncRNAs) are a class of ncRNA molecules longer than 200 nucleotides. They have important functions in cell development and metabolism and can regulate biological processes such as the cell cycle, transcription and translation. Their function is strongly related to their location in the cell. Many studies have shown that abnormal expression of LncRNA is associated with several diseases, including various types of cancer and Alzheimer's disease. Studying the function of LncRNA is therefore of great significance for the treatment of diseases and the development of the life sciences, and knowledge of subcellular location can provide useful clues or preliminary insight into biological function. Although current biochemical experimental techniques can determine the location of LncRNA in cells, they are both time-consuming and expensive. It is therefore important to develop a bioinformatics tool that rapidly and efficiently identifies the subcellular locations of LncRNA molecules.

Based on LncRNA sequence information, a bioinformatics tool called "iLoc-lncRNA" was developed. We extracted all animal LncRNA sequences with subcellular location annotations from the RNALocate database, obtaining a total of 1360 sequences. After processing with the CD-HIT clustering algorithm, 655 non-redundant LncRNA sequences remained. We then extracted nucleotide composition information, long-range sequence correlation information, and 8-mer nucleotide frequency information from each sequence. To eliminate redundant or noisy features, we ranked the extracted features using minimum redundancy maximum relevance (mRMR), analysis of variance (ANOVA) and a binomial distribution strategy in order to determine the optimal feature subset. A support vector machine was then used to construct the prediction model.

We investigated the performance of the different feature extraction methods and optimization strategies. As a result, the strategy of incorporating the optimal 8-mer nucleotide fragments, selected using the binomial distribution, into the general PseKNC (pseudo k-tuple nucleotide composition) was chosen to predict the subcellular location of LncRNA. In a jackknife test, the prediction tool achieved an accuracy of 86.72% on the benchmark dataset, which is higher than that of the existing predictor. Finally, for the convenience of other researchers, we have built an online prediction server (http://www.lin-group.cn/server/iLoc-LncRNA/). A stand-alone package was also constructed.
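
The k-mer counting and binomial-distribution ranking described above can be illustrated with a short Python sketch. It is not the thesis code. The abstract does not give the scoring formula, so the sketch assumes one common formulation: for k-mer i in class j, the confidence level is 1 - P(X >= n_ij) with X ~ Binomial(N_i, q_j), where n_ij is the count of the k-mer in class j, N_i is its count over all classes, and q_j is the share of all k-mer occurrences that fall in class j. All function names and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def kmer_counts(seq, k):
    """Count overlapping k-mers over A, C, G, U (T is read as U)."""
    seq = seq.upper().replace("T", "U")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGU"):  # skip windows with ambiguous bases
            counts[kmer] += 1
    return counts


def kmer_frequencies(seq, k):
    """Return normalized frequencies for all 4**k possible k-mers."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return {
        "".join(p): (counts["".join(p)] / total if total else 0.0)
        for p in product("ACGU", repeat=k)
    }


def binomial_confidence(n_ij, N_i, q_j):
    """Assumed score: CL = 1 - P(X >= n_ij), X ~ Binomial(N_i, q_j).

    Exact summation; suitable for small counts only, since very large
    binomial coefficients cannot be converted to float.
    """
    tail = sum(
        comb(N_i, m) * q_j ** m * (1 - q_j) ** (N_i - m)
        for m in range(n_ij, N_i + 1)
    )
    return 1.0 - tail


def rank_kmers(class_seqs, k):
    """Rank observed k-mers by their highest confidence level in any class.

    class_seqs maps a class label to a list of sequences.
    """
    per_class = {
        label: sum((kmer_counts(s, k) for s in seqs), Counter())
        for label, seqs in class_seqs.items()
    }
    grand_total = sum(sum(c.values()) for c in per_class.values())
    scores = {}
    for kmer in set().union(*per_class.values()):
        N_i = sum(c[kmer] for c in per_class.values())
        scores[kmer] = max(
            binomial_confidence(c[kmer], N_i, sum(c.values()) / grand_total)
            for c in per_class.values()
        )
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Made-up toy sequences, not data from RNALocate.
    toy = {
        "nucleus": ["AAAAAA", "AAAAAC"],
        "cytoplasm": ["GGGGGG", "GGGGGU"],
    }
    print(rank_kmers(toy, k=2))
```

In this sketch, the top-ranked k-mers are those concentrated in a single class. Under the strategy the abstract describes, such a ranked list would be truncated to an optimal subset and the corresponding frequencies passed to the classifier. With k=8 there are 4**8 = 65536 possible features per sequence.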
Keywords/Search Tags: LncRNA subcellular localization, pseudo k-tuple nucleotide composition (PseKNC), the binomial distribution, support vector machine