Sequence-based Non-coding RNA And Protein Prediction And Its Association Studies

Posted on: 2020-10-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Z Fu
GTID: 1360330623951694
Subject: Computer Science and Technology
Abstract/Summary:
With the development of functional genomics and epigenetics, the important role of non-coding RNA (ncRNA) in life activities has been increasingly explored, and ncRNA has quickly become a focus of research. With the rapid development of high-throughput sequencing, biomolecular sequence data have accumulated rapidly, and mining the biologically important information contained in ncRNA sequences has become increasingly urgent. Studies have shown that ncRNAs usually take part in cellular activities by interacting with proteins. Sequence-based prediction and analysis of ncRNAs and proteins has therefore become a focus of attention in computational biology. This study takes the construction of sequence feature extraction methods and prediction models as its main line of research and addresses several problems in ncRNA and protein sequence prediction: graphical representation of RNA secondary structure sequences, DNA-binding protein prediction, microRNA precursor (pre-miRNA) prediction, and long ncRNA (lncRNA)–protein association prediction. The main research contents are as follows:

(1) 3D graphical representation based on RNA secondary structure sequences. Based on the frequency of bases in RNA secondary structure sequences and the physicochemical properties of the bases, a three-dimensional graphical representation of RNA secondary structure was proposed. A sliding-window similarity analysis method for RNA sequences based on distance calculation was also proposed. The similarity analysis method was applied to the prediction of plant pre-miRNAs, and three benchmark datasets were constructed. Compared with commonly used prediction algorithms, the method exhibits good prediction performance and efficiency. Compared with many machine learning methods, it is also simpler to operate, requires no trained parameters, and is more intuitive.

(2) DNA-binding protein prediction based on evolutionary information. The position-specific scoring matrix (PSSM) stores the evolutionary information of a protein sequence. This work proposes a PSSM-based feature extraction method called K-PSSM-composition, which captures evolutionary information about the 20 amino acid residues in a given sequence as well as local characteristic information of the sequence. First, the K-PSSM-composition features of each sequence were extracted, and the resulting feature vector was optimized using recursive feature elimination (RFE). Then, a support vector machine (SVM) was trained on the optimized features to predict DNA-binding proteins. The proposed model and other prediction models were evaluated on two standard benchmark datasets. The experimental results show that the proposed method achieves better predictive performance and effectiveness in predicting DNA-binding proteins.

(3) Pre-miRNA prediction based on mutual information. A new feature extraction algorithm based on mutual information for pre-miRNA sequences and secondary structures was proposed. The method captures the mutual information between the bases of pre-miRNA sequences and the local features of their secondary structures. The proposed feature vector has 55 dimensions, fewer than the feature vectors of most popular methods, which makes the method more computationally efficient. Finally, the extracted features were used to train an SVM model to predict pre-miRNAs, and the results were compared with those of other algorithms. Experiments on balanced and unbalanced datasets and on multi-species datasets show that the method exhibits good predictive performance.

(4) lncRNA–protein association prediction based on multi-information fusion. A computational model for lncRNA–protein association prediction based on multi-information fusion was proposed. First, a method for expressing the network topology properties of lncRNA–protein interactions was proposed. Then, basic composition features and evolutionary information derived from protein sequences, together with basic composition information and expression profiles of lncRNA sequences, were introduced. Finally, these features were merged, the merged feature vector was optimized with a recursive feature elimination algorithm, and the optimized vector was fed into an SVM model. The experimental results show that the proposed method achieves good validity and accuracy in lncRNA–protein association prediction.
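
As an illustration of the mutual-information idea in item (3), the following is a minimal Python sketch that estimates the mutual information between each base and its right-hand neighbour in a single RNA sequence. It is not the dissertation's 55-dimensional feature definition, which the abstract does not specify; the function name `adjacent_mutual_information` and the choice of adjacent-base pairs are assumptions made only for this example.

```python
from collections import Counter
from math import log2


def adjacent_mutual_information(seq: str) -> float:
    """Mutual information (in bits) between a base and its right-hand neighbour.

    Hypothetical helper for illustration; probabilities are estimated as
    plain relative frequencies over the adjacent pairs of one sequence.
    """
    pairs = [(seq[i], seq[i + 1]) for i in range(len(seq) - 1)]
    if not pairs:
        return 0.0  # sequences shorter than 2 bases have no adjacent pairs
    n = len(pairs)
    joint = Counter(pairs)                  # counts of (left, right) pairs
    left = Counter(a for a, _ in pairs)     # marginal counts of the left base
    right = Counter(b for _, b in pairs)    # marginal counts of the right base
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * log2(count * n / (left[a] * right[b]))
    return mi


if __name__ == "__main__":
    for s in ("AAAAAAAA", "AUAUAUAU", "GGCUAGCUAGGCAUCG"):
        print(s, round(adjacent_mutual_information(s), 4))
```

A homopolymer gives zero (the neighbour carries no information), while a strictly alternating sequence gives a value close to one bit. A real feature vector would presumably combine several such terms over sequence and secondary-structure symbols, but how the dissertation does this is not stated here.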
Keywords/Search Tags:Non-coding RNA, Sequence, Graphical representation, Evolutionary information, Mutual information, Feature extraction, Cross validation