Protein And RNA Modification Sites Prediction By Using Machine Learning Method And Its Application

Posted on: 2021-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Yu
Full Text: PDF
GTID: 2370330611988199
Subject: Statistics
Abstract/Summary:
With the arrival of the big data era, the number of sequences in biological databases has grown exponentially. Uncovering the regularities hidden in these sequences has become a research focus in bioinformatics. Protein and RNA modifications are involved in many life processes and play an important role in pathology. Traditional experimental methods for identifying modification sites are time-consuming and costly, whereas machine learning methods can predict protein and RNA modification sites accurately and efficiently, which supports the development of proteomics and genomics and promotes the understanding of disease pathogenesis. This thesis uses machine learning methods to predict protein and RNA modification sites. The main contents are as follows:

1. A novel method called DNNAce is proposed for predicting protein acetylation sites. First, the feature vectors produced by BE, PseAAC, AAindex, NMBroto, EBGW, MMI, BLOSUM62 and KNN are fused to obtain the initial feature space. Second, Group Lasso is used for the first time to remove features unrelated to acetylation site classification; the effective features that remain form the optimal subset, which reduces the dimensionality of the feature space. Finally, a deep neural network is used to predict acetylation sites on 9 datasets. Evaluation metrics are obtained by 10-fold cross-validation, and DNNAce is compared with other methods. The results show that DNNAce improves prediction accuracy to some extent and offers a new approach for predicting other protein post-translational modification sites.

2. A new method named StackRAM is proposed for predicting RNA N6-methyladenosine (m6A) sites. First, the RNA sequences are encoded by Binary, NCP, ANF, K-mer, PseDNC and PSTNP, and the original feature set is obtained through multi-information fusion. Second, the Elastic Net is used for the first time to remove redundant and noisy information, retaining the features important for m6A site classification and yielding the optimal feature subset. Finally, the probability outputs of the base classifiers LightGBM and SVM are combined with the optimal feature subset, and the combined features serve as input to the second-stage meta-classifier, an SVM. Prediction accuracy on the independent test datasets for H. sapiens and A. thaliana reached 92.30% and 87.06%, respectively. StackRAM is competitive in m6A site identification and shows good potential for cross-species prediction, making it a useful tool for identifying m6A sites.
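To make the multi-information fusion step concrete, the following is a minimal sketch of four of the encodings named for StackRAM (Binary, NCP, ANF and K-mer), fused by simple concatenation. It is not the thesis code. The function names, the choice of k = 2 and the example sequence are assumptions made for illustration, and the NCP triples follow the commonly used (ring structure, functional group, hydrogen bond) convention. PseDNC and PSTNP are left out because they depend on physicochemical tables and training-set position statistics that the abstract does not give. The sketch also assumes sequences contain only A, C, G, U (or T) and are at least k bases long.

```python
from itertools import product

ALPHABET = "ACGU"

# Nucleotide chemical property (NCP) triples, assumed convention:
# (ring structure, functional group, hydrogen bond strength)
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def binary_encode(seq):
    """One-hot (Binary) encoding: 4 values per position, in ACGU order."""
    out = []
    for base in seq:
        out.extend(1.0 if base == b else 0.0 for b in ALPHABET)
    return out


def ncp_anf_encode(seq):
    """NCP triple plus accumulated nucleotide frequency (ANF) per position.

    ANF at position i is the fraction of the first i bases that equal
    the base found at position i.
    """
    out = []
    counts = {b: 0 for b in ALPHABET}
    for i, base in enumerate(seq, start=1):
        counts[base] += 1
        out.extend(float(v) for v in NCP[base])
        out.append(counts[base] / i)
    return out


def kmer_encode(seq, k=2):
    """Normalised frequency of every possible k-mer (4**k values)."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = len(seq) - k + 1
    counts = {km: 0 for km in kmers}
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[km] / total for km in kmers]


def fuse_features(seq, k=2):
    """Concatenate the individual encodings into one feature vector."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    return binary_encode(seq) + ncp_anf_encode(seq) + kmer_encode(seq, k)


features = fuse_features("GGACU")  # hypothetical 5-nt window
print(len(features))  # 56 = 5*4 (Binary) + 5*4 (NCP+ANF) + 16 (2-mers)
```

In a pipeline like the one described, a vector of this kind would be computed for every sequence window and then passed to the feature selection step. Concatenation is the simplest fusion choice; it keeps each encoding's columns identifiable, which helps when inspecting which features a sparse selector retains.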
Keywords/Search Tags: machine learning, protein acetylation sites, RNA N6-methyladenosine sites, multi-information fusion, deep neural network, Stacking integration