
Research On Prediction Algorithm Of Protein Submitochondrial Localization Based On Machine Learning And Sequence Information

Posted on: 2022-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Jin
GTID: 2518306476990009
Subject: Software engineering
Abstract/Summary:
Predicting protein subcellular location is an important research area in proteomics. As research has deepened, more and more subcellular organelles have been shown to contain substructures, and the function of a protein may be related to its location at this sub-subcellular level. Mitochondria have long been a focus of sub-subcellular localization research. Determining the specific submitochondrial location of a protein gives deeper insight into its function. It also informs studies of disease pathogenesis and drug design. In the era of big data, using computational methods to accurately predict protein submitochondrial locations, as a complement to traditional experimental methods, is of long-term significance. This thesis studies the application of computational methods to submitochondrial localization. The main research contents are as follows:

(1) A predictor, RLS-SubMito, is proposed based on ensemble learning to predict proteins in three submitochondrial locations. First, dipeptide composition (DC), pseudo-amino acid composition (PseAAC), and auto-cross covariance of the position-specific scoring matrix (ACC-PSSM) features are fused to represent each protein sequence. Second, class imbalance in the data is handled by oversampling, and eXtreme Gradient Boosting (XGBoost) is then used to select the optimal feature subset. Finally, an ensemble classifier consisting of random forest (RF), support vector machine (SVM), and light gradient boosting machine (LightGBM) predicts the submitochondrial location. Experimental results show that RLS-SubMito performs well and outperforms existing predictors.

(2) A mitochondrion has four submitochondrial compartments (outer membrane, intermembrane space, inner membrane, and matrix), but many existing studies ignore the intermembrane space. An end-to-end predictor, DeepPred-SubMito, is proposed that uses deep neural networks to predict protein submitochondrial location, including proteins of the intermembrane space. First, oversampling is used to reduce the influence of the imbalanced datasets. Second, each protein sequence is cut into multiple overlapping fixed-length subsequences, each of which serves as one input channel representing the whole sequence. Next, a multi-channel two-layer convolutional neural network is trained on these subsequences to learn high-level features. Finally, the model's performance is measured by cross-validation on the SM424-18 and SubMitoPred datasets. Experimental results show that the predictor outperforms state-of-the-art predictors.
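The dipeptide composition (DC) feature listed in (1) is conventionally a 400-dimensional vector. Each entry gives the frequency of one ordered pair of adjacent amino acids. The Python sketch below illustrates that idea. It assumes the 20 standard residues and normalizes by the number of valid adjacent pairs. The thesis does not state how it treats non-standard residues such as 'X', so skipping them here is an assumption. The function name `dipeptide_composition` is hypothetical.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides in a fixed order: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> list[float]:
    """Return the 400-dim dipeptide composition vector of a protein sequence.

    Each entry is the count of one dipeptide divided by the number of
    valid adjacent residue pairs. Pairs containing a non-standard residue
    are skipped entirely (an assumption; other conventions exist).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i : i + 2]
        if pair in counts:  # ignore pairs such as "AX" or "XA"
            counts[pair] += 1
            total += 1
    if total == 0:  # sequence too short or no valid pairs
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

For example, "MKVLAA" has five adjacent pairs (MK, KV, VL, LA, AA), so each of those five entries is 0.2 and the other 395 entries are 0.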
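The predictor in (2) cuts each protein sequence into overlapping fixed-length subsequences. The abstract gives neither the window length nor the overlap, so the sketch below treats both as parameters. The parameters `window` and `stride` and the function `split_into_windows` are hypothetical names. The sequence is right-padded with the placeholder residue 'X' so that the last window reaches the C-terminus. That padding choice is an assumption, not the thesis's documented method.

```python
def split_into_windows(sequence: str, window: int, stride: int, pad: str = "X") -> list[str]:
    """Cut a sequence into overlapping fixed-length subsequences.

    A new window starts every `stride` residues (stride < window gives
    overlap). The sequence is right-padded with `pad` so that every
    subsequence has exactly `window` residues and the last one covers
    the final residue.
    """
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    if len(sequence) <= window:
        n_windows = 1
    else:
        # Ceiling division: enough windows for the last one to reach the end.
        n_windows = -(-(len(sequence) - window) // stride) + 1
    padded_len = (n_windows - 1) * stride + window
    padded = sequence + pad * (padded_len - len(sequence))
    return [padded[i * stride : i * stride + window] for i in range(n_windows)]
```

For example, `split_into_windows("ABCDEFG", 4, 2)` yields `["ABCD", "CDEF", "EFGX"]`. A multi-channel CNN expects a fixed number of channels, so a full pipeline would also need every sequence to produce the same number of windows. One way to do that is to pad or truncate all sequences to a common length first. The abstract does not say how the thesis handles this.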
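Both predictors handle class imbalance by oversampling, but the abstract does not name the method. The simplest variant is random oversampling, which duplicates randomly chosen minority-class samples until each class matches the size of the largest one. The sketch below implements that variant as a stand-in. SMOTE, which synthesizes new samples by interpolation, is a common alternative. The function `random_oversample` is hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class has as
    many samples as the largest class. Original samples are kept first,
    followed by the added duplicates."""
    rng = random.Random(seed)  # seeded for reproducibility
    target = max(Counter(labels).values())
    # Group samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = list(samples), list(labels)
    for y, members in by_class.items():
        # Draw with replacement from this class's existing samples.
        for x in rng.choices(members, k=target - len(members)):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

When a model is evaluated with cross-validation, oversampling is normally applied only to the training folds. If samples are duplicated before splitting, copies of the same protein can land in both training and test folds, which inflates the measured performance.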
Keywords/Search Tags: Submitochondrial location, Feature fusion, Classifier ensemble, Imbalanced data, Deep learning