
Research Of Protein Subcellular Localization Based On Machine Learning

Posted on: 2022-07-08  Degree: Master  Type: Thesis
Country: China  Candidate: X M Xie  Full Text: PDF
GTID: 2480306542451244  Subject: Master of Applied Statistics
Abstract/Summary:
With the exponential growth of protein sequences in databases, traditional biological prediction algorithms face a series of challenges, while the advent of the big data era provides opportunities for processing data in batches. Predicting the subcellular location of a protein from its sequence is an important research topic in proteomics. It not only helps us better understand protein structure and function, but is also valuable for research into the pathogenesis of some diseases and for the development of new drugs. This thesis predicts protein subcellular localization based on machine learning. The main results are as follows:

1. We propose an improved feature extraction algorithm based on the autocorrelation function. Applying factor analysis to the physicochemical properties of amino acids and extracting common factors not only reduces the number of features but also captures protein sequence information more completely, which helps improve the accuracy of protein subcellular localization prediction.

2. We construct two protein subcellular localization prediction models. First, pseudo amino acid composition, g-gap dipeptide composition, entropy density and the improved autocorrelation function are used to extract features from protein sequences. Because the sample categories in the datasets are imbalanced, the Synthetic Minority Oversampling Technique (SMOTE) is used to balance them. The feature vectors corresponding to the optimal parameters are then fused, feature selection algorithms are used to reduce the dimensionality of the resulting high-dimensional feature vector, and finally classification models are used to predict the subcellular location of each protein sequence. The models are evaluated by 10-fold cross-validation. The results show that the established models outperform existing research results. This provides new ideas for the prediction of protein subcellular localization and demonstrates the feasibility and reliability of these models for this task.
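
Two of the sequence descriptors named above can be illustrated with a minimal sketch. It assumes the commonly used definitions: g-gap dipeptide composition as the normalized frequency of residue pairs separated by g positions, and the autocorrelation function as the average product of a physicochemical property value at positions i and i + lag. The function names, the toy property scale and the parameters are hypothetical and chosen only for illustration. The thesis's improved variant, which derives common factors from several properties by factor analysis, is not reproduced here.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature indices are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def g_gap_dipeptide_composition(seq, g=0):
    """Frequencies of residue pairs (seq[i], seq[i + g + 1]); g=0 means adjacent."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


def autocorrelation(seq, scale, max_lag):
    """Plain autocorrelation features for one property scale (assumed form).

    AC(lag) = (1 / (n - lag)) * sum_i P(i) * P(i + lag), for lag = 1..max_lag,
    where P(i) is the property value of the i-th residue.
    """
    values = [scale[a] for a in seq if a in scale]
    n = len(values)
    features = []
    for lag in range(1, max_lag + 1):
        if n <= lag:
            features.append(0.0)  # sequence too short for this lag
            continue
        s = sum(values[i] * values[i + lag] for i in range(n - lag))
        features.append(s / (n - lag))
    return features


if __name__ == "__main__":
    toy_scale = {"A": 1.0, "C": -1.0}  # hypothetical property values
    dpc = g_gap_dipeptide_composition("ACAC", g=0)
    print(dpc[PAIRS.index("AC")], dpc[PAIRS.index("CA")])
    print(autocorrelation("ACAC", toy_scale, max_lag=2))
```

In a pipeline of the kind described above, vectors like these would be concatenated with the other descriptors before oversampling, feature selection and classification.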
Keywords/Search Tags: Machine Learning, Subcellular Localization, Improved Autocorrelation Function, Feature Fusion