
Research on Prediction of Phosphorylation Modification Sites Based on Machine Learning

Posted on: 2021-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Deng
GTID: 2430330611992470
Subject: Software engineering
Abstract/Summary:
Post-translational modification of proteins, a covalent processing step that occurs after RNA translation, is involved in almost all normal cellular activities and is an important mechanism for regulating protein function. In-depth study of protein post-translational modification plays an important role in understanding the pathogenesis of human diseases and in advancing human proteomics. Traditional experimental identification methods are time-consuming and labor-intensive, and they are difficult to apply to massive data. With the recent emergence of bioinformatics as an interdisciplinary field, predicting protein post-translational modification sites with machine learning algorithms has become an important research topic. This thesis therefore studies and analyzes phosphorylation modification using machine learning algorithms. The details are as follows:

(1) Prediction of Human Protein Phosphorylation Sites Based on Support Vector Machines. A feature extraction method for phosphorylated sequence fragments is proposed. Taking into account the specificity of phosphorylation across species, human protein phosphorylation sequences were selected from the dataset, and information entropy and density entropy were used to extract conservation information from the sequence fragments on both sides of each site. Each fragment was encoded with four types of features: information entropy and density entropy, amino acid content, amino acid physicochemical properties, and KNN distance. The F-score method was introduced to screen the physicochemical properties. The extracted and filtered features were then fused, and a support vector machine was used to build the prediction model. Ten-fold cross-validation results show that, compared with the other features, the information entropy and density entropy features effectively improve the prediction performance for phosphorylation sites. On independent test sets, the proposed tool also shows good prediction performance compared with existing models.

(2) Prediction of Yeast Phosphorylation Modification Sites Based on Ensemble Learning. Building on random forests, this thesis proposes an ensemble learning strategy for predicting yeast phosphorylation sites, together with pos-K-spaced features that incorporate positional information. Five types of features were extracted from the dataset, and a random forest model was trained on each single feature type. The results of these models show that the pos-K-spaced feature can effectively distinguish phosphorylated sites from non-phosphorylated sites. The outputs of the five random forests were then combined with a logistic regression algorithm to obtain the final prediction model. Experimental results show that the proposed ensemble model predicts phosphorylation sites more accurately than models built with a single machine learning algorithm.
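The entropy and composition encoding described in part (1) can be illustrated with a short Python sketch. It assumes an odd-length window centred on the candidate S/T/Y residue, computes the Shannon entropy (in bits) of the residue distribution in the left and right flanks, and appends a 20-dimensional amino acid composition. The helper names, the window convention and the flank-based entropy are illustrative assumptions. The abstract does not define density entropy, so it is not reproduced here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the standard-residue distribution in a segment."""
    counts = Counter(aa for aa in segment if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def amino_acid_composition(segment: str) -> list[float]:
    """Fraction of each of the 20 standard residues; padding such as 'X' is ignored."""
    residues = [aa for aa in segment if aa in AMINO_ACIDS]
    total = len(residues) or 1
    counts = Counter(residues)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def encode_fragment(fragment: str) -> list[float]:
    """Hypothetical encoding: [H(left flank), H(right flank)] + 20-dim composition.

    Assumes an odd-length window with the candidate site at the centre.
    """
    if len(fragment) % 2 == 0:
        raise ValueError("fragment must have odd length with the site at the centre")
    centre = len(fragment) // 2
    left, right = fragment[:centre], fragment[centre + 1:]
    return [shannon_entropy(left), shannon_entropy(right)] + amino_acid_composition(fragment)
```

For example, `encode_fragment("RRASLPE")` returns a 22-element vector whose first entry is the entropy of the left flank "RRA" (about 0.918 bits) and whose second entry is the entropy of the right flank "LPE" (log2 3, about 1.585 bits).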
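The abstract does not give the F-score formula used to screen physicochemical properties in part (1). A definition commonly paired with SVMs for feature selection, assumed here, scores feature i as F(i) = ((mean_pos - mean_all)^2 + (mean_neg - mean_all)^2) / (var_pos + var_neg), where the variances are sample variances of the feature over positive and negative samples. A larger F(i) suggests a more discriminative feature. The sketch below computes this score and ranks features by it; the function names are hypothetical.

```python
from statistics import mean, variance


def f_score(pos: list[float], neg: list[float]) -> float:
    """F-score of one feature from its values on positive and negative samples."""
    all_mean = mean(pos + neg)
    # Between-class term: spread of the class means around the overall mean.
    between = (mean(pos) - all_mean) ** 2 + (mean(neg) - all_mean) ** 2
    # Within-class term: sample variances, i.e. 1/(n-1) normalisation.
    within = variance(pos) + variance(neg)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(pos_rows: list[list[float]], neg_rows: list[list[float]]) -> list[int]:
    """Feature indices sorted by descending F-score (ties keep their original order)."""
    n_features = len(pos_rows[0])
    scores = [
        f_score([row[i] for row in pos_rows], [row[i] for row in neg_rows])
        for i in range(n_features)
    ]
    return sorted(range(n_features), key=lambda i: scores[i], reverse=True)
```

A screening step would then keep the top-ranked properties, with the cut-off chosen by cross-validation.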
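The abstract does not spell out how the pos-K-spaced features of part (2) are constructed. The sketch below starts from the standard composition of k-spaced amino acid pairs, which counts residue pairs separated by k intervening positions. It then adds positional information in one hypothetical way: pairs that start upstream of the central site are counted separately from pairs that start at or downstream of it. The function name `pos_k_spaced`, the upstream/downstream split and the `k_max` parameter are assumptions, not the thesis's own definition.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
PAIR_INDEX = {p: i for i, p in enumerate(PAIRS)}


def pos_k_spaced(fragment: str, k_max: int = 2) -> list[float]:
    """Hypothetical position-aware k-spaced pair encoding.

    For every gap k in 0..k_max, pairs (fragment[i], fragment[i+k+1]) are
    counted separately for i < centre (upstream) and i >= centre (downstream),
    then normalised by the number of pairs counted on that side. Pairs that
    contain a non-standard residue (e.g. padding 'X') are skipped.
    Output length is (k_max + 1) * 800.
    """
    centre = len(fragment) // 2
    features: list[float] = []
    for k in range(k_max + 1):
        up = [0.0] * len(PAIRS)
        down = [0.0] * len(PAIRS)
        n_up = n_down = 0
        for i in range(len(fragment) - k - 1):
            idx = PAIR_INDEX.get(fragment[i] + fragment[i + k + 1])
            if idx is None:
                continue
            if i < centre:
                up[idx] += 1
                n_up += 1
            else:
                down[idx] += 1
                n_down += 1
        features += [c / n_up for c in up] if n_up else up
        features += [c / n_down for c in down] if n_down else down
    return features
```

With `k_max=0`, the fragment "RRASLPE" yields the upstream pairs RR, RA and AS and the downstream pairs SL, LP and PE, each with weight 1/3 in its half of the 800-element vector.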
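The final stage of the ensemble in part (2), where a logistic regression combines the outputs of the five random forests, is a form of stacking. The sketch below covers only this meta-learner. It assumes each sample is already represented by five positive-class probabilities, one from each feature-specific random forest. These should ideally be out-of-fold predictions so that the meta-learner is not trained on leaked scores. The forests themselves are not reimplemented. The logistic regression is fitted by plain batch gradient descent in pure Python; the learning rate, epoch count and toy data are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Minimise mean log-loss by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Combined probability that a site is phosphorylated."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, invented meta-features: each row holds the probabilities from five base models.
X = [
    [0.9, 0.8, 0.7, 0.6, 0.9],
    [0.8, 0.9, 0.6, 0.7, 0.8],
    [0.7, 0.6, 0.8, 0.9, 0.7],
    [0.2, 0.1, 0.3, 0.4, 0.2],
    [0.1, 0.3, 0.2, 0.1, 0.3],
    [0.3, 0.2, 0.4, 0.3, 0.1],
]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [0.85, 0.8, 0.75, 0.7, 0.8]))  # should exceed 0.5
```

The learned weights indicate how much the meta-learner trusts each feature-specific forest. In practice, scikit-learn's `StackingClassifier` offers a library route to the same two-level design.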
Keywords/Search Tags: Support vector machine, Information entropy, F-value, Random forest, Ensemble learning