
Prediction Of Bacterial Type Ⅳ Secreted Effectors And Phage Virion Proteins By Integrating Sequence And Evolutionary Information

Posted on: 2023-03-22  Degree: Master  Type: Thesis
Country: China  Candidate: H T Han  Full Text: PDF
GTID: 2530306818987579  Subject: Computer technology
Abstract/Summary:
Proteins are essential components of living organisms and participate in every process of cellular life. They are highly diverse in type and function. Accurately predicting protein function is of great significance for disease prevention and drug development. With the development of sequencing technology, protein sequence data has grown exponentially. However, traditional experimental methods are time-consuming and labor-intensive, and can no longer keep up with the volume of protein data that needs annotation. There is therefore an urgent need for computational methods that predict protein function rapidly and accurately. PSI-BLAST profiles have been experimentally validated to provide important and discriminative evolutionary information for various protein classification tasks. However, evolutionary features derived from PSI-BLAST profiles have not been fully explored or used in previous work. In this thesis, we identify type IV secreted effectors and phage virion proteins by using evolutionary features. The main research results are as follows:

(1) Many gram-negative bacteria use type IV secretion systems to deliver effector molecules to a wide range of target cells. These substrate proteins, called type IV secreted effectors (T4SEs), manipulate host cell processes during infection, often resulting in severe disease or even death of the host. Identification of putative T4SEs has therefore become a very active research topic in bioinformatics because of its vital role in understanding host-pathogen interactions. In this study, a computational predictor termed iT4SE-EP was developed for identifying T4SEs by extracting evolutionary features from the position-specific scoring matrix and position-specific frequency matrix profiles. First, four types of encoding strategies were designed to transform protein sequences into fixed-length feature vectors based on the two profiles. Then, a feature selection technique based on the random forest algorithm was used to remove redundant or irrelevant features without much loss of information. Finally, the optimal features were input into a support vector machine classifier to predict T4SEs. Our experimental results demonstrated that iT4SE-EP outperformed most existing methods on the independent dataset test.

(2) The phage virion protein (PVP), a type of bacteriophage structural protein, is an essential component of infectious viral particles and is responsible for multiple biological functions. Accurate identification of PVPs is of great significance for understanding the interaction between phages and host bacteria and for developing new antimicrobial drugs or antibiotics. However, traditional experimental approaches for identifying PVPs are often time-consuming and laborious, so computational methods that can identify PVPs efficiently and accurately are desirable. In this study, we proposed a multi-classifier voting model called iPVP-MCV to improve the prediction of PVPs from their amino acid sequences. First, three types of evolutionary features were extracted from the position-specific scoring matrix profiles to represent PVPs and non-PVPs. Then, a set of baseline models was trained by combining the support vector machine algorithm with each type of feature descriptor. Finally, the outputs of these baseline models were integrated by majority voting to construct the proposed method, iPVP-MCV. Our results demonstrated that the proposed iPVP-MCV model was superior to existing methods on the rigorous independent dataset test.

Analysis of the test results of iT4SE-EP and iPVP-MCV on benchmark datasets shows that both proposed methods perform well and outperform some existing methods in the same research field. These results suggest that iT4SE-EP and iPVP-MCV have good application prospects in their respective fields.
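Two of the steps described above can be illustrated with a short Python sketch: turning a variable-length position-specific scoring matrix into a fixed-length vector, and combining the outputs of several baseline classifiers by majority voting. This is only an illustration under stated assumptions, not the thesis implementation. The abstract does not name the encoding strategies actually used, so the residue-wise averaging in `pssm_composition` is an assumed stand-in for one possible encoding. The function names, the 20-letter residue ordering, and the 0/1 label convention are likewise hypothetical.

```python
from collections import Counter

# Assumed column/residue ordering; a real PSI-BLAST profile defines its own order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm_composition(sequence, pssm):
    """Map an L x 20 profile to a fixed-length vector of 400 values.

    Illustrative encoding (an assumption, not the thesis method): rows of the
    profile are summed per residue type and divided by the sequence length,
    giving 20 residue types x 20 profile columns regardless of L.
    """
    if not sequence or len(sequence) != len(pssm):
        raise ValueError("sequence and profile must be non-empty and equally long")
    sums = {aa: [0.0] * 20 for aa in AMINO_ACIDS}
    for residue, row in zip(sequence, pssm):
        if residue in sums:  # non-standard residues are skipped
            for j, score in enumerate(row):
                sums[residue][j] += score
    length = len(sequence)
    return [value / length for aa in AMINO_ACIDS for value in sums[aa]]


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    `predictions` holds one list of labels per baseline model, all of the
    same length. Use an odd number of models with binary labels to avoid ties.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        counts = Counter(model[i] for model in predictions)
        voted.append(counts.most_common(1)[0][0])
    return voted


# Toy usage: a two-residue "protein" and three hypothetical baseline models.
features = pssm_composition("AC", [[1] * 20, [2] * 20])
labels = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

In a pipeline like the ones described, each baseline model would be a classifier trained on one feature type, and `majority_vote` would receive their predicted labels for the same test proteins.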
Keywords/Search Tags: protein function, type Ⅳ secreted effectors, phage virion protein, support vector machine, position-specific scoring matrix, position-specific frequency matrix