Studies Of The Protease Specificity Based On The Substrate Sequences

Posted on: 2019-02-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: E F Qi
GTID: 1310330542497006
Subject: Applied Mathematics
Abstract/Summary:
With the application of high-throughput technology in proteomics, more and more proteins are being identified by mass spectrometry after digestion by proteases. In proteomics experiments, all proteins must first be digested in silico, and the choice of an appropriate protease for this prior digestion is important for the hydrolysis. The selected protease must therefore have very high specificity. Protease specificity is fundamentally governed by the interactions between protease and substrate at the active site. Profiling the specificity of cleavage sites in substrates is an important step in characterizing the biochemical properties of proteases, and uncovering substrate specificity is central to understanding the role of proteases in metabolism and disease. However, as the number of known substrate sequences grows, analyzing this large volume of data and extracting useful information about protease specificity has become a major challenge for bioinformatics.

In studies that characterize protease specificity, whether experimental or quantitative, it is usually assumed that each amino acid residue in the substrate binds to its corresponding subsite completely independently. However, the binding of a particular residue at one subsite may have a positive or negative effect on the binding of residues at other subsites. The usual quantitative method at present is to calculate the frequency of each amino acid residue at a single site, which cannot reflect interactions between residues at different subsites. Moreover, subsite cooperation has so far been studied only experimentally, on individual proteases, without any quantitative method to measure it. Understanding the cooperation between subsites is fundamental to a comprehensive understanding of how the active sites determine protease activity, and to further research that relies on this knowledge, such as identifying new substrates and designing peptides that inhibit or track protease activity.

To this end, we first download the data on known substrates from the MEROPS database. To avoid false positives, we remove redundant substrate sequences with a greedy algorithm; the remaining substrates are used for further analysis. We establish several models of cooperation between subsites and propose algorithms for analyzing protease specificity. In addition, based on the analysis of substrate sequences, we present a quantitative method for measuring the similarity of proteases.

In this thesis, we first propose a block-based quantitative method that targets residue combinations at successive sites flanking the cleavage bond in the substrate sequence. The results indicate that most proteases have significant amino acid combinations in the blocks near the cleavage bond. These combinations imply that amino acids close to the cleavage bond may cooperate more strongly than those far from it. For multiple-site combinations, we propose a new quantitative method centered on the cleavage bond. Three types of site combination model are put forward: binary, ternary and quaternary. The results show that favorable amino acid combinations can be detected in each type. We also integrate single-site specificity into the three types of amino acid combination model so as to reflect site cooperation better. In addition, the method can be generalized to other site combinations as the situation requires.

Both the block-based model and the multiple-site combination models focus on favorable amino acid combinations, which have a positive effect on hydrolysis. Based on the analysis of substrate sequences, we propose, for the first time, a quantitative method for amino acid combinations that may not be cooperative at two given sites. Compared with other quantitative methods, our approach clearly reveals the preference of each amino acid residue at one site for the residues at the other site, which provides a new way of studying protease specificity.

Based on the substrate sequences, we also propose a novel method for measuring the similarity of proteases. For each protease in the dataset, we construct a vector of dimension L × 20 from the amino acid residues bound at each subsite of the protease, where L is the number of sites. We then sort the vector to obtain a rank vector, in order to reflect the genetic relationship between proteases as faithfully as possible. The similarities between proteases are calculated from the rank vectors, and the differences are visualized as a phylogenetic tree. Compared with other methods, almost all homologous proteases cluster in small branches of our phylogenetic tree, and proteases of the same catalytic type also cluster together, which reflects the genetic relationship among the proteases.

In conclusion, both the site cooperation models and the similarity analysis among proteases, each based on the analysis of substrate sequences, produce good results. These methods can also be generalized widely and contribute to the study of protease specificity. Furthermore, these quantitative methods provide a theoretical basis for predicting cleavage sites in substrates and may inspire the design and development of drugs targeting specific proteases.

All the algorithms are implemented in C++, and the source code, test datasets and introductory documentation are freely available at:
(1) PBlock: https://sourceforge.net/projects/PBlock/files/?source=navbar
(2) Combination: https://sourceforge.net/projects/combinations/files/?source=navbar
(3) Uncooperative: https://sourceforge.net/projects/uncooperative/files/?source=navbar...
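The contrast the abstract draws between single-site frequencies and cooperation between subsites can be illustrated with a small sketch. The Python snippet below is not the thesis's C++ implementation. It assumes, purely for illustration, that cooperation between two sites is scored as the log2 ratio of the observed pair frequency to the product of the two single-site frequencies (the value expected if the sites were independent). The function name `pair_enrichment`, the four-residue windows (P2, P1, P1', P2') and the toy sequences are all hypothetical.

```python
from collections import Counter
from math import log2


def pair_enrichment(windows, site_i, site_j):
    """Score every observed residue pair at two window positions.

    Returns {(aa_i, aa_j): log2(observed / expected)}, where `expected` is
    the joint frequency predicted under independence, i.e. the product of
    the two single-site frequencies. (Scoring rule assumed for illustration.)
    """
    n = len(windows)
    single_i = Counter(w[site_i] for w in windows)
    single_j = Counter(w[site_j] for w in windows)
    joint = Counter((w[site_i], w[site_j]) for w in windows)

    scores = {}
    for (a, b), count in joint.items():
        observed = count / n
        expected = (single_i[a] / n) * (single_j[b] / n)
        scores[(a, b)] = log2(observed / expected)
    return scores


# Hypothetical toy windows around a cleavage bond: P2, P1, P1', P2'.
windows = ["AKSL", "AKSV", "GKSL", "AKTL", "GRTL", "GRTV", "ARTV", "GRSL"]

# Index 1 is P1 and index 2 is P1' in these toy windows.
scores = pair_enrichment(windows, 1, 2)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

On this toy data, K at P1 paired with S at P1' scores above zero (it occurs more often than independence predicts), while K paired with T scores below zero, the kind of pair the abstract describes as possibly not cooperative. A real analysis would also need a significance test and a treatment of pairs that are never observed, both of which this sketch omits.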
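The rank-vector comparison of proteases can be sketched in the same spirit. The abstract specifies an L × 20 vector per protease and a rank vector derived from it, but not the similarity measure itself, so the snippet below assumes Spearman's rank correlation (Pearson correlation of the two rank vectors, with tied values sharing their average rank). The function names and the toy substrate windows are hypothetical and are not taken from the thesis's C++ code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_vector(windows):
    """Flatten per-site amino acid frequencies into a vector of length L * 20."""
    n = len(windows)
    length = len(windows[0])
    vec = []
    for site in range(length):
        column = [w[site] for w in windows]
        vec.extend(column.count(aa) / n for aa in AMINO_ACIDS)
    return vec


def rank_vector(values):
    """Replace each value by its rank (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of the 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def rank_similarity(windows_a, windows_b):
    """Assumed similarity: Pearson correlation of the rank vectors (Spearman's rho)."""
    ra = rank_vector(frequency_vector(windows_a))
    rb = rank_vector(frequency_vector(windows_b))
    mean_a = sum(ra) / len(ra)
    mean_b = sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical toy substrate windows (P2, P1, P1', P2') for three proteases.
trypsin_a = ["AKSL", "GRTV", "LKAV", "SRGL"]
trypsin_b = ["GKTV", "ARSL", "SKGL", "LRAV"]   # same per-site composition as trypsin_a
chymo_like = ["AFSL", "GYTV", "LWAV", "SFGL"]  # differs only in its P1 preference

print(rank_similarity(trypsin_a, trypsin_b))
print(rank_similarity(trypsin_a, chymo_like))
```

In this toy example the two sets with identical per-site composition reach the maximum similarity, and changing only the P1 preference lowers it. One minus the similarity could then serve as a distance for tree building, although the abstract does not say how the thesis constructs its phylogenetic tree.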
Keywords/Search Tags: protease, substrate sequence, block, site combinations, phylogenetic tree