Identification Research And Application For Protein Post-translational Modification Sites

Posted on:2015-08-02

Degree:Master

Type:Thesis

Country:China

Candidate:X Chen

Full Text:PDF

GTID:2181330422477413

Subject:Analytical Chemistry

Abstract/Summary:

Protein post-translational modification (PTM) increases the functional diversity of the proteome by the covalent addition of functional groups or proteins, proteolytic cleavage of regulatory subunits, or degradation of entire proteins. These modifications influence almost all aspects of normal cell biology and pathogenesis. Identifying and understanding PTMs is therefore critical in the study of cell biology and in disease treatment and prevention. Proteomics, a rapidly growing field, has advanced tremendously during the past decades and has generated a prodigious quantity of data, which is greatly expediting further research on PTMs. Although high-throughput experimental technologies have achieved a great deal in PTM research, they are labor-intensive and yield limited insight into the structural basis of modification. Consequently, reliable and efficient computational approaches for predicting and analyzing PTMs are of great importance. In our work, we not only profile the biochemical environment, structure and conservation of the sequences around PTM sites, but also analyze the functions and interaction networks of PTM substrates. Models for predicting PTM sites were constructed by integrating statistical profiling with machine learning. The main contents are summarized as follows:

1. To address the limitations of existing methods, we developed a new tool, UbiProber, which is specifically designed to predict both general and species-specific ubiquitylation sites. Reliable, large-scale experimental ubiquitin proteomics data from multiple species were collected from several sources and used to train the ubiquitylation site prediction models.
Three sets of features, namely k-nearest neighbor (KNN) features, physicochemical properties (PCP) and amino acid composition (AAC), were extracted from the training data and combined using a support vector machine (SVM) to make predictions. The KNN features capture the local sequence similarity around sites that are ubiquitylated by the same enzyme or enzyme family, regardless of whether the enzyme-substrate interactions are known. PCPs and AACs reflect the biochemical environment of the regions surrounding ubiquitylation sites, and these regions play various roles in the structure and function of a protein. Additionally, to extract meaningful information and enhance the overall accuracy of the predictor, the information gain (IG) method was first used to select key positions and key amino acid residues to optimize each feature set. Furthermore, we examined the relationship between ubiquitylation in different species. Our analysis shows that (i) ubiquitylation patterns are conserved across different species; (ii) some key positions and key amino acid residues are essential for improving the prediction performance of a ubiquitylation model; and (iii) the physicochemical properties of residues in the flanking sequences are important for the ubiquitylation process. Finally, the UbiProber software system and web service were implemented in the .NET Framework 4.0 and are freely available at: http://bioinfo.ncu.edu.cn/UbiProber.aspx.

2. We present a new computational tool, PupPred, which was constructed to predict pupylation sites in prokaryotic proteins using the latest data from the PupDB database. PupDB contains 182 pupylated proteins with 215 known pupylation sites. After a preliminary assessment of various encoding schemes, we found that the composition of amino acid pairs is suitable for representing the sequence context surrounding pupylation sites.
Using the amino acid pair encoding and a larger training dataset, PupPred achieved a balanced performance with both high sensitivity and high specificity, and it outperformed the GPS-based predictor when evaluated on the same dataset. More importantly, the sequence, structural and evolutionary hallmarks around pupylation sites were characterized. We had previously developed UbiProber for the prediction of eukaryotic ubiquitylation sites. Since ubiquitylation and pupylation are functional analogues in the cell, we tried to use the UbiProber method to predict prokaryotic pupylation sites. Unfortunately, the prediction results were poor, which may be partially due to differences in sequence functionality and constraints between ubiquitylation and pupylation. We therefore systematically compared pupylation with ubiquitylation by contrasting their environmental and conservation hallmarks and by statistically analyzing their respective Gene Ontology (GO) terms. Taken together, these systematic analyses and predictions provide better insight into the processes and functions of pupylation. The PupPred predictor is freely available at: http://bioinfo.ncu.edu.cn/PupPred.aspx.

3. To accelerate research on the highly complex subcellular phosphoproteome, an integrated platform combining querying of experimental data with annotation of unknown data is in high demand. Here, we developed a platform that provides both a searchable online database and a computational tool to efficiently and reliably accumulate subcellular phosphoproteome data for further experimental investigation. In this work, we report the most thorough characterization of the human subcellular phosphoproteome to date. First, reliable experimental phosphoproteomics data with verified subcellular localization information in human were collected from several sources and used to profile the subcellular phosphoproteome.
We find not only that most phosphorylated proteins reside uniquely in a specific subcellular compartment, but also that the distribution of phosphorylated proteins across subcellular compartments is compartment-specific. Functional enrichment analysis and protein-protein network analysis reveal that the phosphorylation signaling pathways of each subcellular compartment are highly specialized. Moreover, our large dataset allows us to delineate type-specific phosphorylation sequence motifs that differ from those of the general phosphoproteome, and we show that individual subcellular compartments have their own sequence motifs. Overall, our observations highlight compartment-specific phosphorylation signaling pathways and stress the importance of mapping protein phosphorylation in the physiologically relevant subcellular compartment. We then developed a bioinformatics tool termed SubPhosPred, which combines a novel Discrete Wavelet Transform (DWT) algorithm with a Support Vector Machine (SVM) approach to identify phosphorylation sites for different subcellular compartments in human. One innovative aspect is that the Wavelet Transform (WT) was used for the first time for feature encoding in PTM prediction. Cross-validation tests show that the DWT algorithm boosts predictive performance and yields encouraging prediction results for each compartment. For SubPhosPred we trained eight compartment-specific phosphorylation prediction models (cell membrane, nucleus, cytoplasm, mitochondrion, Golgi apparatus, endoplasmic reticulum, secreted, lysosome). Finally, the platform integrating the SubPhosDB database and the SubPhosPred predictor is freely available for academic research at: http://bioinfo.ncu.edu.cn/SubPhos.aspx.
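
The abstract names two composition encodings without giving window lengths or exact definitions: amino acid composition (AAC) for UbiProber in item 1, and the composition of amino acid pairs for PupPred in item 2. The following is only a minimal sketch of those two encodings, assuming a peptide window centred on a candidate site and pairs of residues separated by a gap of `gap` positions. The function names, the alphabet ordering, the `gap` parameter and the example window are hypothetical illustrations, not code from UbiProber or PupPred.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; this ordering fixes the feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(window):
    """Amino acid composition (AAC): fraction of each standard residue.

    Non-standard symbols (e.g. an 'X' used to pad windows near the
    protein termini) are not counted, so the result may sum to < 1.
    """
    counts = Counter(window)
    return [counts[a] / len(window) for a in AMINO_ACIDS]


def pair_composition(window, gap=0):
    """Composition of residue pairs (x, y) with `gap` residues between them.

    gap=0 counts adjacent pairs. Returns 400 fractions ordered as
    AA, AC, ..., YY.
    """
    pairs = [window[i] + window[i + gap + 1]
             for i in range(len(window) - gap - 1)]
    if not pairs:  # window too short for the requested gap
        return [0.0] * 400
    counts = Counter(pairs)
    return [counts[x + y] / len(pairs)
            for x, y in product(AMINO_ACIDS, repeat=2)]


# Hypothetical 9-residue window centred on a lysine (K) candidate site.
window = "AGKLKEDSR"
features = aac(window) + pair_composition(window, gap=0)
print(len(features))  # 420
```

For a window of n residues, `pair_composition` with gap 0 yields a 400-dimensional vector with at most n - 1 nonzero entries, so such vectors are sparse for short windows. In a pipeline like the one described, vectors of this kind would be concatenated with other feature sets and passed to an SVM.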

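Item 3 states that SubPhosPred encodes sequences with a Discrete Wavelet Transform before SVM classification, but the abstract does not name the wavelet, the decomposition depth or the residue property that is transformed. The sketch below therefore assumes a Haar wavelet, the Kyte-Doolittle hydrophobicity scale, and summary statistics (mean, population standard deviation, minimum, maximum) of the coefficients at each level. All three choices, and the function names `haar_step` and `dwt_features`, are illustrative assumptions rather than the settings used in the thesis.

```python
import math
from statistics import mean, pstdev

# Kyte-Doolittle hydrophobicity scale (an assumed choice of property).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def haar_step(signal):
    """One level of the orthonormal Haar DWT -> (approximation, detail)."""
    if len(signal) % 2:  # pad odd-length signals by repeating the last value
        signal = signal + [signal[-1]]
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_features(window, levels=3, scale=KD):
    """Encode a peptide window as summary statistics of its Haar DWT.

    Residues missing from `scale` (e.g. padding 'X') are mapped to 0.0.
    Returns 4 * (levels + 1) numbers: mean, std, min, max of the detail
    coefficients at each level, followed by those of the final approximation.
    """
    signal = [scale.get(r, 0.0) for r in window]
    features = []
    for _ in range(levels):
        signal, detail = haar_step(signal)
        features += [mean(detail), pstdev(detail), min(detail), max(detail)]
    features += [mean(signal), pstdev(signal), min(signal), max(signal)]
    return features


# Hypothetical 9-residue window centred on a serine (S) candidate site.
features = dwt_features("AGKLSPEDR", levels=3)
print(len(features))  # 16
```

Each decomposition level halves the signal, so level-1 detail coefficients capture residue-to-residue fluctuations while deeper levels capture broader trends around the site. The output length, 4 * (levels + 1), does not depend on the window length, which suits a fixed-width SVM input.
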
Keywords/Search Tags:

PTM, Ubiquitylation, Pupylation, Phosphorylation, Subcellular Phosphoproteome, Analysis and Prediction, Discrete Wavelet Transform, Support Vector Machine, Database, Web services, Information gain

Related items

1	Adaptive Wavelet Packet Feature Extraction Support Vector Machine Model And Spectral Analysis Applications
2	Prediction Of Protein-Ligand Binding Residues Using Sequence Information And Extreme Gradient Boosting
3	Analyzing Similarity Of Protein Sequences With Discrete Wavelet Transform
4	Studies On Fundamental Issues Of Near-Infrared Spectroscopy: In-Line Analysis, Multi-Component Analysis And Spatial Effect
5	Research On Multi-scale Analysis And Deformation Law Of Mining Area Surface Subsidence Based On Wavelet Transform
6	Research On Ozone Concentration Prediction And Method Comparison Based On Kernel Extreme Learning Machine And Wavelet Transform
7	Residual Life Prediction Of Key Parts Of Shearer Rocker Arm Based On LSTM Network
8	The Evaluation For Coastal Wetland Ecosystem Services In Liaoning Province-based On Scale Transform Methods
9	Research On Enhancement And Recognize Of Mongolian Furniture Patterns Based On Singular Values And Gamma Functions In Frequency Domain
10	Application Of Chemometric Methods In Chemical Information Processing