Chemometric Studies On Multivariate Calibration And Protein Posttranslational Modifications Site Prediction

Posted on: 2017-10-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhang
GTID: 1312330512959023
Subject: Analytical Chemistry
Abstract/Summary:
The research in this thesis focuses on new chemometric studies for multivariate calibration and for the prediction of protein post-translational modification sites. In the field of multivariate calibration, the author carries out exploratory basic research on algorithms for, and applications of, the quantitative analysis of complex systems, from two-dimensional and three-dimensional to four-dimensional ones. In addition, the author studies the prediction of protein post-translational modification sites, moving from binary classification to multi-class classification. The main contents of this thesis are as follows.

Part I: Multivariate calibration (Chapters 2 to 4)

Polycyclic aromatic hydrocarbons (PAHs), which derive mainly from incomplete combustion and pyrolysis of organic materials, are ubiquitous in combustion products such as cigarette smoke. Many of these species are carcinogenic and mutagenic; among them, benzo[a]pyrene (BaP), the most thoroughly studied of all PAHs, is rated as a human carcinogen by the International Agency for Research on Cancer. The levels of BaP in cigarette smoke should therefore be monitored, both for tobacco quality control and for assessing the harm of BaP to human health. At present, the generic method for detecting and quantifying BaP uses chromatographic separation followed by various detectors. However, these traditional chromatographic methods are often labor-intensive and time-consuming, and some require sophisticated instrumentation. In Chapter 2, mid-infrared spectroscopy (MIR) coupled with chemometrics is used to develop a fast, simple, non-destructive and robust method for the determination of benzo[a]pyrene in cigarette mainstream smoke. To enhance the predictive ability of the multivariate calibration models, a novel chemometric algorithm (DPSO-WPT-PLS), based on the wavelet packet transform (WPT), the discrete particle swarm optimization algorithm (DPSO) and partial least squares regression (PLS), was used to quantify the harmful ingredient benzo[a]pyrene in cigarette mainstream smoke, with promising results. Furthermore, the proposed method performed better than several other chemometric models.

β-carotene is a natural pigment with rich nutritional value and strong colouring power, rated as a Class A nutritious food additive by the Food and Agriculture Organization and the World Health Organization. However, high-dose β-carotene supplements may cause adverse reactions and may even increase the incidence of certain diseases. β-carotene is stable and has a high boiling point. High-performance liquid chromatography (HPLC) is the most commonly used method for the determination of β-carotene. However, some complex real samples contain serious interferences, which lead to large quantitative errors. To solve this problem, HPLC-DAD coupled with the chemometric ATLD algorithm was applied in Chapter 3 to measure the content of β-carotene in milk powder and beverages. Even with interferents coeluting with the compounds studied, good recoveries of β-carotene were obtained owing to the use of the ATLD algorithm, which supplements "chemical separation" with "mathematical separation". The contents of β-carotene in milk powder and beverages estimated by the proposed method were validated by an HPLC-MS method, and there was no significant difference between the two methods.

Fluoroquinolones (FQs), frequently detected in sewage and surface waters, are a common class of antibiotics. As antibacterial agents, fluoroquinolones are not biodegradable in wastewater treatment plants (WWTPs). UV/H2O2 advanced oxidation technology (AOT) has been demonstrated to be effective for the removal of refractory organics, so it is necessary to investigate the degradation behavior of FQs under UV/H2O2 advanced oxidation. In Chapter 4, fluorescence spectroscopy coupled with multi-way chemometric methods, including the PARAFAC and quadrilinear PARAFAC algorithms, was used to investigate the photochemical degradation of the fluoroquinolones ofloxacin (OFL) and danofloxacin (DAN) in water by UV/H2O2. The results demonstrate that the proposed methods can quantify the two antibiotics simultaneously in real time, and that they can be used to study the kinetics of photochemical reactions. Notably, the PARAFAC algorithm was used to analyze the distribution of the OFL and DAN fluorescence spectra at different pH values, and the results were consistent with the literature. In addition, the PARAFAC algorithm was used to investigate the effect of H2O2 addition on the degradation rate of the two FQs. The analytical method based on fluorescence spectra coupled with multi-way chemometric algorithms not only saves sample pretreatment time but can also be used for the optimization of experimental conditions and for real-time quantitative analysis of multiple components in complex systems. The method is expected to be useful for monitoring the kinetic processes of analytes.

Part II: Protein post-translational modification site prediction (Chapters 5 to 6)

Protein methylation, which plays vital roles in signal transduction and many other cellular processes, is one of the most common protein post-translational modifications. Identifying methylation sites is very helpful for understanding the fundamental molecular mechanisms of methylation-related biological processes. In silico prediction of methylation sites has emerged as a powerful approach for identifying methylation, and it also facilitates downstream characterization and site-specific investigation. In Chapter 5, the author proposes a combination of the pseudo amino acid composition (PseAAC) and protein chain description for the global feature extraction of protein sequences for methylation prediction. A support vector machine (SVM) was used to build the prediction model for methylation sites on the basis of these global sequence features, and a global stochastic optimization technique, the particle swarm optimization algorithm (PSO), was employed to search effectively for the optimal SVM parameters. The prediction accuracy, sensitivity, specificity and Matthews correlation coefficient on the independent prediction set are 98.11%, 96.23%, 100% and 96.30%, respectively. This indicates that the method has sufficient predictive capability for identifying protein arginine methylation sites. For comparison, other predictors were also constructed using different feature extraction and modeling strategies. The results show that the proposed method can greatly improve the performance of arginine methylation site prediction.

Given the importance of lysine-based modifications in controlling protein activity and, in turn, affecting human disease, characterizing lysine post-translational modification (PTM) states is fundamental to a comprehensive understanding of protein function. Nevertheless, it remains likely that only a small fraction of all lysine modification sites across the commonly studied proteomes has been detected experimentally so far. As a result, computational methods aimed at predicting lysine modification sites have the potential to provide valuable insight to researchers developing hypotheses about these modifications. Many computational methods have been developed for predicting the four most widely studied lysine post-translational modifications (acetylation, methylation, ubiquitination and SUMOylation). However, most of these methods predict only one type of lysine (K) modification site at a time, rather than simultaneously predicting the different types of potentially modified lysine (K) residues on a protein. In Chapter 6, we attempt to predict these four lysine post-translational modifications simultaneously with a computational method based on amino acid sequence information. The global features of the protein sequence presented in the previous chapter were employed as an effective mathematical representation, and SVM was used to construct K regression models for classification. This avoids the indivisible phenomenon, in which a sample cannot be assigned to any class, and greatly reduces the occurrence of classification overlap. The results are very satisfactory.
Keywords/Search Tags:Chemometrics, Multivariate calibration, Wavelet packet transform, Particle swarm optimization, Alternating trilinear decomposition, Quadrilinear PARAFAC, Protein post-translational modifications site prediction, Support vector machine