Mass Spectrum Data Based Bacteria Classification

Posted on: 2019-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Bai
GTID: 2371330566979991
Subject: Statistics
Abstract/Summary:
Diseases caused by bacteria are increasingly common, so identifying pathogenic bacteria rapidly and accurately is of real practical value for the timely prevention and treatment of those diseases. The proteome is the material basis of life, and studying it makes species identification and the pathological analysis of disease possible. Mass spectrometry (MS) is an important tool for proteomics because it can analyze a large number of protein molecules in parallel. In current medical practice, however, identifying bacteria from acquired mass spectrum data requires additional commercial tools (such as ClinProTools).

The Medical Examination Center of the First Affiliated Hospital of Chongqing Medical University (hereinafter "the Center") has accumulated a large amount of mass spectrum data from clinical bacterial samples. Because the software supplied with the instrument is limited in functionality and expensive, the Center wishes to cooperate with our project to find a suitable data analysis method that fully mines the acquired clinical MS data and, ultimately, achieves bacterial identification and supports medical decision-making. Mass spectrum data, however, are high-dimensional with a small sample size (HDSS), which makes them difficult to analyze and apply. In addition, the clinical data are affected by instrumental error, by differences in experimental operation, and by the narrow instrument range used, so the acquired spectra deviate to some degree and may fail to include biomarkers. Since there is little research on the analysis of complex clinical biological mass spectra, a more effective and more general method for microbial classification based on clinical mass spectrum data is urgently needed.

To meet these goals, we designed and implemented the following bacterial classification method for clinical mass spectrum data. In the pre-processing stage, a dedicated binning-sliding (BS) method is applied to the clinical MS data to remove systematic errors as far as possible. In the feature selection stage, feature selection in the broad sense covers selection strategies in the metric space and in the transformed space, and this thesis combines the two: (1) Because the clinical MS data contain a great deal of noise, the wavelet transform modulus maxima (WTMM) method is used to extract the implicit mass spectrum features, exploiting the different ways in which signal and noise propagate across wavelet scales. (2) Given the statistical characteristics of mass spectrum data, an improved genetic algorithm based on the t-test is designed for wrapper feature selection. In this improved genetic algorithm, the t-test statistic serves as prior information for population initialization, and classification performance is used directly as the fitness measure. Finally, a Support Vector Machine (SVM) classifier classifies the microbes based on the extracted mass spectral features.

We tested the method on clinical mass spectrum data of Staphylococcus aureus (S. aureus) provided by the Center, with the goal of distinguishing methicillin-resistant S. aureus (MRSA) from methicillin-sensitive S. aureus (MSSA). Cross-validation results across several comparative experiments show that the proposed method raises the classification accuracy from 0.63 to 0.82, with sensitivity and specificity relatively balanced at around 0.8. The method is also stable, with a standard deviation of accuracy as low as 0.008. The new bacterial classification method, which combines wavelets and a genetic algorithm to select mass spectral features, can therefore identify MRSA effectively. Moreover, the method is effective, fault-tolerant and general: it can be used to analyze and process mass spectrum data with various characteristics, reveal the type differences reflected in the spectra, and assist microbial identification or disease diagnosis.
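The abstract does not give implementation details, so the following is only a minimal sketch of the idea behind the t-test-seeded wrapper selection: per-feature t statistics bias the initial population of feature masks, and a classifier's accuracy is used directly as the fitness. Everything here is an assumption made for illustration: the function names, the use of Welch's form of the t statistic, the selection, crossover and mutation scheme, the synthetic data, and in particular the leave-one-out nearest-centroid classifier, which stands in for the SVM only so that the example runs with the standard library alone.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples (assumed variant of the t-test)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def t_scores(X, y):
    """Absolute t statistic of every feature (column) between class 1 and class 0."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(pos, neg)))
    return scores

def init_population(scores, pop_size, rng, floor=0.05):
    """Seed binary masks: a feature's inclusion probability grows with its |t|."""
    top = max(scores) or 1.0
    probs = [max(floor, s / top) for s in scores]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(pop_size)]

def loo_centroid_accuracy(X, y, mask):
    """Stand-in fitness: leave-one-out accuracy of a nearest-centroid classifier."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        dist = {}
        for label in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [mean(r[j] for r in rows) for j in cols]
            dist[label] = sum((X[i][j] - c) ** 2 for j, c in zip(cols, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)

def evolve(population, fitness, rng, generations=10, mutation_rate=0.02):
    """Generational GA: truncation selection, one-point crossover, bit-flip mutation."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]
        children = [ranked[0][:]]  # elitism: the best mask survives unchanged
        while len(children) < len(population):
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, len(p1))
            child = p1[:cut] + p2[cut:]
            children.append([b ^ 1 if rng.random() < mutation_rate else b for b in child])
        population = children
    return max(population, key=fitness)

# Synthetic stand-in data: 20 samples, 12 features, only features 0 and 1 differ by class.
rng = random.Random(0)
y = [i % 2 for i in range(20)]
X = [[rng.gauss(3.0 * label if j < 2 else 0.0, 1.0) for j in range(12)] for label in y]

scores = t_scores(X, y)
population = init_population(scores, pop_size=10, rng=rng)
best = evolve(population, lambda m: loo_centroid_accuracy(X, y, m), rng)
print("selected features:", [j for j, bit in enumerate(best) if bit])
```

To move closer to the pipeline described above, the `fitness` callable could be swapped for a cross-validated SVM accuracy computed on the selected columns; the seeding and evolution code would not need to change.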
Keywords/Search Tags: Proteomics, MS, MRSA, WTMM, Wrapper feature selection