A Series Spectral Preprocessing Model Based On Support Vector Machine

Posted on: 2018-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: D D Xie
Full Text: PDF
GTID: 2350330536956260
Subject: Information and Communication Engineering
Abstract/Summary:
Analysis based on tandem mass spectrometry plays a leading role among protein identification methods. As the technology has matured, mass spectra can now be generated in a very short time. However, almost every spectrum contains some noise. On the one hand, noise wastes time during database searching and so lengthens protein identification. On the other hand, noise interferes with the mass spectrometry results and increases the number of false positives and false negatives in peptide and protein identification.

Many mass spectrum denoising methods have been proposed to address this problem. They aim to remove noise peaks while retaining signal peaks. The traditional method is threshold-based: any peak in the spectrum below a set intensity threshold is discarded. Another method orders the peaks by intensity and keeps the top X peaks as signal peaks, where X is set according to requirements. A third method selects the top peaks within each window of X Da, where X is again set according to the characteristics of the spectrum. These methods consider only peak intensity and ignore the other features hidden among the peaks, so valid peaks are inevitably filtered out when their intensity is low.

Machine learning has been a popular research direction in recent years. Its methods include support vector machines, neural networks, Bayesian methods, and others. Applying machine learning to mass spectrometry is a new field, and few articles have been published on it. In this paper, after analyzing several machine learning methods and considering the application, we propose a new method based on the support vector machine (SVM). SVM is a machine learning method grounded in statistical learning theory and is mainly used for two-class classification problems. Because negative examples outnumber positive examples in tandem mass spectrometry data, handling the imbalanced data must also be considered when building a model. After considering imbalanced-data techniques such as oversampling, under-sampling, and cost-sensitive learning, we chose under-sampling. Based on the characteristics of mass spectrometry and machine learning, we selected 25 features, such as neutral loss, peak intensity, and isotope features, to build the models. The model predicts whether a peak is a signal peak or a noise peak, and the predicted noise peaks are then removed to filter the spectrum.

To evaluate the model, we trained and tested it on a human sample data set and an iTRAQ data set. We examined self-training and component training separately, and analyzed the results for data of the same type from different experiments. Results from Mascot showed that our model can efficiently distinguish valid peaks from noise peaks, and that it improved the spectral scores and the number of peptide and protein identifications.
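As an illustration only, the three intensity-based filters described in the abstract and the random under-sampling step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the thesis implementation: the function names, the `(m/z, intensity)` tuple representation, the reading of the "X Da" method as "keep the most intense peak(s) in each X Da bin", and the choice to under-sample noise peaks down to the number of signal peaks are all assumptions made here for the example.

```python
import random
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # hypothetical representation: (m/z, intensity)


def threshold_filter(peaks: List[Peak], min_intensity: float) -> List[Peak]:
    """Keep peaks whose intensity is at least min_intensity."""
    return [p for p in peaks if p[1] >= min_intensity]


def top_x_filter(peaks: List[Peak], x: int) -> List[Peak]:
    """Keep the x most intense peaks, returned in m/z order."""
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:x]
    return sorted(strongest)


def windowed_top_filter(peaks: List[Peak], window_da: float,
                        per_window: int = 1) -> List[Peak]:
    """Keep the per_window most intense peaks in each window_da-wide m/z bin."""
    bins: Dict[int, List[Peak]] = {}
    for mz, intensity in peaks:
        bins.setdefault(int(mz // window_da), []).append((mz, intensity))
    kept: List[Peak] = []
    for members in bins.values():
        ranked = sorted(members, key=lambda p: p[1], reverse=True)
        kept.extend(ranked[:per_window])
    return sorted(kept)


def undersample(labels: List[int], seed: int = 0) -> List[int]:
    """Return sorted indices of a balanced subset.

    All positives (label 1, signal) are kept; negatives (label 0, noise)
    are randomly sampled down to at most the number of positives.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng = random.Random(seed)
    keep_neg = rng.sample(neg, min(len(pos), len(neg)))
    return sorted(pos + keep_neg)


# Made-up toy spectrum, for illustration only.
peaks = [(100.1, 5.0), (150.2, 50.0), (151.0, 8.0), (300.4, 20.0), (305.9, 2.0)]
by_threshold = threshold_filter(peaks, 8.0)
by_top_x = top_x_filter(peaks, 2)
by_window = windowed_top_filter(peaks, 100.0)
balanced = undersample([1, 0, 0, 0, 1, 0])
```

In this toy spectrum the peak at 151.0 survives the threshold filter but not the top-2 or windowed filter, which shows how an intensity-only rule decides purely on magnitude. The balanced index list returned by `undersample` would then select the training rows for a two-class classifier such as an SVM.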
Keywords/Search Tags:Proteomics, Tandem mass spectrometry, Peak-preprocess, De-noise peak, Support vector machine