
Protein Mass Spectrometry Data Analysis Model And Its Application

Posted on: 2019-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Tao
Full Text: PDF
GTID: 2310330542973610
Subject: Mathematics
Abstract/Summary:
Cancer is a disease caused by disorders in the mechanisms that control cell sorting and apoptosis. The key to preventing cancer deaths is to improve early diagnosis, and for early diagnosis and treatment the most important task is to find potential tumor biomarkers. The development of proteomics makes early detection and diagnosis of cancer possible; in particular, proteomics data analysis based on mass spectrometry (MS) provides a powerful tool for the early diagnosis of tumors. Identifying the biomarkers of malignant tumors requires mining the high-dimensional raw data of tumor protein profiles for markers that differ between samples, or for features that reflect the differences between samples. In this paper, an efficient dimensionality reduction method is applied to small-sample, high-dimensional MS data to extract highly reliable MS data for classification and prediction. The extracted data are then converted into a protein sequence according to spectral abundance. Based on the construction of quasi-star graphs of the protein MS data, a numerical characterization of the data is proposed using the topological indices of two types of quasi-star graphs, and a mathematical characterization model of protein spectra is established. A binary classification model is then built by applying an SVM to the obtained topological indices. First, different normalization methods are used to standardize the data, and different SVM kernel functions are used to classify it. The results show that, with the Gaussian kernel under [0,1] normalization and the linear kernel under [-1,1] normalization, the classification and prediction accuracy is 97.67%, the sensitivity is 98.75%, and the specificity is 96.00%. Compared with other methods, this method achieves higher accuracy. The method is then generalized to a three-class dataset: an SVM three-class classifier is constructed, and its accuracy is 64.8%. The method has very good scalability and can be extended to analyze other dichotomous data.
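The normalization and evaluation steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names min_max_scale and binary_metrics, the sample intensities, and the sample labels are all hypothetical, chosen only to show how [0,1] and [-1,1] scaling and the accuracy, sensitivity, and specificity figures are conventionally computed.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale a list of numbers into the interval [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant feature carries no information; map it to the lower bound.
        return [low for _ in values]
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]


def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity: fraction of positive samples that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of negative samples that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Hypothetical feature values (illustrative only, not thesis data).
intensities = [2.0, 4.0, 6.0, 10.0]
scaled_01 = min_max_scale(intensities, 0.0, 1.0)
scaled_pm1 = min_max_scale(intensities, -1.0, 1.0)

# Hypothetical true and predicted labels (1 = cancer, 0 = control).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
accuracy, sensitivity, specificity = binary_metrics(y_true, y_pred)
```

In practice each feature (here, each topological index) would be scaled separately, with the minimum and maximum taken from the training samples only, so that no information from the test samples enters the classifier.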
Keywords/Search Tags: Proteomics, SELDI-TOF-MS, Star-graph, feature extraction, SVM, three-class classification