Protein Mass Spectrometry Data Mining Method

Posted on: 2012-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: C Shi
Full Text: PDF
GTID: 2218330335486279
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Accurate diagnosis of cancer, especially early-stage cancer, remains a medical challenge. To address it, researchers have introduced protein mass spectrometry data analysis, in which cancer is diagnosed by analyzing protein mass spectrometry data from samples. The process first extracts key features from the samples to train a classifier and then uses that classifier to classify test samples. However, protein mass spectrometry data is high-volume and noisy, which greatly increases the complexity of the analysis and makes classification difficult. In this thesis, protein mass spectrometry data for ovarian cancer and pancreatic cancer are collected and analyzed with several methods. Feature selection uses the t-test, SOM (self-organizing map network), and PCA (principal component analysis), while classification uses SVM (support vector machine) and PNN (probabilistic neural network).

Five schemes for classifying protein mass spectrometry data are studied in this thesis, each combining a feature selection method with a classifier. The first three use the same classifier, SVM, with different feature selection methods: the first uses the t-test, the second an SOM neural network, and the third the t-test and 2nd PCA. The fourth uses the t-test and the MSDI (Maximum Significant Difference and Independence) algorithm for feature selection, with PNN as the classifier. The fifth uses the t-test and the MSDSRI (Maximum Significant Difference and Square Root of Independence) algorithm, which is proposed in this thesis, for feature selection, again with PNN as the classifier.

These protein mass spectrometry data classification methods are analyzed in detail in this thesis.
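The t-test feature selection step can be illustrated with a minimal sketch. It is not the thesis's implementation: the use of Welch's unequal-variance t statistic, the function names `welch_t` and `select_features`, and the toy intensity values are all assumptions made for illustration. Each column stands for the intensity at one hypothetical m/z bin, and features are ranked by the absolute t statistic between the two classes.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for one feature measured in two groups (assumed variant)."""
    diff = mean(a) - mean(b)
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0.0:
        # Both groups are constant: perfectly separating if the means differ.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / denom


def select_features(samples, labels, k):
    """Return the indices of the k features with the largest |t| between classes 0 and 1."""
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    scores = []
    for j in range(len(samples[0])):
        t = welch_t([s[j] for s in pos], [s[j] for s in neg])
        scores.append((abs(t), j))
    # Sort by descending |t|, breaking ties by feature index.
    scores.sort(key=lambda p: (-p[0], p[1]))
    return [j for _, j in scores[:k]]


# Toy spectra (made-up values): rows are samples, columns are m/z bins.
spectra = [
    [1.0, 5.0, 0.2],
    [1.2, 5.1, 0.9],
    [0.9, 4.9, 0.4],
    [1.1, 9.0, 0.3],
    [1.0, 9.2, 0.8],
    [0.8, 8.9, 0.5],
]
labels = [0, 0, 0, 1, 1, 1]
top = select_features(spectra, labels, k=1)
print(top)
```

In this toy data only the second column differs clearly between the two classes, so it is ranked first. The selected columns would then be passed to a classifier such as SVM or PNN.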
The conclusion is that the sample recognition rate depends not only on the feature selection method but also on the classifier and on the number of features ultimately used to train it. With SVM as the classifier, feature selection with the t-test performs better than with the SOM neural network, and 2nd PCA performs better than PCA. With PNN as the classifier, feature selection with the MSDSRI algorithm performs better than with the MSDI algorithm. In terms of overall classification performance, the method based on the MSDI algorithm and PNN performs better than the one based on 2nd PCA and SVM. In the examination of ovarian and pancreatic fragment biopsies, recognition rates of 99.498% and 99.722%, respectively, are achieved with the MSDSRI algorithm and PNN.
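For readers unfamiliar with PNN, the following is a minimal sketch of a probabilistic neural network in its standard Gaussian Parzen-window form. It is an illustration under assumptions, not the classifier configuration used in the thesis: the function name `pnn_predict`, the smoothing parameter `sigma`, the equal class priors, and the toy data are all hypothetical.

```python
import math


def pnn_predict(train_x, train_y, x, sigma=1.0):
    """Classify x with a Gaussian-kernel PNN, assuming equal class priors."""
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        # Pattern layer: one Gaussian kernel centred on each training sample.
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        k = math.exp(-d2 / (2.0 * sigma ** 2))
        sums[yi] = sums.get(yi, 0.0) + k
        counts[yi] = counts.get(yi, 0) + 1
    # Summation layer: mean kernel response per class. Output layer: argmax.
    return max(sums, key=lambda c: sums[c] / counts[c])


# Toy one-feature training set (made-up values), e.g. one selected m/z bin.
train_x = [[5.0], [5.1], [4.9], [9.0], [9.2], [8.9]]
train_y = [0, 0, 0, 1, 1, 1]
print(pnn_predict(train_x, train_y, [5.3]))
print(pnn_predict(train_x, train_y, [8.5]))
```

A PNN has no iterative training phase; the training samples themselves form the pattern layer, so the choice of `sigma` and of the input features largely determines its behaviour. This is consistent with the abstract's observation that the recognition rate depends on the features supplied to the classifier.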
Keywords/Search Tags: proteomics, mass spectrometry, feature selection, neural network, PNN, SVM, SOM, 2nd PCA