| Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS), combined with bioinformatics methods for detection and analysis, can detect cancer "fingerprints" and support diagnostic models that give insight into the differences between healthy and diseased states and identify risk factors earlier, thereby improving diagnostic ability. However, there is no consensus on how to process proteomic mass spectra, and data processing has become one of the hot issues in cancer diagnosis, in both laboratory research and clinical application. Because the statistical algorithms used for proteomic MS data analysis are too complicated for domain experts to understand, this thesis takes a visual data analysis approach and focuses on three basic problems: preprocessing of proteomic mass spectra, feature selection and extraction, and classifier design and evaluation based on proteomic patterns. Visualization for data presentation, feature extraction and classification helps medical experts and biologists explore and discover the knowledge hidden in proteomic MS data. Firstly, classical proteomic MS data preprocessing methods, which aim to reduce systematic errors, improve data quality and enhance data interpretability, are investigated, including data reduction, spectral smoothing, baseline correction, standardization, peak extraction and quantification, and peak alignment. The preprocessing workflow adopted in this thesis is then fixed as wavelet denoising, baseline correction, peak extraction and peak alignment. Secondly, two types of cancer "fingerprint" extraction methods are proposed: radar chart presentation and graph feature extraction for multivariate interval information; and feature selection from cross imaging of all samples or mean spectra.
Graph feature extraction is based on a mathematical model that maps high-dimensional data to a multivariate graph and on the idea of maximizing local information, integrating mass spectra with multivariate graph presentation for dimensionality reduction. Features that distinguish the cancer group from the control group well are selected from the binary image obtained by cross imaging of the data cube. An energy curve can be calculated to visualize the difference between the two classes. Finally, radial coordinate mapping is combined with machine learning algorithms to achieve visual classification. The optimized two-dimensional radial coordinate mapping model, combined with a support vector machine classifier, can directly reveal the relationship between classes and features in a high-dimensional data set. The model can be extended to a three-dimensional mapping that effectively displays between-class and within-class similarity and can reveal hidden sub-classes. The multivariate graph is treated as a medium for information exchange and flow among data, experts and computer, so that cancer patterns can be classified with multiple biomarkers. Experiments on international datasets of proteomic mass spectra show that the proposed method is efficient and obtains satisfactory accuracy, sensitivity and specificity. |
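
To make the idea of radial coordinate mapping concrete, the following is a minimal sketch of the plain, unoptimized form of the mapping (often called RadViz), not the optimized model developed in the thesis. It assumes that each feature is min-max scaled to [0, 1] and that the feature anchors are spaced evenly on the unit circle; the function name `radial_coordinate_map` and the toy peak intensities are hypothetical and chosen only for illustration. The classifier stage (for example a support vector machine trained on the resulting 2-D points) is deliberately left out.

```python
import math


def radial_coordinate_map(samples):
    """Map each row of features to a 2-D point inside the unit circle.

    Plain RadViz-style mapping (an assumption, not the thesis's optimized
    variant): every feature owns an anchor on the unit circle, and a sample
    is placed at the weighted mean of the anchors, with the sample's scaled
    feature values as weights.
    """
    n_features = len(samples[0])

    # Per-feature minimum and maximum, used for min-max scaling to [0, 1].
    lows = [min(row[j] for row in samples) for j in range(n_features)]
    highs = [max(row[j] for row in samples) for j in range(n_features)]

    # Anchors spaced evenly around the unit circle, one per feature.
    anchors = [
        (math.cos(2 * math.pi * j / n_features),
         math.sin(2 * math.pi * j / n_features))
        for j in range(n_features)
    ]

    points = []
    for row in samples:
        # Scaled feature values; a constant feature contributes no pull.
        weights = [
            (row[j] - lows[j]) / (highs[j] - lows[j])
            if highs[j] > lows[j] else 0.0
            for j in range(n_features)
        ]
        total = sum(weights)
        if total == 0:
            # No feature pulls on this sample: leave it at the centre.
            points.append((0.0, 0.0))
            continue
        x = sum(w * a[0] for w, a in zip(weights, anchors)) / total
        y = sum(w * a[1] for w, a in zip(weights, anchors)) / total
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Hypothetical peak intensities: 3 spectra, 4 selected peaks each.
    toy_peaks = [
        [9.0, 1.0, 0.5, 0.2],
        [1.0, 8.0, 0.4, 0.3],
        [5.0, 5.0, 5.0, 5.0],
    ]
    for point in radial_coordinate_map(toy_peaks):
        print(point)
```

In this form, a sample dominated by one peak is drawn toward that peak's anchor, while a sample with balanced scaled intensities falls near the centre, which is what lets the position of a point hint at the features driving its class. The result depends on the order in which features are assigned to anchors, and choosing that order is one natural target for the kind of optimization the abstract mentions.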