
Cancer Diagnosis By Using Support Vector Machine

Posted on: 2008-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: Q F Yuan
GTID: 2144360215491090
Subject: Materials Physics and Chemistry
Abstract/Summary:
Statistical Learning Theory (SLT), proposed by Vapnik and co-workers at AT&T Bell Laboratories, is a statistical theory for analyzing small-sample data. Built on SLT and the principle of structural risk minimization, the Support Vector Machine (SVM) is a supervised machine learning method that is regarded as a pinnacle of statistical learning for small-sample data. SVM has shown excellent learning and generalization ability and has been used extensively in many areas.

In this study, three kinds of features were used to diagnose cancer and to evaluate the prognosis of breast cancer patients: the concentrations of 6 elements in human blood, breast fine needle aspiration cytology data, and the genes of breast cancer patients. Several pattern recognition approaches were applied (K-Nearest Neighbor, Probabilistic Neural Network, Decision Tree and SVM). The influence of different feature selection methods on classification accuracy was analyzed and discussed, and the classification performance of SVM was compared with that of the other classifiers.

The outline of this thesis is shown below:

① Current methods of feature selection and extraction for pattern recognition were reviewed. The advantages and disadvantages of several algorithms were introduced, including Signal-to-Noise Ratio (SNR), Entropy Criterion (EC), Genetic Algorithm (GA), Principal Component Analysis (PCA), Independent Component Analysis (ICA), Particle Swarm Optimization (PSO) and Simulated Annealing (SA).

② The classification principles of popular classifiers were reviewed briefly, including Bayes Classifier (BC), K-Nearest Neighbor (K-NN), Decision Tree (DT), Probabilistic Neural Network (PNN) and Artificial Neural Networks (ANN). The principle, algorithm, implementation and development of SVM, together with its applications, were described in detail.

③ Several classifiers and feature optimization algorithms were employed to diagnose cancer from the concentrations of 6 elements (Zn, Ba, Ca, Mg, Cu, Se) in human blood, and the influence of different feature selection and extraction methods on classification accuracy was analyzed. Among these, K-NN (based on SNR), PNN (based on SNR), DT (based on EC) and SVM (based on GA) achieved classification accuracies of 95.95%, 97.29%, 91.89% and 98.64%, respectively.

④ Several classifiers and feature optimization algorithms were also applied to breast cancer diagnosis using breast fine needle aspiration cytology data. K-NN (based on SNR), PNN (based on SNR) and SVM (based on GA) achieved classification accuracies of 96.09%, 95.08% and 96.24%, respectively.

⑤ The genes of breast cancer patients were used to evaluate prognosis with 3 classifiers (K-NN, PNN and SVM), and the effect of different feature selection and extraction methods on classification performance was discussed. Among these, K-NN (based on SNR), PNN (based on SNR) and SVM (based on SNR) achieved classification accuracies of 83.39%, 86.10% and 88.81%, respectively.

These studies demonstrate that the accuracy of SVM was superior to that of the other classifiers tested, including K-NN, PNN and DT. The results suggest that SVM may be developed further into a tool for computer-assisted clinical cancer diagnosis and prognostic evaluation.
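To make the "K-NN (based on SNR)" combination mentioned in the abstract concrete, the following is a minimal sketch, not the thesis's implementation. It assumes the common two-class SNR definition, |mean of class 1 minus mean of class 0| divided by the sum of the two class standard deviations; the abstract does not give its formula. The function names, the toy data, the choice k = 3 and the leave-one-out validation scheme are all hypothetical choices made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def snr_scores(X, y):
    """Score each feature by |mean_1 - mean_0| / (std_1 + std_0) (assumed SNR form)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        n = [row[j] for row in neg]
        spread = pstdev(p) + pstdev(n)
        # Simplification: a zero spread gets score 0 to avoid dividing by zero.
        scores.append(abs(mean(p) - mean(n)) / spread if spread > 0 else 0.0)
    return scores


def top_features(X, y, k):
    """Return the indices of the k features with the highest SNR score."""
    scores = snr_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of K-NN restricted to the given feature indices."""
    hits = 0
    for i in range(len(X)):
        train_X = [[row[j] for j in features] for t, row in enumerate(X) if t != i]
        train_y = [label for t, label in enumerate(y) if t != i]
        x = [X[i][j] for j in features]
        hits += knn_predict(train_X, train_y, x, k) == y[i]
    return hits / len(X)


# Hypothetical toy data: 6 samples, 3 features; only feature 0 separates the classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 3.0, 0.1],
    [0.9, 4.0, 0.3],
    [3.0, 4.5, 0.2],
    [3.2, 3.5, 0.1],
    [2.9, 4.0, 0.3],
]
y = [0, 0, 0, 1, 1, 1]

selected = top_features(X, y, 1)
accuracy = loo_accuracy(X, y, selected, k=3)
```

In this sketch the features are ranked once on the full data set for brevity. In a careful evaluation the ranking would be recomputed inside each leave-one-out fold, since selecting features with the held-out sample included tends to inflate the accuracy estimate.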
Keywords/Search Tags:Support Vector Machine, Feature Selection, Feature Extraction, Cancer, Computer-Aided Diagnosis, Prognosis, Prediction