
Research On Some Problems Of Support Vector Machine Learning Algorithm

Posted on: 2011-09-19 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: T T Chang
GTID: 1118330338450088 | Subject: Applied Mathematics
Abstract/Summary:
Support Vector Machine (SVM) is a novel data mining method based on Statistical Learning Theory (SLT). With the development of SLT, SVM has advanced rapidly in both theory and application. It has good generalization ability and a globally optimal solution, and it has demonstrated a number of unique advantages on small-sample, nonlinear, and high-dimensional problems. SVM has been applied successfully to prediction, data fitting, comprehensive evaluation, pattern recognition, and many other problems.
Research on the theory and application of SVM is currently in a stage of rapid development. Classification performance, training speed, and the scale of data a classifier can handle have always been central goals for researchers. This dissertation focuses on SVM classification, SVM clustering, SVM ensemble learning, and kernel learning, with the aim of improving the performance of SVM. It is organized as follows.
1. Research on SVM classification. For heterogeneous data classification, a multiple kernel SVM based on grouped features is proposed. The features from different data sources are split into disjoint groups, and a different kernel is applied to each group. A convex combination of these kernel functions serves as the new kernel function, and the resulting multiple kernel SVM is formulated as a semidefinite programming (SDP) problem. Experimental results show that the proposed algorithm effectively improves classifier performance.
2. Research on SVM clustering. Because support vector machine clustering (SVMC) can only partition data into two clusters, a multi-class support vector machine clustering method (Multi-Class SVMC) is proposed. It uses the One-Against-All (OAA) strategy to extend SVMC to problems with more than two clusters.
Furthermore, because Multi-Class SVMC requires the number of clusters to be predefined, a hierarchical support vector machine clustering method (Hierarchical SVMC) is proposed. The data are first labeled randomly into two classes, and the class labels are adjusted at each iteration until they remain unchanged, which divides the data into two clusters. SVMC is then applied to each of these two clusters, and so on, until the stopping criterion is satisfied. Experimental results show that Multi-Class SVMC and Hierarchical SVMC are easy to implement, suitable for large-scale data, and achieve good clustering quality.
3. Research on large-scale SVM. For large-scale data problems, a local diversity AdaBoost SVM ensemble algorithm is proposed. The data are split into several blocks, and AdaBoost SVM is applied within each block. First, the strong SVM classifier is weakened by adjusting its kernel parameter and penalty factor. Second, AdaBoost is applied to this weakened SVM. Finally, the local models are integrated by majority voting. Experimental results show that local diversity AdaBoost SVM handles large-scale classification problems efficiently without degrading classifier performance.
4. Research on SVM ensemble learning. To improve both the accuracy and the diversity of the component classifiers, an SVM ensemble learning method with multimodal perturbation (MP) is proposed. Several subsets are obtained by bootstrapping, and MP is applied to each subset. The first stage is feature perturbation: Principal Component Analysis (PCA) is adopted for feature selection, and this perturbation aims to increase the diversity among the component classifiers. The second stage is parameter perturbation: in the model training step, different parameters are chosen for each subset by automatic model selection, which aims to increase the accuracy of the component classifiers.
The final stage is output perturbation: to increase the influence of the more accurate component classifiers, each classifier's output is weighted by its accuracy. Compared with single-modal perturbation (SP), MP improves the performance of the SVM ensemble classifier.
5. Research on reduced SVM classification. The selection of the reduced dataset is critical to the performance of the reduced support vector machine (RSVM). To address reduced-dataset selection, a rectangular kernel SVM based on the Density-Based Spatial Clustering of Applications with Noise algorithm (DBSCAN-RSVM) is proposed. In this algorithm, the data are clustered with DBSCAN, and the core points found by DBSCAN are adopted as the reduced dataset of the RSVM. The resulting rectangular kernel SVM can be formulated as a smooth support vector machine (SSVM). Compared with the traditional RSVM, the K-means RSVM, and the traditional SVM, the proposed DBSCAN-RSVM achieves better performance.
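The kernel construction in item 1 (one kernel per feature group, combined convexly) can be sketched in plain Python. This is a minimal illustration, not the dissertation's implementation: a Gaussian (RBF) kernel for every group, the names `rbf`, `combined_kernel`, and `groups`, and the hand-fixed weights are all assumptions. In the proposed method the weights would be learned through the semidefinite program rather than set by hand.

```python
import math

def rbf(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def combined_kernel(X, groups, weights, gammas):
    """Gram matrix K = sum_m mu_m * K_m, where K_m sees only feature group m.

    groups  -- list of index lists, one per disjoint feature group (hypothetical layout)
    weights -- convex weights mu_m (non-negative, summing to 1)
    gammas  -- one RBF width per group (assumed values)
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for idx, mu, gamma in zip(groups, weights, gammas):
        # Restrict every sample to the features of this group only.
        sub = [[row[j] for j in idx] for row in X]
        for i in range(n):
            for k in range(n):
                K[i][k] += mu * rbf(sub[i], sub[k], gamma)
    return K

# Two samples: features 0-1 come from one source, feature 2 from another.
X = [[0.0, 1.0, 5.0], [1.0, 1.0, 3.0]]
K = combined_kernel(X, groups=[[0, 1], [2]], weights=[0.7, 0.3], gammas=[1.0, 0.1])
```

Because a convex combination of positive semidefinite kernels is itself positive semidefinite, `K` stays a valid kernel matrix for any admissible weights, which is what allows the weights to be optimized inside an SDP.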
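The relabel-until-stable loop and the recursive two-way splitting described for Hierarchical SVMC in item 2 can be sketched as follows. This is an illustration under loudly stated assumptions: a nearest-class-mean classifier stands in for the SVM trained at each iteration (which turns the inner loop into 2-means), the stopping rule "stop when a cluster has fewer than `min_size` points" is assumed because the abstract does not state the criterion, and all function names are hypothetical.

```python
import random

def nearest_mean_classifier(X, labels):
    """Stand-in for the SVM retrained in each SVMC iteration: nearest class mean."""
    means = {}
    for c in (0, 1):
        members = [x for x, l in zip(X, labels) if l == c]
        if members:
            means[c] = [sum(col) / len(members) for col in zip(*members)]
    def predict(x):
        return min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))
    return predict

def two_cluster_relabel(X, seed=0, max_iter=100):
    """Random 0/1 labels; retrain and relabel until the labels stop changing."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in X]
    for _ in range(max_iter):
        predict = nearest_mean_classifier(X, labels)
        new = [predict(x) for x in X]
        if new == labels:  # labels remain constant: converged
            break
        labels = new
    return labels

def hierarchical(X, min_size):
    """Split each cluster in two, recursively, until it has fewer than min_size points."""
    if len(X) < min_size:
        return [X]
    labels = two_cluster_relabel(X)
    parts = [[x for x, l in zip(X, labels) if l == c] for c in (0, 1)]
    if not parts[0] or not parts[1]:
        return [X]  # the split collapsed into a single cluster: stop here
    return hierarchical(parts[0], min_size) + hierarchical(parts[1], min_size)

X = [[0.0], [0.2], [0.1], [10.0], [10.2], [10.1]]
clusters = hierarchical(X, min_size=4)
```

Unlike Multi-Class SVMC, this procedure never asks for the number of clusters up front; that number emerges from where the recursion stops.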
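Item 3 combines block splitting, AdaBoost inside each block, and majority voting across blocks. The sketch below shows that control flow with standard discrete AdaBoost. It makes several assumptions: a decision stump (`train_stump`) stands in for the weakened SVM (which the dissertation obtains by adjusting the kernel parameter and penalty factor), the blocks are interleaved rather than contiguous, labels are in {-1, +1}, and all function names are hypothetical.

```python
import math

def adaboost(X, y, train_weak, rounds):
    """Discrete AdaBoost with labels in {-1, +1}; returns a list of (alpha, h)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        h = train_weak(X, y, w)
        err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        if err >= 0.5:
            break  # no better than chance on the weighted sample: stop boosting
        err = max(err, 1e-10)  # avoid division by zero for a perfect weak learner
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, h))
        # Raise the weights of misclassified samples, lower the others, renormalize.
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict_boosted(model, x):
    """Sign of the alpha-weighted sum of weak-learner outputs (ties go to +1)."""
    s = sum(alpha * h(x) for alpha, h in model)
    return 1 if s >= 0 else -1

def train_stump(X, y, w):
    """Stand-in weak learner: best weighted threshold on a single feature."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({xi[j] for xi in X}):
            for sign in (1, -1):
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if (sign if xi[j] >= t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    _, j, t, sign = best
    return lambda x: sign if x[j] >= t else -sign

def local_ensemble(X, y, n_blocks, rounds):
    """Boost within each interleaved block, then majority-vote the block models."""
    models = [adaboost(X[b::n_blocks], y[b::n_blocks], train_stump, rounds)
              for b in range(n_blocks)]
    def predict(x):
        votes = sum(predict_boosted(m, x) for m in models)
        return 1 if votes >= 0 else -1
    return predict

# Toy 1-D data: negatives at 0..5, positives at 6..11.
X = [[float(i)] for i in range(12)]
y = [-1] * 6 + [1] * 6
predict = local_ensemble(X, y, n_blocks=2, rounds=3)
```

Each block is boosted independently, so the per-block work can run in parallel; only the final vote needs every block model.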
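The bootstrapping step and the accuracy-weighted output perturbation in item 4 can be sketched as follows. This is a minimal sketch, assuming each component classifier's accuracy has already been measured (for example on held-out data) and that raw accuracy is used directly as the vote weight. The PCA feature perturbation and the automatic parameter selection are not shown, and the function names are hypothetical.

```python
import random

def bootstrap_subsets(n, n_subsets, seed=0):
    """Index lists for n_subsets bootstrap samples (drawn with replacement) of n items."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(n_subsets)]

def weighted_vote(predictions, accuracies):
    """Combine component labels, weighting each classifier's vote by its accuracy.

    predictions -- one predicted label per component classifier
    accuracies  -- matching accuracies (assumed to be measured beforehand)
    """
    scores = {}
    for label, acc in zip(predictions, accuracies):
        scores[label] = scores.get(label, 0.0) + acc
    return max(scores, key=scores.get)

# Two weaker classifiers say "A", one much more accurate classifier says "B".
print(weighted_vote(["A", "A", "B"], [0.45, 0.50, 0.98]))  # prints B
```

A plain majority vote would return "A" here; weighting by accuracy lets the single strong component prevail, which is the effect the output perturbation is meant to have.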
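The reduced-set construction in item 5 can be sketched as: find the DBSCAN core points, keep them as the reduced set, and build the rectangular kernel matrix between all m samples and the m̄ reduced points (an m × m̄ matrix; this notation is assumed here). This is a minimal sketch assuming a Gaussian kernel and Euclidean distance. Only DBSCAN's core-point test is implemented, because core status depends only on neighbourhood counts and not on cluster expansion. The values of `eps`, `min_pts`, and `gamma` are arbitrary, and the function names are hypothetical.

```python
import math

def core_points(X, eps, min_pts):
    """Indices of DBSCAN core points: samples with at least min_pts samples
    (the sample itself included) within distance eps."""
    core = []
    for i, xi in enumerate(X):
        neighbours = sum(1 for xj in X if math.dist(xi, xj) <= eps)
        if neighbours >= min_pts:
            core.append(i)
    return core

def rectangular_kernel(A, A_bar, gamma):
    """len(A) x len(A_bar) Gaussian kernel matrix between all samples and the reduced set."""
    return [[math.exp(-gamma * math.dist(a, b) ** 2) for b in A_bar] for a in A]

# Two dense groups plus one isolated point that should not enter the reduced set.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
     [9.0, 0.0]]
reduced = [X[i] for i in core_points(X, eps=0.2, min_pts=3)]
K = rectangular_kernel(X, reduced, gamma=0.5)
```

Choosing core points rather than a random subset keeps the reduced set inside dense regions and excludes noise points such as `[9.0, 0.0]`.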
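Item 5 notes that the rectangular kernel problem can be formulated as a smooth SVM. In the commonly used SSVM formulation, the non-smooth plus function (x)+ = max(x, 0) in the squared-hinge loss is replaced by the smooth function p(x, α) = x + (1/α)·log(1 + e^(−αx)), which makes the objective twice differentiable. The sketch below evaluates only that smoothing function. It assumes this standard SSVM smoothing rather than reproducing the dissertation's exact formulation, and the function names are hypothetical.

```python
import math

def plus(x):
    """Non-smooth plus function (x)_+ = max(x, 0) from the squared-hinge SVM loss."""
    return max(x, 0.0)

def smooth_plus(x, alpha):
    """Smooth approximation p(x, alpha) = x + log(1 + exp(-alpha * x)) / alpha."""
    z = alpha * x
    if z >= 0:
        return x + math.log1p(math.exp(-z)) / alpha
    # For z < 0 the same value equals log(1 + exp(z)) / alpha; this form avoids overflow.
    return math.log1p(math.exp(z)) / alpha

for x in (-2.0, 0.0, 2.0):
    print(x, plus(x), smooth_plus(x, alpha=5.0))
```

For any α > 0, p(x, α) lies strictly above (x)+ and the gap never exceeds ln 2 / α (attained at x = 0), so increasing α tightens the approximation.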
Keywords/Search Tags: statistical learning theory, support vector machine, multiple kernel learning, smooth support vector machine, reduced support vector machine, semi-definite programming, multimodal perturbation, support vector machine clustering, ensemble learning