
Research Of Key Technologies For Support Vector Machine And Their Applications On Human Activity Recognition

Posted on: 2016-02-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y K Yao
GTID: 1228330461971040
Subject: Computer application technology
Abstract/Summary:
In essence, a Support Vector Machine (SVM) is a convex quadratic optimization problem with linear inequality constraints. An SVM separates two-class data by searching for the optimal separating hyperplane, such that the nearest point(s) of each class lie at equal distance from that hyperplane. When faced with nonlinear data, an SVM uses kernel function(s) to map the original input data into a Hilbert feature space and seeks the optimal separating hyperplane in the mapped space. An SVM is fully determined by its support vectors, the most informative tuples, which always lie in the border region of each class. The core problems of SVM are as follows. First, because the kernel matrix must be computed and the parameters of both the kernel function(s) and the SVM must be optimized, training and prediction can be slow even for the fastest SVM. Second, when an SVM is used to classify imbalanced data, the scarcity of minority-class tuples can bias the separating hyperplane toward the minority-class side, which can seriously degrade the generalization ability of the SVM on unseen minority-class data. Third, an SVM is in essence a binary classification algorithm and cannot be applied directly to multi-class data.
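For reference, the convex quadratic program mentioned above is commonly written in the standard soft-margin form below. The abstract does not give a formulation, so the symbols used here (weight vector $w$, bias $b$, slack variables $\xi_i$, penalty parameter $C$, feature map $\phi$) are textbook conventions assumed for illustration, not necessarily the dissertation's own notation.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,n .
\end{aligned}
```

Here $y_i \in \{-1, +1\}$ is the class label of tuple $x_i$. In the dual of this problem the data enter only through the kernel $K(x_i, x_j) = \phi(x_i)^{\top}\phi(x_j)$, which is why the $n \times n$ kernel matrix has to be computed and why the cost grows quickly with the number of training tuples.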
The three problems above (slow training and prediction, hyperplane bias on imbalanced data, and binary-only classification) are particularly pronounced when SVM is used to classify large-scale class-imbalanced data. To address them, this dissertation studies a grid-search-based parameter optimization mechanism; explores ensemble-learning-based SVM built on the fusion of data preprocessing and feature extraction; investigates imbalanced data classification with SVM; and develops a multi-class SVM ensemble learning algorithm applied to Human Activity Recognition (a multi-class imbalanced data problem). The main contributions of this dissertation are summarized as follows:
(1) An SVM classification algorithm named PMSVM is proposed, based on a multi-level grid search method. First, candidate optimal values are searched with a coarse step size over a comparatively large search space. The coarse step is then replaced with a finer one, while the search space is automatically reduced according to the provisional optimal parameters found so far, and a new iteration begins. This process repeats until the optimal parameter values are found. Grid search is a typical greedy algorithm, so shrinking the search space and adjusting the step size can greatly improve search efficiency. PMSVM combines preprocessing by normalization and Principal Component Analysis (PCA) with the proposed multi-level grid search, and experiments demonstrate its accuracy and efficiency.
(2) An ensemble-learning-based support vector classification algorithm named PEnSVM is proposed. Different base SVM classifiers are constructed on normalized, holdout-sampled data after PCA with different, automatically adjusted PCA thresholds.
Traditional ensemble learning tends to build the final model by aggregating homogeneous base classifiers. In this dissertation, the final ensemble model is instead built from heterogeneous weak classifiers, each trained on data reduced by PCA with a different threshold, and these weak classifiers are aggregated by Bagging with a majority voting strategy. Experimental results on five UCI benchmark data sets demonstrate the validity and robustness of PEnSVM.
(3) A SMOTE over-sampling based SVM classification algorithm for imbalanced data, named KMSSVM, is proposed; it is built on a Minimum Spanning Tree (MST) derived from a KNN graph. First, a KNN graph is constructed over the positive (minority-class) tuples, and an MST is generated from that graph. Next, the K nearest neighbours of each MST leaf node are found, and new nodes are inserted by SMOTE between each leaf node and randomly selected neighbours with the same class label, until the positive and negative data are balanced. Because the leaf nodes of the MST are likely to lie near the margin of the positive data, inserting new nodes between them and their neighbours can improve the generalization ability of the SVM on imbalanced data, with better results than traditional SMOTE-based methods. Experiments on three imbalanced UCI benchmark data sets with different degrees of imbalance show that KMSSVM is an effective classification algorithm for imbalanced data.
(4) A multi-class SVM ensemble learning algorithm for large-scale imbalanced data, named BEnSVM, is presented. The original input data are divided into many tiny subsets by stratified bootstrap sub-sampling.
The base multi-class SVM classifiers are built on these sampled tiny subsets with the One-Versus-One mechanism and are then aggregated by majority voting into the ensemble SVM, which we call BEnSVM. BEnSVM is suited to classifying large-scale imbalanced data. Because the base multi-class SVM classifiers are built on tiny subsets, the small subset size reduces the computational cost of the corresponding SVM dual optimization problem by several orders of magnitude. In addition, the two-class SVM classifiers inside each multi-class SVM are run in parallel using multithreading, which reduces the computational cost further. BEnSVM is tested on a real multi-class imbalanced data set with 5 class labels and 165,633 tuples, and the experimental results show that it achieves lower computational cost and better classification performance than several existing classical classification algorithms.
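The multi-level grid search of contribution (1) can be sketched roughly as follows. This is a minimal illustration, not the PMSVM implementation: the abstract does not state the exact rule for shrinking the search space, so the sketch assumes the box is reduced to one grid step on either side of the best point found so far. The names `multilevel_grid_search`, `toy_score`, `points_per_axis`, and `levels` are hypothetical, and `toy_score` is a stand-in for what would presumably be a cross-validated SVM accuracy over two parameters.

```python
import itertools
import math


def multilevel_grid_search(score, bounds, points_per_axis=5, levels=4):
    """Coarse-to-fine grid search that maximizes score(params).

    bounds: list of (low, high) pairs, one per parameter.
    Each level evaluates a regular grid, then shrinks the search box to
    one grid step on either side of the best point found so far
    (an assumed shrinking rule, chosen only for illustration).
    """
    best_params, best_score = None, -math.inf
    for _ in range(levels):
        # Build the grid axes for the current search box.
        axes, steps = [], []
        for low, high in bounds:
            step = (high - low) / (points_per_axis - 1)
            steps.append(step)
            axes.append([low + i * step for i in range(points_per_axis)])
        # Evaluate every grid point at this granularity.
        for params in itertools.product(*axes):
            s = score(params)
            if s > best_score:
                best_params, best_score = params, s
        # Shrink the box around the current best, clipped to the old box.
        bounds = [
            (max(low, p - step), min(high, p + step))
            for (low, high), p, step in zip(bounds, best_params, steps)
        ]
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for cross-validated accuracy, peaked at (1.3, -2.7).
    a, b = params
    return -((a - 1.3) ** 2 + (b + 2.7) ** 2)


best, value = multilevel_grid_search(toy_score, [(-5.0, 15.0), (-15.0, 3.0)])
```

With 5 points per axis and 4 levels this evaluates 100 grid points, whereas a single grid at the final step size over the original box would need roughly a thousand. The trade-off is that each level commits to the region around the current best point, so a score surface with several peaks can lead the search away from the global optimum.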
Keywords/Search Tags: Statistical Learning Theory, Support Vector Machine, Classification, Kernel Function, Grid Search, Ensemble Learning, Imbalanced Learning, Human Activity Recognition