Research On The Method Of Data Normalization For Improving SVM Training Efficiency

Posted on: 2018-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: R Z Tang
GTID: 2348330518968382
Subject: Computer software and theory
Abstract/Summary:
The support vector machine (SVM) is a machine learning method based on structural risk minimization and VC dimension theory. In recent decades its classification ability has been widely used in many fields, and it remains one of the most active areas of machine learning research. Many scholars in China and abroad have worked on improving the efficiency of SVM training. Data normalization is a data preprocessing step for SVM training. The commonly used normalization methods include scaling to [-1, +1] and standardizing to N(0,1), among others. However, the literature offers no scientific basis for these commonly used methods.

This thesis studies the operating mechanism of the sequential minimal optimization (SMO) algorithm in SVM and finds that the Gaussian kernel function is affected by the magnitude of the sample data: attribute values that are too large or too small reduce the effective participation of the Gaussian kernel function. Data normalization confines the data to a certain range so that it better matches the Gaussian kernel radius, which prevents the optimal separating hyperplane from becoming too rugged. The thesis empirically studies the inherent mechanism of data normalization and the effect of normalized versus unnormalized data on training efficiency and on the model's predictive ability. Using standard data sets, we train SVMs on the original unnormalized data, on data normalized by different methods, on artificially denormalized data, and on data with optionally selected attributes, and we record the trend of the objective function value, the training time, the model test results, and the k-fold cross-validation (k-CV) performance. The research results are summarized as follows:

(1) For the traditional SMO algorithm, we summarize the expressions for the objective function value and its rate of change, and we implement the algorithm in C++11 to measure training time and test accuracy. We review representative literature on the Gaussian kernel function in SMO and determine the optimal value of the kernel radius and the KKT precision. The experimental results show that the kernel radius and KKT precision determined in this way achieve the best generalization ability, and analysis of the resulting curves leads to the conclusion that SVM training efficiency can be improved through data preprocessing.

(2) We study data preprocessing methods in depth and implement three normalization methods for SVM: min-max (extreme-value) normalization, median normalization, and standard-score normalization. The experimental results show that data normalization resolves the mismatch between the data and the Gaussian kernel radius, so that the Gaussian kernel function can be applied effectively in SVM classification.

(3) We preprocess the standard experimental data sets with the three normalization methods and design a variety of experiments, using k-CV to record and compare training time and test accuracy. Analysis of the change in SVM training efficiency shows that data normalization, that is, keeping the value of each data attribute within a conventional numerical range, improves training efficiency.

(4) By analyzing the influence on SVM training efficiency and comparing the differences in classification ability, we identify the range within which data normalization improves SVM training efficiency: the attribute values should be kept within conventional, comparable numerical ranges, such as [-0.5, +0.5], [-5, +5], N(0,1) to N(0,5), etc.

A large number of experiments show that data normalization can effectively improve the training efficiency of SVM. This thesis provides a scientific basis for normalization in SVM and other machine learning algorithms.
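
The sensitivity of the Gaussian kernel to attribute magnitude described above can be illustrated with a minimal Python sketch. It is not the thesis's C++11 implementation. The toy data, the function names, the kernel radius sigma = 1, and the textbook forms of range scaling to [-1, +1] and standard-score scaling are all assumptions made for illustration. Median normalization is omitted because the abstract does not define it.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def min_max_scale(column, lo=-1.0, hi=1.0):
    """Linearly map one attribute column onto [lo, hi]."""
    c_min, c_max = min(column), max(column)
    span = c_max - c_min
    if span == 0:
        # Constant attribute: place every value at the middle of the range.
        return [0.5 * (lo + hi)] * len(column)
    return [lo + (hi - lo) * (v - c_min) / span for v in column]

def z_score_scale(column):
    """Standard-score scaling: subtract the mean, divide by the std. deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in column]

def scale_dataset(rows, scaler):
    """Apply a per-attribute scaler to every column of a row-major data set."""
    columns = [list(col) for col in zip(*rows)]
    scaled_columns = [scaler(col) for col in columns]
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical toy samples: two attributes on very different scales.
rows = [[1500.0, 0.2], [3200.0, 0.9], [2100.0, 0.5]]

# Raw data: the squared distance is huge, so the kernel underflows to 0.
raw_k = gaussian_kernel(rows[0], rows[1])

# After scaling, the same pair of samples yields an informative kernel value.
mm_rows = scale_dataset(rows, min_max_scale)
mm_k = gaussian_kernel(mm_rows[0], mm_rows[1])

zs_rows = scale_dataset(rows, z_score_scale)
zs_k = gaussian_kernel(zs_rows[0], zs_rows[1])

print(raw_k, mm_k, zs_k)
```

On the raw data the large attribute dominates the squared distance and the kernel value collapses to zero for a radius of 1, so those samples barely interact through the kernel. After either scaling, the kernel value for the same pair is strictly between 0 and 1.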
Keywords/Search Tags: support vector machine (SVM), data normalization, Gaussian kernel function, sequential minimal optimization algorithm, cross validation, training efficiency