
The Research Of Distance Metric And Model Selection In K-Means Clustering And L2-SVM Classification

Posted on: 2017-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: X H Yang
Full Text: PDF
GTID: 2348330488990816
Subject: Applied Mathematics
Abstract/Summary:
The K-Means clustering algorithm and the Support Vector Machine (SVM) classification algorithm are two of the most widely studied algorithms in clustering and classification, respectively. Traditional K-Means generally employs the Euclidean distance. However, the Euclidean metric is not always a good measure, because it considers only the geometric distance of each data point to the cluster center and assumes that every feature of the data space is equally important. Similarly, when the L2-SVM algorithm searches for the optimal hyperplane, it employs the Euclidean distance as the maximum-margin measure, which ignores the internal structure of the sample space. In addition, the L2-SVM algorithm involves many parameters, so model selection is needed to improve its computational efficiency. This thesis addresses these issues; the main research contents are as follows:

1. Aiming at the drawback of traditional K-Means, and following the basic idea of locality preserving projection (LPP), we first define a locality preserving scatter matrix, then introduce a new Mahalanobis distance based on this matrix, and finally propose a novel K-Means clustering algorithm using the given Mahalanobis distance. Unlike the traditional K-Means algorithm, the proposed method fully considers the intrinsic manifold structure of the data. Experimental results show that the proposed method achieves better clustering accuracy than the traditional K-Means algorithm.

2. The distance metric of the decision hyperplane in the 2-norm support vector machine is considered. Drawing on LPP, we first construct a locality preserving scatter matrix, then improve the covariance matrix of the Mahalanobis distance using this matrix, and finally propose a novel method called the locality preserving 2-norm support vector machine. The method takes the intrinsic manifold structure of the data into full consideration. Moreover, it uses the data labels and is therefore a supervised method, whereas LPP is unsupervised and does not consider category information. The experimental results show that, compared with the traditional support vector machine, the proposed algorithm has better classification accuracy and generalization ability.

3. The radius in the radius-margin (RM) bound is computed by solving a quadratic program, which adds computational overhead. To solve this problem, we construct a new RM bound that approximates the radius by the maximum pairwise distance over all points. Based on the new RM bound, model selection for the 2-norm SVM is conducted, and the parameters are tuned automatically by gradient descent. Finally, the classification accuracy and computational efficiency of the algorithm are evaluated through simulation experiments. The results show that the classification accuracy of the proposed algorithm does not change significantly compared with model selection based on the original RM bound, while the computational efficiency improves by at least a factor of two.
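To make the first contribution concrete, the sketch below runs K-Means under a generalized Mahalanobis distance d(x, c)² = (x − c)ᵀ M (x − c). It is only an illustration of the idea: the thesis derives M from a locality preserving scatter matrix, whereas here M defaults to a regularized inverse sample covariance as a simple stand-in, and the function name and signature are this sketch's own invention.

```python
import numpy as np

def mahalanobis_kmeans(X, k, M=None, centers=None, n_iter=100, seed=0):
    """K-Means under the Mahalanobis distance d(x, c)^2 = (x - c)^T M (x - c).

    NOTE: the thesis builds M from a locality preserving scatter matrix;
    this sketch defaults to the regularized inverse sample covariance
    as a placeholder metric.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if M is None:
        # placeholder for the LPP-based matrix: regularized inverse covariance
        M = np.linalg.inv(np.cov(X.T) + 1e-6 * np.eye(d))
    if centers is None:
        centers = X[rng.choice(n, k, replace=False)].astype(float)
    else:
        centers = np.array(centers, dtype=float)
    labels = np.zeros(n, dtype=int)
    for it in range(n_iter):
        diff = X[:, None, :] - centers[None, :, :]          # shape (n, k, d)
        dist2 = np.einsum('nkd,de,nke->nk', diff, M, diff)  # squared distances
        new_labels = dist2.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break                                           # assignments stable
        labels = new_labels
        for j in range(k):                                  # update each center
            mask = labels == j
            if mask.any():
                centers[j] = X[mask].mean(axis=0)
    return labels, centers
```

With M = I this reduces to ordinary K-Means, which makes the role of the metric matrix easy to see in experiments.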
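The third contribution replaces the QP-computed enclosing-sphere radius R with the maximum pairwise distance D over the data. A minimal sketch of that surrogate (the function name is illustrative; the thesis's exact bound may scale D differently) relies on the fact that D/2 ≤ R ≤ D, so D can stand in for R without any QP solve:

```python
import numpy as np

def max_pairwise_distance(X):
    """Surrogate for the enclosing-sphere radius in the radius-margin bound.

    Computing the exact radius R requires solving a quadratic program;
    the maximum pairwise distance D satisfies D/2 <= R <= D, so D is a
    cheap O(n^2) replacement.
    """
    # squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return float(np.sqrt(np.maximum(d2, 0.0).max()))
```

Because this expression is smooth in the kernel parameters (for kernel-induced distances), it keeps the new RM bound compatible with the gradient-descent tuning described above.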
Keywords/Search Tags:K-Means, Locality preserving projection, Mahalanobis distance, 2-norm SVM, Model selection, Radius-margin bound, Gradient descent algorithm