Support Vector Clustering Based On Shared Nearest Neighbor

Posted on: 2021-01-21  Degree: Master  Type: Thesis
Country: China  Candidate: D Q Zhu  Full Text: PDF
GTID: 2428330629452707  Subject: Computer application technology
Abstract/Summary:
Clustering is an unsupervised learning task: it groups data without labels (teacher signals). The support vector clustering algorithm has several advantages. In theory, it can identify clusters of any shape and number by adjusting the parameters of the kernel function; it is insensitive to noisy data points, which limits their effect on the result; and because the kernel function maps the data into a feature space, it handles data that are not linearly separable. Its main shortcomings are high computational cost and slow performance.

This thesis improves the original support vector clustering algorithm in two places:

1. Training. The support function is obtained by solving the objective function. Instead of the original solver, the constrained problem is transformed into an unconstrained optimization problem, and an algorithm framework with GPU computing support is used to shorten the training time.
2. Cluster division. To decide whether two data points belong to the same cluster, the original algorithm picks points at random along the straight line connecting them and computes each picked point's distance from the center of the sphere in the feature space. This calculation increases the complexity of the algorithm and roughly doubles its computation time. This thesis instead introduces the idea of shared nearest neighbors: the support vectors are partitioned first, and the remaining data points are then assigned.

The Support Vector Clustering Based on Shared Nearest Neighbor algorithm proposed in this thesis retains the advantages of the original support vector clustering algorithm: it partitions data sets of any shape well, because the kernel function maps the data into a feature space, and noisy data points have little effect on it. However, the shared nearest neighbor idea used in the division stage introduces additional parameters, which makes the algorithm somewhat harder to tune.

The aim of these improvements is to speed up the algorithm and to improve the accuracy of cluster division. To verify the improved algorithm, several artificial data sets with characteristic distributions are first used as a preliminary check of its feasibility and effectiveness. The algorithm is then evaluated on real data sets and compared with several classic clustering algorithms, and the results are analyzed and summarized.
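The shared nearest neighbor idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's exact partitioning rule, so everything below is an assumption made for illustration: the function names (`k_nearest`, `snn_similarity`, `snn_clusters`), the parameters `k` and `min_shared`, the use of Euclidean distance in the input space, and the Jarvis-Patrick style linking rule (two points are joined when each is in the other's k-nearest-neighbor list and they share at least `min_shared` neighbors). In the thesis this kind of grouping is applied to the support vectors first; the sketch simply groups whatever points it is given.

```python
from math import dist


def k_nearest(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to p (assumed metric).
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Shared nearest neighbor similarity: size of the overlap of two k-NN sets."""
    return len(neighbors[i] & neighbors[j])


def snn_clusters(points, k, min_shared):
    """Group points with a Jarvis-Patrick style rule (an illustrative choice).

    Points i and j are linked when they are mutual k-nearest neighbors and
    share at least `min_shared` neighbors; clusters are the connected
    components of the resulting graph.
    """
    neighbors = k_nearest(points, k)
    parent = list(range(len(points)))  # union-find forest

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(len(points)):
        for j in neighbors[i]:
            mutual = i in neighbors[j]
            if mutual and snn_similarity(neighbors, i, j) >= min_shared:
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]
```

For two well-separated groups of four points each, for example the corners of a unit square at the origin and the same square shifted to (10, 10), calling `snn_clusters(points, k=3, min_shared=2)` assigns one label to each square. The two extra parameters `k` and `min_shared` are the kind of additional tuning burden the abstract refers to.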
Keywords/Search Tags:Clustering, Support Vector Clustering, Shared Nearest Neighbor, The division of clustering, kernel function