Research On Clustering Identification Method Based On Support Vector Partition

Posted on: 2020-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: Q Peng
GTID: 2428330575977676
Subject: Computer application technology
Abstract/Summary:
In today's big data era, information has value in every domain. As data accumulate, organizations of all kinds want to extract usable and useful information from these huge stores, so data mining plays an important role in many fields. Cluster analysis, one of the central tasks of data mining, can uncover the relationships between objects in massive data, and many researchers have studied it.

Support vector clustering (SVC), an important contour-based clustering algorithm, has several advantages over other clustering algorithms. First, because it is built on kernel theory, it can process structured data. Second, it is an unsupervised learning algorithm that does not need the number of clusters in advance, which widens its range of application. Third, it places no requirements on the shape or number of clusters, so it can identify clusters of arbitrary shape. Fourth, by introducing soft-margin (slack) variables, it can identify noise points and eliminate their influence on the clustering.

However, the cluster partitioning stage requires a large amount of computation, so the algorithm runs slowly. This stage also involves random operations, which make the partitioning result unstable. In addition, SVC results depend heavily on the soft-margin constant C and the kernel width coefficient q, and finding optimal values for these parameters is very time-consuming.

This thesis introduces a cluster identification method based on the geometric properties of the sample set in feature space. It comprises two models: cluster sphere partition and convergence partition. The cluster sphere partition model first obtains the set of support vectors in feature space and then divides the support vectors with a jump function. The cluster spheres formed by the support vectors of the different clusters each enclose most of the data points belonging to their cluster, which reduces the misassignment of data points. The convergence partition model supplies more cluster information for partitioning the support vectors: for each support vector, it selects the non-support-vector point with the largest kernel value to that support vector as a similar support vector point. The mean kernel value between the support vectors and their similar support vectors then serves as the threshold for dividing the support vectors. An initial array is then built, and each cluster is labeled by converging gradually from its boundary. The method addresses the high time complexity of SVC and reduces the probability that sample points are assigned to the wrong cluster.

The clustering results are visualized on simulated data sets, and experiments are run on classical data sets, comparing the method with a variety of algorithms under several cluster evaluation criteria. The experiments show that the proposed algorithm achieves good clustering performance. The clustering time of the algorithm is also measured, and the results show that it reduces the time complexity of support vector clustering.
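The cost of the partitioning stage is easiest to see in the baseline that the thesis sets out to improve. The Python sketch below is not the proposed cluster sphere or convergence partition model, whose details the abstract does not give. It is the classic SVC pipeline under stated assumptions: a Gaussian kernel K(x, y) = exp(-q||x - y||^2) with kernel width coefficient q, the sphere's dual solved by a simple SMO-style loop (any QP solver would do), and the common complete-graph labeling rule, in which two points share a cluster when m sample points on the segment between them all stay inside the sphere. The names `fit_sphere` and `complete_graph_labels`, the sample count `m`, and the tolerances are hypothetical choices made for illustration.

```python
# Illustrative sketch of baseline SVC (not the thesis's method).
# Function names, m, and tolerances are hypothetical.
import math
from collections import deque


def gaussian_kernel(x, y, q):
    """K(x, y) = exp(-q * ||x - y||^2); q is the kernel width coefficient."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_sphere(points, q, C, max_iter=5000, tol=1e-7):
    """Smallest feature-space sphere enclosing the data (the SVC dual).

    Because K(x, x) = 1 for a Gaussian kernel, the dual reduces to
    minimizing beta^T K beta with sum(beta) = 1 and 0 <= beta_i <= C.
    Solved with maximal-violating-pair (SMO-style) updates; needs C >= 1/n.
    """
    n = len(points)
    K = [[gaussian_kernel(a, b, q) for b in points] for a in points]
    beta = [1.0 / n] * n
    Kb = [sum(K[i][l] * beta[l] for l in range(n)) for i in range(n)]
    for _ in range(max_iter):
        up = [k for k in range(n) if beta[k] < C - 1e-12]
        down = [k for k in range(n) if beta[k] > 1e-12]
        if not up or not down:
            break
        i = min(up, key=lambda k: Kb[k])    # weight that should grow
        j = max(down, key=lambda k: Kb[k])  # weight that should shrink
        gap = Kb[j] - Kb[i]
        if gap < tol:  # optimality (KKT) conditions hold to within tol
            break
        curv = K[i][i] + K[j][j] - 2.0 * K[i][j]
        step = gap / curv if curv > 1e-12 else float("inf")
        step = min(step, C - beta[i], beta[j])  # respect 0 <= beta <= C
        beta[i] += step
        beta[j] -= step
        for k in range(n):
            Kb[k] += step * (K[k][i] - K[k][j])
    const = sum(b * kb for b, kb in zip(beta, Kb))  # beta^T K beta

    def dist2(y):
        """Squared feature-space distance from phi(y) to the sphere center."""
        s = sum(b * gaussian_kernel(x, y, q) for b, x in zip(beta, points))
        return 1.0 - 2.0 * s + const

    # Support vectors (0 < beta < C) lie on the sphere and fix its radius.
    svs = [k for k in range(n) if 1e-6 < beta[k] < C - 1e-6]
    r2 = max(dist2(points[k]) for k in svs)
    return dist2, r2


def complete_graph_labels(points, dist2, r2, m=10):
    """Baseline labeling: x_i and x_j are adjacent when all m interior
    sample points on the segment between them stay inside the sphere;
    clusters are the connected components of that graph."""
    n = len(points)

    def inside_segment(a, b):
        for s in range(1, m + 1):
            t = s / (m + 1)
            y = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
            if dist2(y) > r2 + 1e-9:
                return False
        return True

    # n(n - 1) segment checks, each with up to m calls to dist2 costing O(n).
    adj = [[j for j in range(n) if j != i and inside_segment(points[i], points[j])]
           for i in range(n)]
    labels, current = [-1] * n, 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:  # breadth-first search over one connected component
            u = queue.popleft()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = current
                    queue.append(v)
        current += 1
    return labels


blob_a = [(x, y) for x in (-0.5, 0.0, 0.5) for y in (-0.5, 0.0, 0.5)]
blob_b = [(x + 5.0, y + 5.0) for x, y in blob_a]
data = blob_a + blob_b
dist2, r2 = fit_sphere(data, q=1.0, C=1.0)
print(complete_graph_labels(data, dist2, r2))
```

On two 3 × 3 grids of points centered at (0, 0) and (5, 5), with q = 1 and C = 1 (so there are no bounded support vectors), the sketch should give the first nine points one label and the last nine another. The adjacency step alone makes n(n - 1) segment checks, each with up to m evaluations of dist2 over all n points, so labeling grows roughly as O(n^3 m); this is the kind of cost the proposed partition models aim to reduce.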
Keywords/Search Tags: Cluster analysis, support vector partitioning, kernel width coefficient, cluster identification