Font Size: a A A

Research On Optimization Methods For Kernel K-means

Posted on:2018-10-06Degree:MasterType:Thesis
Country:ChinaCandidate:S Y FanFull Text:PDF
GTID:2348330539975500Subject:Software Engineering Technology
Abstract/Summary:PDF Full Text Request
Clustering analysis is one of the important contents of data mining research,and it is an important means to identify the internal structure of data.k-means has the advantages of simplicity and efficiency,so it becomes one of the widely studied clustering algorithms.Due to the complex manifold of the real-world data,k-means algorithm based on Euclidean distance can not handle well.As a kernel-based clustering algorithm,kernel k-means algorithm can deal with nonlinear separable data effectively.How to determine the number of clusters in advance,select the kernel function and determine the kernel parameters are the key factors which affect the performance of kernel k-means algorithm.In addition,as the size of data increases,memory space and running time will quickly explode.In this paper,we research the kernel k-means algorithm to improve its performance.The following are the main research contents of this paper:Firstly,in view of blindness to determine the number of clusters and kernel parameters empirically,this paper proposes Self-adaptive Kernel k-means Algorithm based on the Shuffled Frog Leaping Algorithm(SFLA-SAKKM).And an effective index named Kernel Between-Within Proportion(KBWP)for kernel space is designed according to the inherent structure of the data set.The number of clusters and kernel parameters are regarded as the position information of the frog and KBWP is regarded as the fitness.And then shuffled frog leaping algorithm is used to optimize parameters locally and globally until the maximum number of iterations is reached.The number of clusters and kernel parameters corresponding to the maximum fitness are the best.Experiments show that SFLA-SAKKM improves the performance of kernel k-means algorithm.Secondly,this paper studies the problem of kernel parameters selection by using hybrid kernel function in kernel k-means.The hybrid kernel function combines local kernel function with strong learning ability and global kernel function with strong generalization ability.Based on the hybrid kernel function,this paper proposes hybrid kernel k-means.To further solve the problem of high computational complexity and large memory requirement when dealing with large-scale data,we propose Hybrid Kernel k-means for Large Data(HKKM).In this paper,several data points are randomly sampled from the data set.The number of sampling data points is far less than the total number of data points.A kernel matrix is constructed between the sampling data point and all data points.And then follow Nystr?m's idea to approximate the kernel matrix of the whole data set to reduce the complexity of the algorithm.Once again,considering the local and global structure of the data set,this paper proposes a Novel Locally Multiple Kernel k-means based on Similarity Measure(LMKKM)to solve the clustering problem of complex structures or multi-scale data sets.The proposed similarity measure satisfies the requirement of clustering hypothesis,and the relationship between data points is described more rationally after considering the local and global structure.According to local distribution,generate a local scale parameter adaptively for each data point.In order to consider the global structure of the data set at the same time,the density factor is introduced.Experiments show that LMKKM can effectively deal with clustering problems of data sets with complex or multi-scale structures.Finally,we summarize proposed optimized kernel k-means methods and prospect further research of kernel k-means algorithm.
Keywords/Search Tags:Kernel k-means, Shuffled Frog Leaping Algorithm, Clustering Validity Index, Hybrid Kernel Function, Similarity Measure, Clustering Analysis
PDF Full Text Request
Related items