Research On Optimization Methods For Kernel K-means

Posted on:2018-10-06

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Fan

Full Text:PDF

GTID:2348330539975500

Subject:Software Engineering Technology

Abstract/Summary:

PDF Full Text Request

Clustering analysis is one of the important contents of data mining research,and it is an important means to identify the internal structure of data.k-means has the advantages of simplicity and efficiency,so it becomes one of the widely studied clustering algorithms.Due to the complex manifold of the real-world data,k-means algorithm based on Euclidean distance can not handle well.As a kernel-based clustering algorithm,kernel k-means algorithm can deal with nonlinear separable data effectively.How to determine the number of clusters in advance,select the kernel function and determine the kernel parameters are the key factors which affect the performance of kernel k-means algorithm.In addition,as the size of data increases,memory space and running time will quickly explode.In this paper,we research the kernel k-means algorithm to improve its performance.The following are the main research contents of this paper:Firstly,in view of blindness to determine the number of clusters and kernel parameters empirically,this paper proposes Self-adaptive Kernel k-means Algorithm based on the Shuffled Frog Leaping Algorithm(SFLA-SAKKM).And an effective index named Kernel Between-Within Proportion(KBWP)for kernel space is designed according to the inherent structure of the data set.The number of clusters and kernel parameters are regarded as the position information of the frog and KBWP is regarded as the fitness.And then shuffled frog leaping algorithm is used to optimize parameters locally and globally until the maximum number of iterations is reached.The number of clusters and kernel parameters corresponding to the maximum fitness are the best.Experiments show that SFLA-SAKKM improves the performance of kernel k-means algorithm.Secondly,this paper studies the problem of kernel parameters selection by using hybrid kernel function in kernel k-means.The hybrid kernel function combines local kernel function with strong learning ability and global kernel function with strong generalization ability.Based on the hybrid kernel function,this paper proposes hybrid kernel k-means.To further solve the problem of high computational complexity and large memory requirement when dealing with large-scale data,we propose Hybrid Kernel k-means for Large Data(HKKM).In this paper,several data points are randomly sampled from the data set.The number of sampling data points is far less than the total number of data points.A kernel matrix is constructed between the sampling data point and all data points.And then follow Nystr?m's idea to approximate the kernel matrix of the whole data set to reduce the complexity of the algorithm.Once again,considering the local and global structure of the data set,this paper proposes a Novel Locally Multiple Kernel k-means based on Similarity Measure(LMKKM)to solve the clustering problem of complex structures or multi-scale data sets.The proposed similarity measure satisfies the requirement of clustering hypothesis,and the relationship between data points is described more rationally after considering the local and global structure.According to local distribution,generate a local scale parameter adaptively for each data point.In order to consider the global structure of the data set at the same time,the density factor is introduced.Experiments show that LMKKM can effectively deal with clustering problems of data sets with complex or multi-scale structures.Finally,we summarize proposed optimized kernel k-means methods and prospect further research of kernel k-means algorithm.

Keywords/Search Tags:

Kernel k-means, Shuffled Frog Leaping Algorithm, Clustering Validity Index, Hybrid Kernel Function, Similarity Measure, Clustering Analysis

PDF Full Text Request

Related items

1	The Research Of Fuzzy Clustering Algorithm Of Data Mining Based On Shuffled Frog Leaping Algorithm
2	Analysis And Research Of K-means Algorithm Based On The Modified Shuffled Frog Leaping Algorithm
3	Research On Weighted Kernel FCM Algorithm With Double Variables And Its Validity Evaluation
4	Collaborative Filtering Recommendation Based On Fuzzy Clustering And Improved Shuffled Frog Leaping Algorithm
5	Research On Fuzzy C-Mean Clustering Algorithm Based On Particle Swarm Optimization And Shuffled Frog Leaping Algorithm
6	Hrrp Recognition Method Based On Kernel Clustering
7	Study On Improvement Of Shuffled Frog Leaping Algorithm And Its Applications In Rotating Machinery Fault Diagnosis
8	Improvement And Application Of K-means Clustering Algorithm Based On Hybrid Frog Leaping Algorithm
9	Clustering Analysis Of K-means Based On Improved Genetic Algorithm
10	Improvement And Application Research Of Shuffled Frog Leaping Algorithm