
Study On The Improvement Of I-nice Clustering Algorithm

Posted on: 2021-07-21    Degree: Master    Type: Thesis
Country: China    Candidate: H L Qin    Full Text: PDF
GTID: 2518306110985599    Subject: Computer Science and Technology
Abstract/Summary:
Parameter-free clustering is currently one of the research hotspots in unsupervised learning. The main advantage of parameter-free clustering algorithms is that the researcher does not need to supply essential input parameters (e.g., the number of clusters and the initial cluster centers) before the algorithms are trained on the given data sets. In real applications, the number of clusters and the initial cluster centers are usually unknown, and an inappropriate choice of input parameters leads to unsatisfactory clustering results, especially on data sets with many clusters. Although some specialized methods exist to optimize the number of clusters and the initial cluster centers, they suffer from either unstable clustering results or high computational complexity. It is therefore valuable, for both academia and industry, to develop parameter-free clustering algorithms that can automatically and correctly identify the input parameters at an acceptable computational cost.

I-nice is such a parameter-free clustering algorithm, proposed by Masud et al. in 2018. By imitating how human beings observe mountain peaks from observation points, I-nice automatically identifies the number of clusters and selects the initial cluster centers for a given data set. I-nice has two versions: the single-observation-point version, I-nice SO, and the multiple-observation-point version, I-nice MO. I-nice SO first determines the number of clusters and then identifies the cluster centers, whereas I-nice MO first identifies the cluster centers and then determines the number of clusters. I-nice uses Gamma mixture models (GMMs) to represent the distributions of distances between the observation points and the original data points, and uses the k-nearest-neighbors method to determine high-density areas. The number of Gamma components gives the number of clusters, and the centers of the high-density areas give the cluster centers. As the latest parameter-free clustering algorithm, I-nice innovatively introduces observation points, transforms high-dimensional data into one-dimensional distance values, and fits the Gamma mixture model of those distance values with the expectation-maximization (EM) algorithm, so that the number of clusters and the initial cluster centers of the original data can be identified.

Although experiments show the good clustering performance of I-nice, it has two inherent limitations: 1) I-nice SO is sensitive to the position of the observation point; an unreasonable position distorts the distribution of distances between the observation point and the original data points, so the final clustering result will be wrong. 2) In I-nice MO, the number of nearest neighbors k affects the determination of high-density regions, but the algorithm provides no method for choosing k, which also affects the final clustering result.

Inspired by kernel density estimation and the density-peaks mechanism, this thesis proposes two improved I-nice algorithms: I-nice KDE, based on kernel density estimation, and I-nice DP, based on the density-peaks mechanism. The former uses kernel density estimation to automatically set the maximum number of GMM sub-models in the I-nice algorithm and uses the Minimum Mean Discrepancy to identify more potential cluster centers, thereby reducing the sensitivity to the observation point's position and improving the performance of the I-nice algorithm. The latter uses the density-peaks mechanism to identify, for each sub-model of the GMM, the number of clusters and the cluster centers in the corresponding original data, and uses the distance-inflection-point method to detect and remove redundant candidate cluster centers, which greatly reduces the complexity of the algorithm while also improving its generalization ability and robustness.

Specifically, the I-nice KDE algorithm uses kernel density estimation to optimize the number of algorithm iterations, which both improves the accuracy of the algorithm and reduces its complexity; in addition, it uses the Minimum Mean Discrepancy instead of the k-nearest-neighbors method to determine the final cluster centers, which significantly improves the robustness and effectiveness of the algorithm. The I-nice DP algorithm uses the density-peaks mechanism to determine the candidate clusters in each Gamma component of the best-fitted GMM, which greatly improves the generalization performance and robustness of the algorithm; in addition, the distance-inflection-point method automatically determines the distance threshold, which greatly improves the accuracy of the algorithm.

The feasibility and effectiveness of the two improved algorithms are verified experimentally on synthetic and real data sets. The experimental results show that the two methods not only significantly improve the generalization performance of I-nice but also exhibit strong robustness. Compared with existing classical clustering algorithms such as DBSCAN and BIRCH, both I-nice KDE and I-nice DP achieve better generalization performance and stability.
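The core I-nice idea described above (project the data onto distances from an observation point, then pick the mixture whose component count best fits those one-dimensional distances) can be sketched as follows. This is an illustrative sketch, not the thesis code: a Gaussian mixture stands in for the Gamma mixture that I-nice actually fits, BIC stands in for its model-selection criterion, and the data set and observation-point position are invented for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Three well-separated 2-D clusters (synthetic data for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
               for c in [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]])

# One observation point (its position is an assumption of this example);
# the distance transform turns the 2-D data into 1-D values.
obs_point = np.array([-2.0, 1.0])
d = np.linalg.norm(X - obs_point, axis=1).reshape(-1, 1)

# Fit candidate mixtures with 1..5 components and keep the one with the
# lowest BIC; its component count estimates the number of clusters.
models = [GaussianMixture(n_components=k, random_state=0).fit(d)
          for k in range(1, 6)]
best_k = 1 + int(np.argmin([m.bic(d) for m in models]))
print(best_k)
```

Note that the example also hints at the sensitivity discussed above: an observation point placed symmetrically between clusters would merge their distance modes, which is exactly the I-nice SO limitation the thesis addresses.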
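The I-nice KDE improvement uses a kernel density estimate of the distance values to bound how many GMM sub-models need to be tried. A minimal sketch of that idea, assuming counting interior local maxima of a Gaussian KDE is an acceptable proxy for the thesis's peak-detection step (the data and variable names are illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

# 1-D distance values with two modes (synthetic, for illustration).
rng = np.random.default_rng(1)
d = np.concatenate([rng.normal(2.0, 0.2, 150),
                    rng.normal(5.0, 0.2, 150)])

# Gaussian KDE (Scott's-rule bandwidth by default), evaluated on a grid.
kde = gaussian_kde(d)
grid = np.linspace(d.min(), d.max(), 512)
y = kde(grid)

# Interior local maxima of the estimated density; their count serves as
# the maximum number of GMM sub-models to fit.
peaks = np.flatnonzero((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])) + 1
max_submodels = len(peaks)
print(max_submodels)
```

Capping the sub-model count this way is what lets the improved algorithm avoid iterating over an arbitrary, user-chosen range of component counts.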
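The density-peaks mechanism that I-nice DP builds on (in the style of Rodriguez and Laio) selects as candidate centers the points that combine high local density rho with a large distance delta to any denser point. A self-contained sketch under assumed choices (Gaussian density kernel, cutoff distance d_c, and top-2 selection are all illustrative, not the thesis's settings):

```python
import numpy as np

# Two synthetic 2-D clusters (for illustration only).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal((0.0, 0.0), 0.2, (80, 2)),
               rng.normal((3.0, 3.0), 0.2, (80, 2))])

# Pairwise distances between all points.
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

# Local density rho via a Gaussian kernel with cutoff d_c (assumed value).
d_c = 0.5
rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

# delta: distance to the nearest point of higher density
# (for the densest point, the maximum distance to any point).
delta = np.empty(len(X))
for i in range(len(X)):
    denser = np.flatnonzero(rho > rho[i])
    delta[i] = dist[i, denser].min() if denser.size else dist[i].max()

# Candidate centers: the points with the largest rho * delta products.
centers = np.argsort(rho * delta)[-2:]
print(X[centers])
```

In I-nice DP this selection is applied per Gamma component of the best-fitted GMM, and the distance-inflection-point step then prunes redundant candidates; both of those steps are beyond this sketch.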
Keywords/Search Tags: I-nice Algorithms, Parameter-free Clustering Algorithms, Gamma Mixture Model, Kernel Density Estimation, Density Peaks Mechanism