Research On Ensemble-Initialized K-Means Clustering Algorithms

Posted on: 2020-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: S S Xu
Full Text: PDF
GTID: 2518305981952849
Subject: Master of Engineering
Abstract/Summary:
With the development of information technology and the emergence of large-scale data, data mining has become an important area of computer science research. Data mining research involves a number of directions, such as pattern mining, classification, clustering and topic learning. Among these, clustering analysis has been a popular direction in recent years. Its purpose is to group a set of unlabeled data points into a certain number of clusters, such that data points in the same cluster have high similarity and data points in different clusters have low similarity. Existing clustering algorithms fall into several main categories, such as hierarchical, partitional, grid-based and density-based clustering algorithms. Among them, k-means is one of the most classical partitional clustering algorithms. Due to its simplicity and efficiency, k-means clustering has been successfully applied in many areas. However, the conventional k-means algorithm needs to select a set of initial cluster centers and may be misled by outliers, which can degrade clustering performance. The instability of cluster center initialization is a main limitation of conventional k-means, and is also a major factor to consider when improving its performance.

To address the initialization problem of the conventional k-means algorithm, this thesis proposes a novel ensemble-initialized k-means clustering algorithm. Inspired by the clustering ensemble technique, the proposed algorithm combines multiple weak clusterers into a better clusterer in the initialization stage. Specifically, multiple base clustering results are first generated by running the k-means algorithm repeatedly. Clustering ensemble algorithms are then applied to fuse the base clusterings into a pre-clustering result. Next, the pre-clustering is used to initialize the cluster centers, which are in turn used by k-means to obtain the final clustering result. Experiments compare the final clustering result with the pre-clustering result, and compare the proposed initialization method with several other initialization methods for k-means. The experimental results show the effectiveness and robustness of the proposed algorithm.

Based on the proposed ensemble-initialized k-means clustering algorithm, we further study its facilitation, that is, how to speed it up, in order to address the efficiency bottleneck of ensemble generation and the consensus function in the clustering ensemble process. This thesis presents a down-sampling based facilitation method for the proposed ensemble-initialized k-means. Specifically, the dataset is first randomly down-sampled to obtain a set of sampled data points. The ensemble-based initialization method is then performed on the sampled points to efficiently obtain the initial cluster centers, which are used by the final k-means clustering stage on the dataset. Experiments compare time costs and clustering performance under different sampling rates. The results show that, as the sampling rate decreases, the time cost of the proposed algorithm declines while its clustering performance remains at a reasonable level, which demonstrates the effectiveness and efficiency of the proposed facilitation method.
Keywords/Search Tags: data mining, clustering algorithm, ensemble clustering, k-means clustering, down-sampling
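
The pipeline described in the abstract (repeated k-means runs, a consensus step, centers taken from the pre-clustering, then a final k-means, optionally preceded by random down-sampling) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. The abstract does not say which clustering ensemble algorithm is used, so the sketch assumes average-linkage agglomeration on a co-association matrix. All function names, parameters and defaults (`kmeans`, `consensus_labels`, `ensemble_init_kmeans`, `n_base`, `sample_rate`) are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng=None, init=None, n_iter=100):
    """Plain Lloyd's k-means. Returns (centers, labels)."""
    if init is None:
        # Conventional random initialization: pick k distinct data points.
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    else:
        centers = np.array(init, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                new_centers[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dist.argmin(axis=1)


def consensus_labels(base_labels, k):
    """Assumed consensus function: average-linkage merging on the
    co-association matrix (fraction of base clusterings in which two
    points share a cluster). Returns labels in 0..k-1."""
    L = np.asarray(base_labels)                       # shape (n_base, n)
    n = L.shape[1]
    S = (L[:, :, None] == L[:, None, :]).mean(axis=0)  # shape (n, n)
    np.fill_diagonal(S, -np.inf)
    sizes = np.ones(n)
    labels = np.arange(n)
    for _ in range(n - k):
        a, b = np.unravel_index(np.argmax(S), S.shape)  # most similar pair
        merged = (sizes[a] * S[a] + sizes[b] * S[b]) / (sizes[a] + sizes[b])
        S[a, :] = merged
        S[:, a] = merged
        S[a, a] = -np.inf
        S[b, :] = -np.inf                               # deactivate cluster b
        S[:, b] = -np.inf
        sizes[a] += sizes[b]
        labels[labels == b] = a
    return np.unique(labels, return_inverse=True)[1]


def ensemble_init_kmeans(X, k, n_base=10, sample_rate=1.0, seed=0):
    """Ensemble-initialized k-means with optional random down-sampling."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(X)
    m = max(k, int(round(sample_rate * n)))
    idx = rng.choice(n, size=m, replace=False) if m < n else np.arange(n)
    sample = X[idx]
    # 1) Base clusterings from repeated, randomly initialized k-means runs.
    base = [kmeans(sample, k, rng=rng)[1] for _ in range(n_base)]
    # 2) Fuse them into a pre-clustering.
    pre = consensus_labels(base, k)
    # 3) Pre-clustering means become the initial centers.
    init = np.stack([sample[pre == j].mean(axis=0) for j in range(k)])
    # 4) Final k-means on the full dataset.
    return kmeans(X, k, init=init)
```

A call such as `ensemble_init_kmeans(X, 3, sample_rate=0.5)` would run the ensemble step on half of the points and the final k-means on all of them. The co-association matrix is quadratic in the number of sampled points, which is one reason a lower sampling rate reduces the cost of the initialization stage in a sketch like this.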