
Research On New Cluster Validity Index For Overlapping Datasets In Cluster Analysis

Posted on: 2020-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: X Wang
GTID: 2428330575454498
Subject: Software engineering
Abstract/Summary:
Cluster analysis plays an important role in many scientific fields. Its two basic elements are the clustering algorithm and cluster validation. Before clustering, the number of clusters must be supplied as a basic parameter of the algorithm; after clustering, the validity of the result is assessed. Throughout this process, a reasonable choice of the optimal number of clusters is essential for obtaining correct clustering results. Cluster validity indices are an effective way to measure clustering performance and to determine the number of clusters, which makes them particularly important in cluster analysis. The main work of this thesis is to study cluster validity indices while also improving a clustering algorithm. On this basis, a new cluster validity index is proposed to address shortcomings of existing ones. Both the new algorithm and the new index are better suited to overlapping data sets, which some earlier indices and algorithms cannot handle. The specific work is as follows:

(1) This thesis studies and analyzes different types of clustering algorithms and proposes a new clustering algorithm. Twelve clustering algorithms of different types are analyzed, leading to the conclusion that each type has its own advantages and disadvantages. The K-means algorithm is introduced in detail because the new algorithm improves on K-means by using a meshing (grid partitioning) method. The new algorithm not only overcomes shortcomings of K-means but also processes overlapping data sets effectively.

(2) The thesis examines 13 cluster validity indices, divided into two categories (external and internal) for discussion and analysis. This analysis, together with a survey of a large body of related literature, shows that both external and internal validity indices have obvious deficiencies when data sets have diverse structures or overlapping clusters.

(3) This thesis proposes a new cluster validity index for overlapping data, the WCH index. The new index is composed of three parts: cluster compactness, cluster separation, and cluster overlap. Like most existing indices, it takes into account the compactness within clusters and the degree of separation between clusters, and it additionally accounts for the influence of data overlap on the clustering results. The thesis also uses mathematical methods to classify and summarize the types of data overlap.

(4) This thesis carries out extensive comparative experiments to evaluate the performance of the new index. Four classic, representative indices (the DI index, the DBI index, the I index, and the COP index) are compared with the new cluster validity index by how well each judges the clustering results on different types of data sets. The experiments use 5 simulated data sets and 3 real data sets, which differ in dimensionality, spatial distribution, degree of overlap, and scale. These comparative experiments are used to demonstrate the superiority of the new index.
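The role a validity index plays in choosing the number of clusters can be illustrated with a minimal Python sketch. The abstract does not give the definition of the WCH index, so this example does not attempt it; instead it uses two of the baseline indices named above, the Davies-Bouldin index (DBI, lower is better) and the Dunn index (DI, higher is better), in their standard textbook forms. The clustering step is plain Lloyd's K-means with farthest-point seeding, which is an assumption made for determinism and is not the thesis's mesh-based algorithm. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means; seeding is deterministic (farthest-point)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        clusters = [c for c in clusters if c]  # drop any empty cluster
        new_centers = [centroid(c) for c in clusters]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters, centers


def davies_bouldin(clusters, centers):
    """Davies-Bouldin index: mean of the worst scatter/separation ratio per cluster."""
    scatter = [sum(dist(p, c) for p in pts) / len(pts)
               for pts, c in zip(clusters, centers)]
    n = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                 for j in range(n) if j != i)
             for i in range(n)]
    return sum(worst) / n


def dunn(clusters):
    """Dunn index: smallest inter-cluster distance over largest cluster diameter."""
    separation = min(dist(p, q)
                     for a, b in combinations(clusters, 2) for p in a for q in b)
    diameter = max((dist(p, q) for c in clusters for p, q in combinations(c, 2)),
                   default=0.0)
    return separation / diameter if diameter > 0 else math.inf


def select_k(points, k_values):
    """Score each candidate k (k >= 2) and report the k preferred by each index."""
    dbi, di = {}, {}
    for k in k_values:
        clusters, centers = kmeans(points, k)
        dbi[k] = davies_bouldin(clusters, centers)
        di[k] = dunn(clusters)
    return {"best_by_dbi": min(dbi, key=dbi.get),
            "best_by_dunn": max(di, key=di.get),
            "dbi": dbi, "dunn": di}


def toy_points():
    """Hypothetical toy data: three well-separated 3x3 grids of points."""
    offsets = (-0.5, 0.0, 0.5)
    return [(cx + dx, cy + dy)
            for cx, cy in ((0, 0), (10, 0), (5, 9))
            for dx in offsets for dy in offsets]


if __name__ == "__main__":
    result = select_k(toy_points(), range(2, 6))
    print(result["best_by_dbi"], result["best_by_dunn"])
```

The toy clusters here are well separated, which is the easy case for both indices. Neither index contains an explicit overlap term; according to the abstract, that is the gap the WCH index is meant to fill.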
Keywords/Search Tags:Cluster analysis, Clustering algorithm, Overlap degree, Cluster validity index