Research On Internal Validation And Algorithm For Categorical Data Clustering

Posted on:2020-02-09

Degree:Doctor

Type:Dissertation

Country:China

Candidate:M H Yang

Full Text:PDF

GTID:1368330572954850

Subject:Management Science and Engineering

Abstract/Summary:

This dissertation addresses clustering analysis for categorical data, starting from a study of internal clustering validation indexes (CVIs). The specific research results are as follows.

(1) On the generality and applicability of existing internal CVIs, the dissertation studied several well-known internal CVIs and identified their intra-cluster compactness and inter-cluster separation measures or assumptions, proved that compactness measures alone are not capable of validating partitions of different sizes, and analyzed how the separation measures or assumptions affect evaluation ability.

(2) On the evaluation capabilities of internal CVIs, the dissertation presented a measure of the similarity of CVIs' evaluation capabilities based on Dempster-Shafer (D-S) evidence theory, and then presented a method for assessing the evaluation capabilities of internal CVIs using external CVIs as benchmarks. The evaluation capabilities of existing internal CVIs were analyzed with this method.

(3) Aiming to enhance the evaluation capability of internal CVIs, the dissertation first presented a separation measure based on information gain and then an alternative internal CVI, namely CUBAGE, which uses both compactness and separation measures. Theoretical and empirical analysis showed that CUBAGE outperforms the existing internal CVIs.

(4) Aiming to enhance the quality of the results of prototype-based partitioning clustering algorithms, the dissertation presented a no-prototype iteration method, and then a partitioning clustering algorithm based on internal validation, namely k-CUBAGE. Theoretical and empirical analysis showed that the algorithm has a good convergence rate and produces higher-quality clustering results more stably.

(5) Aiming to enhance the stability of the k-CUBAGE algorithm, the dissertation further presented a method based on the crowding level to determine the initial partition. The k-CUBAGE process adopting this method, i.e., k-CUBAGE+, produces a unique clustering result, so the randomness of k-CUBAGE is eliminated, and this result is of better quality than the expected result of k-CUBAGE clustering.
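The abstract does not give the CUBAGE formula, so the following is only an illustrative sketch of what an information-gain-based quantity for a categorical partition can look like. It sums, over attributes, the reduction in Shannon entropy obtained by knowing the cluster label. The function names `entropy` and `information_gain` and the toy data are assumptions made for this example and are not taken from the dissertation.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(rows, labels):
    """Sum over attributes of H(attribute) - H(attribute | cluster).

    Illustrative only: this is a generic information-gain score,
    not the separation measure defined in the dissertation.
    """
    n = len(rows)
    # Group the rows by their cluster label.
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)

    gain = 0.0
    for j in range(len(rows[0])):
        total = entropy([row[j] for row in rows])
        # Entropy of attribute j within each cluster, weighted by cluster size.
        conditional = sum(
            len(members) / n * entropy([row[j] for row in members])
            for members in clusters.values()
        )
        gain += total - conditional
    return gain


# Hypothetical toy data: four objects described by two categorical attributes.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
good = information_gain(rows, [0, 0, 1, 1])  # clusters match the attribute values
poor = information_gain(rows, [0, 1, 0, 1])  # clusters mix the attribute values
```

On this toy data the partition that groups identical objects together scores higher than the one that mixes them, which is the behavior one would want from a score that rewards well-separated clusters.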

Keywords/Search Tags:

Data Mining, Clustering Analysis, Clustering Validation, Categorical Data

Related items

1	The Research On Clustering Algorithm For Categorical Data Using Quantum Mechanics
2	The Study Of Clustering Data With Categorical Attributes In Data Mining
3	A Study On Clustering Algorithms For Categorical Data With Applications
4	Research On Categorical Data Clustering Algorithms
5	Automatic categorical data clustering and spatial data clustering by consecutive resolution refinement
6	Studies On Clustering Algorithms For Categorical Data
7	The Research Of Ant-Based Clustering Algorithm For Data Sets With Mixed Attribute
8	Studies On Clustering Algorithms For Categorical Data
9	Study Of Algorithms For Clustering Categorical Data
10	Research And Implementation Of Clustering Method For High Dimensional Categorical Data