
Research On Internal Validation And Algorithm For Categorical Data Clustering

Posted on: 2020-02-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M H Yang
Full Text: PDF
GTID: 1368330572954850
Subject: Management Science and Engineering
Abstract/Summary:
Focusing on clustering analysis for categorical data, this dissertation starts from the study of internal clustering validation indexes (CVIs). The specific research results are as follows.
(1) On the generality and applicability of existing internal CVIs, the dissertation studies several well-known internal CVIs and identifies the intra-cluster compactness and inter-cluster separation measures or assumptions they rely on. It proves that compactness measures alone are not capable of validating partitions of different sizes, and analyzes how the separation measures or assumptions affect the indexes' evaluation abilities.
(2) On the evaluation capabilities of internal CVIs, the dissertation presents a measure of the similarity between CVIs' evaluation capabilities based on D-S evidence theory, and then a method for assessing the evaluation capabilities of internal CVIs using external CVIs as benchmarks. With this method, the evaluation capabilities of existing internal CVIs are analyzed.
(3) To enhance the evaluation capability of internal CVIs, the dissertation first presents a separation measure based on information gain and then an alternative internal CVI, namely CUBAGE, which uses both compactness and separation measures. Theoretical and empirical analysis shows that CUBAGE outperforms the existing internal CVIs.
(4) To enhance the quality of the results of prototype-based partitioning clustering algorithms, the dissertation presents a no-prototype iteration method and then a partitioning clustering algorithm based on internal validation, namely k-CUBAGE. Theoretical and empirical analysis shows that the algorithm has a good convergence rate and produces clustering results of higher quality more stably.
(5) To enhance the stability of the k-CUBAGE algorithm, the dissertation further presents a method based on the crowding level to determine the initial partition. The k-CUBAGE process adopting this method, i.e., k-CUBAGE+, produces a unique clustering result, so the randomness of k-CUBAGE is eliminated, and that result is of higher quality than the expected result of k-CUBAGE clustering.
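The claim in (1) that a compactness measure alone cannot compare partitions of different sizes can be illustrated with a small sketch. It rests on several assumptions: "partitions of different sizes" is read here as partitions with different numbers of clusters, and compactness is measured by a size-weighted within-cluster entropy, which is a generic stand-in and not the measure used in CUBAGE or in any specific CVI from the dissertation. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def within_cluster_entropy(rows, labels):
    """Size-weighted sum, over clusters, of the per-attribute entropies.

    Lower means more compact. This is an illustrative compactness
    measure, NOT the one defined in the dissertation.
    """
    n = len(rows)
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    total = 0.0
    for members in clusters.values():
        # zip(*members) iterates over the attribute columns of the cluster.
        h = sum(attribute_entropy(col) for col in zip(*members))
        total += len(members) / n * h
    return total


# Hypothetical toy data: (colour, size) for six objects.
rows = [
    ("red", "s"), ("red", "s"), ("red", "m"),
    ("blue", "l"), ("blue", "l"), ("blue", "m"),
]

# Three nested partitions: one cluster, two clusters, all singletons.
partitions = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    6: [0, 1, 2, 3, 4, 5],
}

scores = {k: within_cluster_entropy(rows, labels)
          for k, labels in partitions.items()}

for k in sorted(scores):
    print(f"k={k}: within-cluster entropy = {scores[k]:.4f}")
```

In this sketch the score shrinks as the partition gets finer and reaches zero when every object is its own cluster, so minimizing it would always favour the trivial all-singletons partition. This is one way to see why an index needs a separation term in addition to compactness.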
Keywords/Search Tags: Data Mining, Clustering Analysis, Clustering Validation, Categorical Data