Research On Mixed Data Clustering Algorithm Based On Information Entropy To Define Attribute Weights

Posted on: 2022-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: W X Cui
Full Text: PDF
GTID: 2507306509969749
Subject: Statistics
Abstract/Summary:
Clustering attempts to group samples that are similar to each other into the same subset and to place samples that differ greatly from each other into different subsets. Clustering algorithms have been studied for many years and are widely used in engineering, computer science, biomedical science, earth science, social science, economics, and other fields. With the development of science and technology and the progress of society, we have entered the era of big data. A great deal of information is hidden in large and varied data, and mining this latent information is highly valuable. Cluster analysis divides data objects into clusters with practical significance, which helps people analyze and solve problems. Mixed data combines numerical data and categorical data, and because most real data sets are mixed, the problem of mixed data clustering has received increasing attention. Cluster analysis faces two major challenges: determining the number of clusters and selecting the cluster centers. To address these two problems, a new clustering algorithm is proposed for each of them in the mixed-data setting. For the first problem, in order to obtain a better clustering result, the attribute weights are redefined by jointly considering the influence of within-cluster entropy and between-cluster entropy on the weights, and a weighted clustering algorithm based on information entropy is proposed using the redefined weights. The main work includes the following: within-cluster entropy and between-cluster entropy are introduced; their influence on the clustering results is considered jointly, and new weights are assigned to the attributes; based on the new weights, the validity index, the generalized mechanism for finding the worst cluster, and the dissimilarity measure are redefined. For the second problem, in order to obtain better clustering results on mixed data sets, both the influence of within-cluster entropy and between-cluster entropy on the weights and the global distribution of the samples are considered, and a global weighted k-prototypes clustering algorithm based on information entropy is proposed. The main work includes the following: first, the algorithm randomly selects a sufficiently large number of initial prototypes so that the global distribution of the data set is taken into account; then, an iterative optimization method gradually eliminates the redundant prototypes using an elimination evaluation function. The effectiveness of the proposed algorithm has been verified experimentally. Finally, the thesis is summarized and future research directions are briefly discussed.
Keywords/Search Tags: K-prototypes algorithm, Mixed data, Information entropy, Attribute weights, Global clustering, Clustering center selection, Iterative optimization
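
The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the general idea of entropy-based attribute weights combined with a k-prototypes-style mixed dissimilarity. The function names, the use of within-cluster entropy alone, and the exp(-entropy) weighting rule are all assumptions made for illustration. They are not the definitions used in the thesis, which also takes between-cluster entropy into account.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(values):
    """Shannon entropy (natural log) of a sequence of categorical values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def within_cluster_entropy(column, labels):
    """Size-weighted average entropy of one attribute inside each cluster."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[label].append(value)
    n = len(column)
    return sum(len(g) / n * shannon_entropy(g) for g in groups.values())


def entropy_weights(columns, labels):
    """Hypothetical rule (an assumption, not the thesis formula):
    weight_j is proportional to exp(-H_within_j), so attributes that are
    purer inside clusters receive larger weights. Weights sum to 1."""
    raw = [math.exp(-within_cluster_entropy(col, labels)) for col in columns]
    total = sum(raw)
    return [r / total for r in raw]


def mixed_dissimilarity(x, prototype, numeric_idx, categorical_idx, weights):
    """Weighted k-prototypes-style dissimilarity: squared difference on
    numeric attributes, simple 0/1 mismatch on categorical attributes."""
    d = 0.0
    for j in numeric_idx:
        d += weights[j] * (x[j] - prototype[j]) ** 2
    for j in categorical_idx:
        d += weights[j] * (0.0 if x[j] == prototype[j] else 1.0)
    return d


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    attr_a = ["x", "x", "y", "y"]  # pure inside each cluster
    attr_b = ["p", "q", "p", "q"]  # mixed inside each cluster
    print(entropy_weights([attr_a, attr_b], labels))
    print(mixed_dissimilarity((1.0, "x"), (3.0, "y"), [0], [1], [0.5, 0.5]))
```

In this toy data the first attribute separates the two clusters perfectly and therefore receives the larger weight. A fuller treatment would need a rule for numeric attributes (for example, discretizing them before computing entropy) and a between-cluster term, neither of which the abstract specifies.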