Research On Mixed Data Clustering Algorithm Based On Information Entropy To Define Attribute Weights

Posted on:2022-04-04

Degree:Master

Type:Thesis

Country:China

Candidate:W X Cui

GTID:2507306509969749

Subject:Statistics

Abstract/Summary:

Clustering attempts to put samples that are similar to one another into the same subset and samples that differ greatly into different subsets. Clustering algorithms have been studied for many years and are widely used in engineering, computer science, biomedical science, earth science, social science, economics and other fields. With the development of science and technology and the progress of society, we have entered the era of big data. A great deal of information lies hidden in this all-encompassing data, and mining that latent information is highly valuable. Cluster analysis divides data objects into clusters with practical significance, which helps people analyze and solve problems. Mixed data combine numerical and categorical attributes, and most real-world data sets are mixed, so mixed data clustering has attracted increasing attention.

Cluster analysis faces two major challenges: determining the number of clusters and selecting the cluster centers. To address these two problems, a new clustering algorithm is proposed for each of them in the mixed data setting.

For the first problem, to obtain better clustering results, the attribute weights are redefined by jointly considering the influence of within-cluster entropy and between-cluster entropy, and a weighted clustering algorithm based on information entropy is built on the redefined weights. The main work is as follows. Within-cluster entropy and between-cluster entropy are introduced, their combined influence on the clustering result is taken into account, and new weights are assigned to the attributes. Based on the new weights, the validity index, the generalized mechanism for finding the worst cluster and the dissimilarity measure are redefined.

For the second problem, to obtain better clustering results on mixed data sets, both the influence of within-cluster and between-cluster entropy on the weights and the global distribution of the samples are considered, and a global weighted k-prototypes clustering algorithm based on information entropy is proposed. The main work is as follows. First, the algorithm randomly selects a sufficiently large number of initial prototypes so that the global distribution of the data set is taken into account. Then, an iterative optimization procedure uses an elimination evaluation function to remove redundant prototypes one by one. Experiments verify the effectiveness of the proposed algorithm. Finally, the thesis is summarized and future research directions are briefly outlined.
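
The abstract does not state the weighting formulas, so the following is only a minimal sketch of one plausible reading, not the thesis's method. Within-cluster entropy of an attribute is taken as the cluster-size-weighted mean of its entropy inside each cluster. The "between-cluster" part is hypothetically taken as total entropy minus within-cluster entropy, i.e. the entropy explained by the partition. Each weight is that share normalized to sum to 1. Numerical attributes are assumed to be discretized into equal-width bins for the entropy step. The dissimilarity follows the usual k-prototypes form (squared Euclidean distance on numerical attributes plus gamma times categorical mismatches) with every term multiplied by its attribute weight. The function names and the `gamma` and `bins` parameters are illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (natural log) of a sequence of discrete values."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def discretize(column, bins=3):
    """Equal-width binning so a numerical attribute can be treated as discrete
    (an assumption; the thesis may handle numerical attributes differently)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    return [min(int((x - lo) / width), bins - 1) for x in column]


def entropy_weights(columns, labels):
    """Hypothetical attribute weights from within- and between-cluster entropy.

    within  = cluster-size-weighted mean entropy of the attribute inside clusters
    between = total entropy - within (entropy explained by the partition)
    weight  = between / total, normalized so all weights sum to 1
    """
    n = len(labels)
    raw = []
    for col in columns:
        total = entropy(col)
        within = 0.0
        for k in set(labels):
            members = [v for v, lab in zip(col, labels) if lab == k]
            within += len(members) / n * entropy(members)
        raw.append((total - within) / total if total > 0 else 0.0)
    s = sum(raw)
    return [r / s for r in raw] if s > 0 else [1.0 / len(raw)] * len(raw)


def weighted_mixed_distance(x, z, num_idx, cat_idx, weights, gamma=1.0):
    """k-prototypes-style dissimilarity with a weight on every attribute:
    weighted squared Euclidean on numerical attributes plus gamma times the
    weighted count of categorical mismatches."""
    num = sum(weights[j] * (x[j] - z[j]) ** 2 for j in num_idx)
    cat = sum(weights[j] * (x[j] != z[j]) for j in cat_idx)
    return num + gamma * cat
```

An attribute whose values are pure within each cluster receives a high weight. An attribute that is equally mixed in every cluster receives a weight near zero. In practice, numerical attributes would also be rescaled before computing distances so that no single attribute dominates the squared-Euclidean term.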
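
The outer loop of the global k-prototypes algorithm can be sketched as follows. It oversamples random initial prototypes, refines them with k-prototypes iterations, and then eliminates redundant prototypes one at a time until k remain. The abstract does not define the elimination evaluation function. Here it is hypothetically the increase in total cost if a prototype's members were reassigned to their next-nearest prototype, and the prototype with the smallest increase is dropped in each round. The default of 3k initial prototypes, the function names and the fixed `weights` argument (where entropy-based attribute weights would be supplied) are all illustrative assumptions.

```python
import random
from collections import Counter


def dist(x, z, num_idx, cat_idx, weights, gamma):
    """Weighted k-prototypes dissimilarity: squared Euclidean + gamma * mismatches."""
    return (sum(weights[j] * (x[j] - z[j]) ** 2 for j in num_idx)
            + gamma * sum(weights[j] * (x[j] != z[j]) for j in cat_idx))


def update_prototype(members, num_idx, cat_idx, dim):
    """Mean for numerical attributes, mode for categorical attributes."""
    proto = [None] * dim
    for j in num_idx:
        proto[j] = sum(m[j] for m in members) / len(members)
    for j in cat_idx:
        proto[j] = Counter(m[j] for m in members).most_common(1)[0][0]
    return tuple(proto)


def assign(data, protos, d):
    """Index of the nearest prototype for every sample."""
    return [min(range(len(protos)), key=lambda c: d(x, protos[c])) for x in data]


def global_k_prototypes(data, k, num_idx, cat_idx, weights, gamma=1.0,
                        n_init_protos=None, n_iter=10, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])

    def d(x, z):
        return dist(x, z, num_idx, cat_idx, weights, gamma)

    def refine(protos):
        """Standard k-prototypes iterations; empty clusters keep their prototype."""
        for _ in range(n_iter):
            labels = assign(data, protos, d)
            new = []
            for c, p in enumerate(protos):
                members = [x for x, lab in zip(data, labels) if lab == c]
                new.append(update_prototype(members, num_idx, cat_idx, dim) if members else p)
            if new == protos:
                break
            protos = new
        return protos

    # Oversample prototypes so that the global distribution is covered.
    m = n_init_protos or min(len(data), 3 * k)
    protos = refine([tuple(p) for p in rng.sample(data, m)])

    while len(protos) > k:
        labels = assign(data, protos, d)

        def removal_cost(c):
            # Hypothetical elimination score: extra cost if cluster c's members
            # moved to their best remaining prototype.
            others = [p for i, p in enumerate(protos) if i != c]
            return sum(min(d(x, p) for p in others) - d(x, protos[c])
                       for x, lab in zip(data, labels) if lab == c)

        worst = min(range(len(protos)), key=removal_cost)
        protos = refine(protos[:worst] + protos[worst + 1:])

    return protos, assign(data, protos, d)
```

Because the cheapest prototype is dropped first, a prototype that alone covers a well-separated group is removed last. This is the intuition behind starting from many prototypes rather than from k random ones.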

Keywords/Search Tags:

K-prototypes algorithm, Mixed data, Information entropy, Attribute weights, Global clustering, Clustering center selection, Iterative optimization

Related items

1	The Research On Clustering Of Mixed Data Stream Based On DPC Algorithm
2	Research On K-prototypes Clustering Algorithm And Data Dimension Reduction
3	Application Of K-modes Clustering Algorithm And Greedy Algorithm Based On Multi-attribute Weight In Dormitory Allocation
4	Improved Spectral Clustering Algorithm And Its Application In Risk-model
5	A Study On Hierarchical Clustering Of Micro-learning Units Based On Topic Feature Centers
6	Research On Spatiotemporal Behavior Of Campus Network Users Based On Clustering Algorithm
7	Research On Clustering Algorithms For Classification Matrix-object Data
8	Improved Chameleon Clustering Algorithm Based On K-medoids
9	Research On Sina Weibo User Information Based On Two Improved Clustering Algorithm
10	Research On Key Technologies Of Campus Network Behavior Analysis Robot Based On Clustering Algorithm