| Cluster analysis is an important unsupervised machine learning method and data mining technique. In recent years, researchers have proposed a series of cluster analysis algorithms for different application fields, and these have been widely used in image analysis, text mining, social network analysis, and other areas. However, existing research on cluster analysis algorithms focuses mainly on improving the accuracy of clustering results and, to some extent, ignores the interpretability of the algorithms and their results. Unlike traditional cluster analysis methods, interpretable clustering must not only obtain high-quality clustering results but also explain the reasoning of the clustering model and the meaning of the clustering results. This paper studies interpretable clustering for categorical data. Specifically, the main research content of this paper is as follows: (1) To address the poor interpretability of existing categorical data clustering algorithms, this paper proposes an interpretable clustering algorithm for categorical data based on the Laplacian score. Firstly, the Laplacian score of each feature is calculated, and features with low scores are selected to form the feature subset. Secondly, the data set is partitioned according to the feature subset, and the optimal partitioning result is selected. Finally, a threshold tree is constructed, and the partition results are explained according to the rules by which the tree structure is formed. The effectiveness and interpretability of the proposed algorithm are verified by comparative experiments against related algorithms on real data sets. (2) To select the best segmentation point more automatically, an interpretable clustering algorithm for categorical data based on information entropy is proposed. Firstly, the information entropy of each attribute in the data set is calculated, then the segmentation attribute is
selected, and the optimal segmentation point is determined using the silhouette coefficient. Secondly, the data set is segmented at the optimal segmentation point. Finally, the cluster with the lowest evaluation score is iteratively merged with its nearest cluster until the specified number of clusters is reached. The results show that the proposed algorithm improves both the validity and the interpretability of the clustering results. (3) An explainable cluster analysis system is designed and implemented using an instance-based explainable algorithm and a visualization-based explainable clustering algorithm. The results of this paper enrich research on interpretable cluster analysis, increase the practical value of cluster analysis, and provide important technical support for fields in which it is widely used, such as machine learning and data mining. |
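
As a rough illustration of the first step in (2), the information entropy of each categorical attribute might be computed as in the sketch below. This is not the paper's implementation: the function names and the toy records are hypothetical, the entropy is assumed to be Shannon entropy in bits, and the rule for choosing the segmentation attribute from these values is not stated in the abstract, so it is left out.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute (assumed definition)."""
    counts = Counter(values)
    n = len(values)
    # Sum -p * log2(p) over the observed categories.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_per_attribute(records):
    """Map each attribute name to its entropy over a non-empty list of dict records."""
    names = list(records[0].keys())
    return {name: attribute_entropy([r[name] for r in records]) for name in names}


# Hypothetical toy data set with three categorical attributes.
records = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "square", "size": "small"},
    {"color": "blue", "shape": "round", "size": "small"},
    {"color": "blue", "shape": "square", "size": "small"},
]

entropies = entropy_per_attribute(records)
```

Here `entropies` maps each attribute name to a non-negative value. A constant attribute has entropy 0, and an attribute with k equally frequent categories has entropy log2(k).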