Research And Application On Mixed Data Clustering Algorithm Based On Intra-Cluster And Inter-Cluster Information

Posted on: 2022-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Wu
GTID: 2518306539981469
Subject: Software engineering
Abstract/Summary:
Cluster center representation, attribute weight quantification, cluster number determination, and the lack of analysis tools are the main open issues in current mixed data clustering research. This thesis combines three strategies: noise filtering of intra-cluster categorical attributes, adaptive weight adjustment that uses both intra-cluster and inter-cluster information, and an assessment of how reasonable a mixed data partition is based on a Category Utility Function. On this basis it proposes a Mixed Data Clustering Algorithm with Noise-filtered Distribution Centroid and Weight Adjustment (MCFCAW) and a Determining the Number of Clusters Algorithm Based on Category Utility Function for Mixed Data (DNCCUFM), and it designs and implements a Mixed Data Clustering Analysis Tool Based on PyQt5 (MDCA-Tool). The main research contents and achievements are as follows:

1. To improve the representation of intra-cluster categorical attributes for mixed data and to quantify the weights of categorical and numerical attributes automatically, we propose MCFCAW. The noise-filtering strategy for intra-cluster categorical attributes accurately describes the value distribution of those attributes and filters out their "noise values". The adaptive weight adjustment strategy uses intra-cluster homogeneity and inter-cluster heterogeneity as its criteria and gives important attributes relatively greater weight. The case analysis and experimental results together show that the MCFCAW algorithm is effective and feasible, converges faster, and produces better clustering results on mixed data.

2. To address the problem of determining the number of clusters for mixed data, we propose DNCCUFM. The category utility function, based on information entropy and the coefficient of variation, quantifies the degree of dispersion of the attributes within a cluster and across the data set, and thereby quantifies how reasonable a mixed data clustering result is. DNCCUFM determines an approximately optimal number of clusters from the results that a mixed data clustering algorithm produces on the data set under different cluster number settings. The case analysis and experimental results together show that the DNCCUFM algorithm is effective and feasible, and that its cluster number determination is more accurate on numerical data sets and more efficient on categorical and mixed data sets.

3. MDCA-Tool has been developed. It implements function modules such as parameter setting, data set management, algorithm execution, and experiment recording. It divides the operations involved in the algorithms into reasonable units and manages input and output data effectively.

The research contributions of this thesis are as follows. The noise-filtered distribution centroid is proposed to solve the inaccurate representation of the categorical part of a cluster center. The adaptive weight adjustment strategy, which combines intra-cluster and inter-cluster information, iteratively and uniformly quantifies the weights of categorical and numerical attributes. A category utility function built from the coefficient of variation and information entropy is proposed to predict the number of clusters for mixed data. A mixed data clustering analysis tool based on PyQt5 is developed.
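The abstract names its building blocks but does not give their formulas, so the following is only a minimal Python sketch of the general ideas, not the thesis's MCFCAW or DNCCUFM. It shows a frequency-based distribution centroid for one categorical attribute, and the two dispersion measures the abstract mentions (information entropy for categorical values, coefficient of variation for numerical values). The function names, the `min_freq` parameter, and the rule "drop values below a relative-frequency threshold, then renormalize" are assumptions made for illustration.

```python
from collections import Counter
from math import log2
from statistics import mean, pstdev


def distribution_centroid(values, min_freq=0.0):
    """Relative frequency of each categorical value inside one cluster.

    Values whose relative frequency is below `min_freq` are treated as
    "noise" and dropped, and the remaining frequencies are renormalized.
    This threshold rule is an illustrative assumption, not the filtering
    rule defined in the thesis.
    """
    counts = Counter(values)
    total = sum(counts.values())
    kept = {v: c for v, c in counts.items() if c / total >= min_freq}
    kept_total = sum(kept.values())
    if kept_total == 0:
        # Empty input, or every value fell below the threshold.
        return {}
    return {v: c / kept_total for v, c in kept.items()}


def entropy(values):
    """Shannon entropy (in bits) of a categorical attribute."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return pstdev(values) / abs(m)


if __name__ == "__main__":
    colors = ["red"] * 6 + ["blue"] * 3 + ["green"]
    # "green" has relative frequency 0.1 and is filtered at min_freq=0.2.
    print(distribution_centroid(colors, min_freq=0.2))
    print(entropy(colors))
    print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))
```

Lower entropy or a lower coefficient of variation inside a cluster than across the whole data set suggests that the cluster is more homogeneous on that attribute, which is the kind of comparison the abstract attributes to its category utility function.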
Keywords/Search Tags: mixed data, clustering, noise-filtered distribution centroid, intra-cluster information, inter-cluster information