Research And Application On Mixed Data Clustering Algorithm Based On Intra-Cluster And Inter-Cluster Information

Posted on:2022-06-23

Degree:Master

Type:Thesis

Country:China

Candidate:Z J Wu

Full Text:PDF

GTID:2518306539981469

Subject:Software engineering

Abstract/Summary:

Cluster center representation, attribute weight quantification, determination of the number of clusters, and the lack of analysis tools are the main open issues in current mixed data clustering research. This thesis combines three strategies: noise filtering for intra-cluster categorical attributes, adaptive weight adjustment that combines intra-cluster and inter-cluster information, and an assessment of the rationality of a mixed data partition based on a category utility function. It proposes a Mixed Data Clustering Algorithm with Noise-filtered Distribution Centroid and Weight Adjustment (MCFCAW) and a Determining the Number of Clusters Algorithm Based on Category Utility Function for Mixed Data (DNCCUFM), and it designs and implements a Mixed Data Clustering Analysis Tool Based on PyQt5 (MDCA-Tool). The main research contents and achievements are as follows:

1. To improve the usability of the representation of intra-cluster categorical attributes in mixed data, and to quantify the weights of categorical and numerical attributes automatically, we propose MCFCAW. The noise-filtering strategy for intra-cluster categorical attributes accurately describes the value distribution of those attributes and filters out their "noise values". The adaptive weight adjustment strategy uses intra-cluster homogeneity and inter-cluster heterogeneity as its criteria and gives important attributes relatively greater weight. Case analysis and experimental results together show that MCFCAW is effective and feasible, converges faster, and clusters mixed data better.

2. To address the problem of determining the number of clusters for mixed data, we propose DNCCUFM. Its category utility function, based on information entropy and the coefficient of variation, quantifies the degree of dispersion of the attributes within a cluster and across the data set, and thereby quantifies the rationality of a mixed data clustering result. Given the results of a mixed data clustering algorithm run on a data set under different settings of the number of clusters, DNCCUFM determines an approximately optimal number of clusters for that data set. Case analysis and experimental results together show that DNCCUFM is effective and feasible: its cluster number determination is more accurate on numerical data sets and more efficient on categorical and mixed data sets.

3. MDCA-Tool has been developed. It implements function modules such as parameter setting, data set management, algorithm execution, and experiment recording; it divides the operations involved in the algorithms in a reasonable way and manages input and output data effectively.

The research contributions of this thesis are as follows. A noise-filtered distribution centroid is proposed to solve the inaccurate representation of the categorical part of a cluster center. An adaptive weight adjustment strategy combining intra-cluster and inter-cluster information iteratively and uniformly quantifies the weights of categorical and numerical attributes. A category utility function built on the coefficient of variation and information entropy is proposed to predict the number of clusters for mixed data. A mixed data clustering analysis tool based on PyQt5 is developed.
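The abstract names three building blocks without giving their formulas: a distribution centroid for categorical attributes with noise values filtered out, information entropy as a dispersion measure for categorical attributes, and the coefficient of variation as a dispersion measure for numerical attributes. The Python sketch below illustrates only the standard textbook form of each. The function names, the `min_freq` threshold, and the renormalization step are assumptions made for illustration; they are not the thesis's definitions of MCFCAW or DNCCUFM.

```python
from collections import Counter
import math
import statistics


def distribution_centroid(values, min_freq=0.0):
    """Relative frequency of each category value in one cluster.

    Values whose frequency falls below min_freq are treated as noise and
    dropped; the rest are renormalized to sum to 1. This threshold rule is
    an assumption for this sketch, not the filtering rule of the thesis.
    """
    if not values:
        raise ValueError("cluster has no values for this attribute")
    total = len(values)
    freqs = {v: c / total for v, c in Counter(values).items()}
    kept = {v: f for v, f in freqs.items() if f >= min_freq}
    if not kept:  # threshold removed everything: fall back to unfiltered
        kept = freqs
    norm = sum(kept.values())
    return {v: f / norm for v, f in kept.items()}


def entropy(values):
    """Shannon entropy (in bits) of a categorical attribute."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return statistics.pstdev(values) / abs(mean)


if __name__ == "__main__":
    # Hypothetical cluster: one categorical attribute, one numerical attribute.
    categories = ["a"] * 6 + ["b"] * 3 + ["c"]
    print(distribution_centroid(categories, min_freq=0.2))
    print(entropy(categories))
    print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))
```

In this sketch the rare value "c" is dropped from the centroid and the remaining frequencies are rescaled. A lower entropy or coefficient of variation inside a cluster than across the whole data set indicates a more homogeneous cluster, which is the kind of comparison a category utility function can build on.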

Keywords/Search Tags:

mixed data, clustering, noise-filtered distribution centroid, intra-cluster information, inter-cluster information

Related items

1	Research And Application Of Rough Clustering Methods Of Mixed Attribute Data With Self-adaptive Cluster Adjustment
2	Intra-cluster Key Distribution In Wireless Sensor Networks
3	Research On Clustering Algorithms For The Data With Multidimensional Mixed Attributes
4	Research On Cluster Header Information Transfer Method For Wireless Sensor Networks
5	Research On The Technology Of Interference Alignment Based On Clustering
6	The Design And Implementation Of Operation And Maintenance System For Traffic Data Cluster
7	Design And Implementation Of Clustering Ensemble Algorithm Based On Partition Selection And Weighting
8	Research Of D2D-Oriented Capacity Optimization And Cooperative Security Technology In Wireless Access Network
9	A Study Of The Clustering Algorithm For Mixed Data
10	The Study And Improvement Of Fuzzy C-means Cluster Algorithm