Categorical Relation Graph Construction And Clustering Analysis For Categorical Data

Posted on: 2022-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zhang
Full Text: PDF
GTID: 2518306509470194
Subject: Computer Science and Technology
Abstract/Summary:
Clustering analysis is an essential branch of machine learning; it is unsupervised and data-driven. Great progress has been made in the clustering of numerical data, but real-world data sets also contain a large amount of symbolic (categorical) data. Because symbolic data lack inherent geometric structure, a distance cannot be built as a similarity measure from numerical differences between data points, so some classical clustering algorithms cannot handle this type of data. In recent years, symbolic data clustering has become an important research field and has been widely applied in retail, knowledge mapping, bioinformatics and other areas. This thesis systematically studies the problems of symbolic data clustering in a big data environment, namely slow running speed, high computational cost, and the existence of two feature spaces in clustering ensembles. The main research contents are as follows:

(1) A fast symbolic data clustering algorithm based on a symbolic relation graph is proposed. The algorithm builds a relation graph between symbols and uses it in place of the original data, which reduces the scale of the data set. The symbolic graph is then partitioned by a representative graph partitioning algorithm. Finally, each sample is assigned to the class that has the highest probability given the symbols the sample contains. The algorithm is compared with other algorithms on a large number of data sets, and the results demonstrate its effectiveness.

(2) The purpose of a clustering ensemble is to combine multiple base partitions into a robust, stable, and accurate partition. When measuring the similarity between base clusterings, most existing clustering ensemble algorithms consider only the feature space of the base clusterings and ignore the original feature space of the data set, so the accuracy of the ensemble result is not high. This thesis proposes a fast clustering ensemble algorithm based on a symbolic graph. The algorithm addresses this problem by constructing a base clustering graph, mapping the center of each cluster produced by a base clustering back to the original data set, and then using the Euclidean distances between these centers as the similarity weights of the base clustering graph. A large number of experiments show that the new algorithm is more effective than other algorithms.

(3) A symbolic data cluster analysis system is designed and developed, covering data generation or import, algorithm selection, and result display. The system integrates existing symbolic data clustering algorithms and the fast symbolic data clustering algorithm based on the symbolic relation graph. It has been tested on different data sets and shows good applicability.

The results of this thesis further enrich the study of symbolic data clustering and provide effective technical support for working with symbolic data.
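The three steps of the fast algorithm in item (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how edges are weighted or which graph partitioning algorithm is used, so the co-occurrence weights, the threshold-and-connected-components partition, the majority vote, the toy data, and all function names below are assumptions chosen only to keep the example self-contained.

```python
from collections import Counter
from itertools import combinations


def build_relation_graph(rows):
    """Count how often two (attribute index, value) symbols share a row.

    Assumption: co-occurrence counts serve as edge weights.
    """
    weights = Counter()
    for row in rows:
        symbols = list(enumerate(row))
        for a, b in combinations(symbols, 2):
            weights[(a, b)] += 1
    return weights


def partition_symbols(weights, min_weight):
    """Stand-in partition: keep edges with weight >= min_weight and
    return a cluster id for every symbol (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), w in weights.items():
        root_a, root_b = find(a), find(b)  # registers both symbols
        if w >= min_weight:
            parent[root_a] = root_b

    cluster_ids = {}
    return {s: cluster_ids.setdefault(find(s), len(cluster_ids))
            for s in list(parent)}


def assign_samples(rows, symbol_cluster):
    """Give each sample the cluster that most of its symbols belong to
    (a simple vote standing in for the 'highest probability' rule)."""
    labels = []
    for row in rows:
        votes = Counter(symbol_cluster[s] for s in enumerate(row))
        labels.append(votes.most_common(1)[0][0])
    return labels


# Hypothetical toy data: three categorical attributes per sample.
rows = [
    ("red", "round", "sweet"),
    ("red", "round", "sweet"),
    ("red", "round", "sour"),
    ("green", "long", "bitter"),
    ("green", "long", "bitter"),
    ("green", "long", "sour"),
]

graph = build_relation_graph(rows)
symbol_cluster = partition_symbols(graph, min_weight=2)
labels = assign_samples(rows, symbol_cluster)
print(labels)
```

The point of the sketch is the scale reduction described above: the graph has one node per distinct symbol rather than one per sample, so the partitioning step works on a structure whose size depends on the number of attribute values, not on the number of rows.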
Keywords/Search Tags:Cluster analysis, Symbolic data clustering, Similarity measurement, Relation graph, Ensemble clustering