Research And Application Of K-Prototypes Algorithm Based On Density And Distance Adaptive Determination Of The Initial Centroids

Posted on: 2020-12-01 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Chen | Full Text: PDF
GTID: 2428330575477683 | Subject: Computer application technology
Abstract/Summary:
Over the past decades, advances in science and technology and the rapid development of the Internet have brought explosive growth in data. To store and process this data, engineering practice has moved from single machines to distributed systems. In the information society all data is valuable, so algorithms that can analyze and process big data have real research value. Clustering is a widely used form of unsupervised learning in data processing and has received broad attention in recent years. Clustering algorithms come in many types, however: different algorithms suit different types of data and different data models. The K-Prototypes algorithm studied in this thesis clusters mixed (numeric and categorical) data and assumes spherical clusters.

K-Prototypes evolved from K-Means and shares its defects. First, the number of clusters k must be set manually as a parameter of the algorithm, yet in most cases we do not know how many clusters a data set should be divided into. Second, the initial cluster centers are chosen at random, which leads to low clustering accuracy and unstable results. To address these problems, this thesis proposes a strategy that adaptively determines the initial centroids based on density and distance. Analysis of typical clustering results shows that centroids lie in dense regions and that the centroids of different clusters are far apart. The initial centroids can therefore be determined by finding the set of points with these properties, which improves clustering accuracy and stability and speeds up convergence.

Experiments on UCI data sets show that the proposed algorithm outperforms the traditional K-Prototypes algorithm and the Fuzzy K-Prototypes algorithm in clustering quality and stability. Because algorithm research should not be divorced from practice, this thesis also presents a complete cluster-analysis case study, applying the improved K-Prototypes algorithm to analyze the relationship between students' university performance and the way different students evaluate their teachers (for example, some students habitually give high scores, while others are more demanding). Clustering lets us see the similarities and differences between samples more clearly. Wasting as little as possible of the information the data can give us is also the central purpose of many data analysis algorithms.
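
The idea of choosing initial centroids that are both dense and far from one another can be illustrated with a short sketch. The abstract does not give the thesis's actual formulas, so everything below is an assumption made for illustration: density is taken as the number of neighbours within a hypothetical `radius`, separation as the dissimilarity to the nearest denser point, and the score as their product. The number of clusters `k` is passed in by hand here, so the adaptive determination described in the thesis is not reproduced. The mixed dissimilarity follows the standard K-Prototypes form (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches).

```python
from typing import List, Sequence, Tuple

# A point is (numeric attributes, categorical attributes).
Point = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(a: Point, b: Point, gamma: float = 1.0) -> float:
    """Squared Euclidean distance on the numeric part plus gamma times
    the number of mismatching categorical attributes."""
    numeric = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    categorical = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return numeric + gamma * categorical


def initial_centroids(points: List[Point], k: int, radius: float,
                      gamma: float = 1.0) -> List[int]:
    """Return the indices of k points that are dense and mutually far apart.

    Assumed scoring (not taken from the thesis):
      density    = number of other points with dissimilarity <= radius
      separation = dissimilarity to the nearest denser point
      score      = density * separation
    Note that `radius` is compared against the dissimilarity above,
    whose numeric part is a squared distance.
    """
    n = len(points)
    dist = [[mixed_dissimilarity(points[i], points[j], gamma)
             for j in range(n)] for i in range(n)]
    density = [sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
               for i in range(n)]

    # Rank by density; ties are broken by index so the order is strict.
    order = sorted(range(n), key=lambda i: (-density[i], i))

    separation = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest
            # dissimilarity so that it always scores highly.
            separation[i] = max(dist[i])
        else:
            separation[i] = min(dist[i][j] for j in order[:rank])

    score = [density[i] * separation[i] for i in range(n)]
    return sorted(range(n), key=lambda i: (-score[i], i))[:k]
```

The returned indices could then seed an ordinary K-Prototypes run in place of random initial prototypes. Because the pairwise dissimilarity table is built in full, this sketch costs quadratic time and memory in the number of points and is meant for small examples only.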
Keywords/Search Tags: K-Prototypes algorithm, clustering, density and distance, adaptive determination