K-modes Cluster Analysis and Its Application Based on Attribute Value Weights

Posted on: 2022-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: R L Hao
Full Text: PDF
GTID: 2518306521994959
Subject: Computer technology
Abstract/Summary:
Cluster analysis is an important research topic in data mining. Its main task is to divide data objects into clusters according to a similarity criterion, so that objects in the same cluster are highly similar while objects in different clusters are dissimilar. K-modes, a cluster analysis method for categorical data, has two problems: its distance measure is not accurate enough, and its selection of initial center points lacks stability. Both seriously degrade the quality of the clustering. This thesis uses attribute value weights to study distance measurement, initial center point selection, and application in k-modes cluster analysis. The main results are as follows:

(1) A k-modes clustering algorithm based on attribute value weights is proposed. When calculating the distance between data objects, the algorithm combines the distribution of attribute values in the data set with the differences between the attribute values themselves. It redefines the dissimilarity measure for categorical data so that differences between attribute values are measured effectively. A method for updating cluster centers is also given, using the frequency of attribute values and the weight of each attribute value. Experiments on UCI data sets verify that the method improves the quality of the clustering.

(2) A strategy for selecting the initial k-modes cluster centers based on distance and outlier degree is proposed. The strategy uses attribute value weights to define a measure of each data object's outlier degree and combines it with the distance between the data object and the center points already selected. Data objects that are relatively far from the selected centers and have a low outlier degree are chosen as initial center points. Experiments on UCI data sets verify the effectiveness of this strategy.

(3) Based on the above research, a prototype system for k-modes-based cluster analysis of celestial data is designed and implemented in Python, and its function diagrams and software architecture are described in detail. The operating results show that the prototype system provides an effective means of knowledge discovery in celestial spectra.
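The center selection idea in (2) can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas, so everything here is an assumption made for illustration: the standard simple matching dissimilarity stands in for the thesis's weighted measure, relative value frequency stands in for the attribute value weights, and the function names (`value_frequencies`, `matching_distance`, `outlier_degree`, `select_initial_centers`) and the scoring rule are hypothetical.

```python
from collections import Counter


def value_frequencies(data):
    """Relative frequency of each value, per attribute.

    Assumption: used here as a stand-in for attribute value weights.
    """
    n = len(data)
    return [
        {v: c / n for v, c in Counter(row[j] for row in data).items()}
        for j in range(len(data[0]))
    ]


def matching_distance(x, y):
    """Simple matching dissimilarity: the number of attributes that differ."""
    return sum(a != b for a, b in zip(x, y))


def outlier_degree(x, freqs):
    """Hypothetical outlier score: mean rarity of the object's values (0..1)."""
    return sum(1.0 - freqs[j][v] for j, v in enumerate(x)) / len(x)


def select_initial_centers(data, k):
    """Pick k objects that are far from earlier picks and not outlying."""
    freqs = value_frequencies(data)
    m = len(data[0])
    # Start from the least outlying object.
    chosen = [min(range(len(data)), key=lambda i: outlier_degree(data[i], freqs))]
    while len(chosen) < k:
        def score(i):
            # Distance to the nearest chosen center, scaled to 0..1,
            # discounted by how outlying the candidate is (assumed rule).
            nearest = min(matching_distance(data[i], data[c]) for c in chosen)
            return (nearest / m) * (1.0 - outlier_degree(data[i], freqs))

        candidates = [i for i in range(len(data)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [data[i] for i in chosen]


data = [
    ("red", "round", "small"),
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("blue", "square", "large"),
    ("blue", "square", "large"),
    ("blue", "square", "small"),
    ("green", "star", "tiny"),  # isolated object with rare values
]
centers = select_initial_centers(data, 2)
print(centers)
```

On this toy data the isolated object is as far from the first center as the "blue" objects are, but its rare values lower its score, so a representative "blue" object is chosen instead. A purely distance-based selection would not make that distinction.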
Keywords/Search Tags: K-modes, Attribute value weight, Dissimilarity measure, Outlier