K-modes Cluster Analysis and Its Application Based on Attribute Value Weight

Posted on:2022-05-04

Degree:Master

Type:Thesis

Country:China

Candidate:R L Hao

Full Text:PDF

GTID:2518306521994959

Subject:Computer technology

Abstract/Summary:

Cluster analysis is an important research topic in data mining. Its main task is to divide data objects into multiple clusters according to a similarity criterion, so that objects in the same cluster are highly similar while objects in different clusters are dissimilar. K-modes cluster analysis, a clustering method for categorical data, suffers from a distance measure that is not accurate enough and from unstable selection of the initial center points, both of which seriously degrade clustering results. This thesis uses attribute value weights to study distance measurement, initial center point selection, and application in k-modes cluster analysis. The main research results are as follows:

(1) A k-modes clustering algorithm based on attribute value weights is given. When calculating the distance between data objects, the algorithm combines the distribution characteristics of attribute values in the data set with the differences between the attribute values themselves, and redefines the dissimilarity measure for categorical data, which effectively addresses the measurement of differences between attribute values. Using the frequency of attribute values and the weight of each attribute value, a method for updating cluster centers is given, yielding the k-modes clustering algorithm based on attribute value weights. Experiments on UCI data sets verify that the method can effectively improve clustering results.

(2) A strategy for selecting the initial k-modes cluster centers based on distance and outlier degree is given. The strategy uses attribute value weights to define an outlier degree measure for data objects, combines it with the distance between a data object and the already selected center points, and selects data objects that are relatively far away and have a low outlier degree as initial center points. Experiments on UCI data sets verify the effectiveness of this strategy.

(3) Based on the above research, a prototype system for k-modes-based cluster analysis of celestial data is designed and implemented in Python, and its function diagrams and software system structure are described in detail. The operating results show that the prototype system can provide an effective way for knowledge discovery from celestial spectra.
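
The abstract does not reproduce the thesis's formulas, so the following Python sketch only illustrates the general shape of the approach under stated assumptions. It is not the author's method. The weight of an attribute value is assumed to be its relative frequency, the mismatch cost is assumed to be the mean of the two values' weights, the outlier degree is assumed to be one minus the mean weight of an object's values, and cluster centers are updated with the plain per-attribute mode. All function names are hypothetical.

```python
from collections import Counter


def value_weights(data):
    """ASSUMPTION: weight of a value = its relative frequency in its attribute."""
    n = len(data)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*data)]


def dissimilarity(x, y, weights):
    """Matching values cost 0. ASSUMPTION: a mismatch on attribute j costs the
    mean weight of the two differing values (illustrative placeholder only)."""
    total = 0.0
    for j, (a, b) in enumerate(zip(x, y)):
        if a != b:
            total += (weights[j].get(a, 0.0) + weights[j].get(b, 0.0)) / 2
    return total


def outlier_degree(x, weights):
    """ASSUMPTION: objects built from rare values get a high outlier degree."""
    return 1.0 - sum(weights[j][v] for j, v in enumerate(x)) / len(x)


def select_initial_centers(data, k, weights):
    """Pick objects that are far from already chosen centers and not outliers."""
    first = min(range(len(data)), key=lambda i: outlier_degree(data[i], weights))
    chosen = [first]
    while len(chosen) < k:
        def score(i):
            nearest = min(dissimilarity(data[i], data[c], weights) for c in chosen)
            return nearest * (1.0 - outlier_degree(data[i], weights))
        candidates = [i for i in range(len(data)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [tuple(data[i]) for i in chosen]


def k_modes(data, k, max_iter=20):
    """Alternate between assigning objects and recomputing per-attribute modes."""
    weights = value_weights(data)
    centers = select_initial_centers(data, k, weights)
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(k), key=lambda c: dissimilarity(x, centers[c], weights))
            for x in data
        ]
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if not members:  # keep the old center if a cluster becomes empty
                new_centers.append(centers[c])
                continue
            new_centers.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_centers == centers:  # converged: centers no longer change
            break
        centers = new_centers
    return labels, centers


# Toy categorical data set (hypothetical values, not from the thesis).
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r"),
]
labels, centers = k_modes(data, k=2)
print(labels, centers)
```

Because the initial centers are chosen deterministically rather than at random, repeated runs on the same data give the same clustering, which is the stability property the abstract attributes to the selection strategy.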

Keywords/Search Tags:

K-modes, Attribute value weight, Dissimilarity measure, Outlier

Related items

1	An K-modes Clustering Algorithm Based On Dynamic Weight
2	Research On K-modes Clustering Algorithm Of Dissimilarity Measure
3	Application Of Outlier Detection In The Abnormal Analysis Of Medical Prescription
4	Study On Outlier Detection Based On K-nearest Neighborhood MST
5	Based On Information Entropy And The Subspace Outlier Mining Algorithm
6	Outlier Mining Of Book Selling Information Based On Rough Set
7	Research On Outlier Detection Methods Based On Neighborhood Rough Measure
8	Clustering Algorithm Of Missing Data Based On Dissimilarity Measure
9	Dissimilarity Measure Based On The Free Parameters Of Data Mining Research
10	Outlier Mining Method Based On Gini Indexes And Sub-space Research