Research On Clustering Algorithm For Mixed-type Data Based On K-modes Algorithm

Posted on:2020-09-06

Degree:Master

Type:Thesis

Country:China

Candidate:F Yuan

Full Text:PDF

GTID:2428330602452474

Subject:Applied Mathematics

Abstract/Summary:

In the information age, the data produced in people's daily lives are growing explosively. These massive data sets contain a great deal of information, from which potentially valuable knowledge can be extracted. Cluster analysis, an unsupervised learning method in machine learning, is widely used in data mining. It aims to partition physical or abstract objects into several clusters according to some similarity rule, so that the similarity between objects in the same cluster is large and the similarity between objects in different clusters is small. This condition makes measuring the similarity between objects one of the core problems of clustering.

Cluster analysis can be applied to different data sets, each containing different types of attributes. Data attributes can be divided into three categories: numerical, nominal, and ordinal. Some data sets have a single attribute type, while others combine two or three of these types and are called mixed-type data sets. For mixed-type data, choosing a reasonable similarity measure is both the key point and the main difficulty. Existing distance measures for mixed-type data mainly focus on mixed numerical and nominal attributes; few studies address data with mixed ordinal and nominal attributes. This thesis focuses on clustering algorithms for data with mixed ordinal and nominal attributes.

To construct the distance formula for ordinal attributes, this thesis first determines the distance formula for nominal attributes, which is a prerequisite for it. The essential difference between ordinal and nominal attributes lies in the order relation among ordinal attribute values. This relation can be characterized by the distance between two adjacent attribute values. Based on the distance between the values of a nominal attribute, a reasonable range for the distance between the values of an ordinal attribute can be determined. Second, an ordinal difference function describing the order difference between two attribute values is given for ordinal attributes. Third, the distance formula for ordinal attributes is constructed from this range of distance values and the ordinal difference function. Finally, when calculating the distance between a sample point and a centroid, the proportion of each attribute value within the cluster is introduced. With the new distance metric, the original clustering algorithm is extended to data sets with mixed ordinal and nominal attributes. Experiments on several mixed-attribute data sets, evaluated with the ACC evaluation index, show that the proposed distance metric is effective and that the improved algorithm performs well.
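The abstract does not state the actual formulas, so the following Python sketch only illustrates the general shape of such a distance under explicit assumptions. It assumes a 0/1 simple-matching distance for nominal values, an ordinal difference of |rank(a) - rank(b)| / (m - 1) for an attribute with m ordered levels (so that adjacent levels are never farther apart than two mismatched nominal values), and a point-to-cluster distance that weights each value by its proportion in the cluster. The function names and the `ordinal_levels` parameter are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def nominal_distance(a, b):
    """Assumed simple-matching distance: 0 if the values agree, 1 otherwise."""
    return 0.0 if a == b else 1.0


def ordinal_distance(a, b, levels):
    """Assumed ordinal difference: rank gap scaled to the range [0, 1]."""
    if len(levels) < 2:
        return 0.0
    return abs(levels.index(a) - levels.index(b)) / (len(levels) - 1)


def distance_to_cluster(point, members, ordinal_levels):
    """Proportion-weighted distance between a point and a cluster.

    point          -- tuple of attribute values
    members        -- list of tuples currently assigned to the cluster
    ordinal_levels -- dict: attribute index -> ordered list of levels;
                      attributes not listed are treated as nominal
    """
    n = len(members)
    total = 0.0
    for j, value in enumerate(point):
        # Count how often each value of attribute j occurs in the cluster.
        counts = Counter(member[j] for member in members)
        for other, count in counts.items():
            if j in ordinal_levels:
                d = ordinal_distance(value, other, ordinal_levels[j])
            else:
                d = nominal_distance(value, other)
            # Weight the per-value distance by the value's share of the cluster.
            total += (count / n) * d
    return total


if __name__ == "__main__":
    cluster = [("red", "low"), ("red", "mid"), ("blue", "low"), ("red", "low")]
    levels = {1: ["low", "mid", "high"]}  # attribute 1 is ordinal
    print(distance_to_cluster(("red", "high"), cluster, levels))
```

In a k-modes style loop, each sample would be assigned to the cluster with the smallest such distance, after which the per-cluster value proportions are recomputed. The thesis's own formulas may differ from every choice made in this sketch.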

Keywords/Search Tags:

Mixed-type Data, Ordinal Attribute, Categorical Attribute, Clustering Algorithm, Rough Set

Related items

1	Based On The Ordinal Attribute Association Rule Mining Algorithm Research And Implementation
2	Research On Algorithm For Attribute Reduct & Core Computation Based On Rough Set Theory
3	The Study Of Attribute Reduction Algorithm With Cost-sensitive In Rough Set Theory
4	Study On Attribute Reduction Based On Rough Sets And Its Application
5	The Research Of Attribute Reduction With Algorithm Based On S-rough Set Theory
6	Research Of Attribute Reduct Algorithm Based On Rough Set
7	Research On Quick Algorithm For Attribute Reduct Based On Rough Set Theory
8	Attribute Reduction Algorithm Oriented Rough Set
9	Research On Neighborhood Rough Set Model And Reduction Algorithm For Multi-type Attribute
10	Research On Accelerated Algorithm Of Attribute Reduction In Rough Sets And Its Neighborhood Model