Research On Clustering Algorithm For Mixed-type Data Based On K-modes Algorithm

Posted on: 2020-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: F Yuan
GTID: 2428330602452474
Subject: Applied Mathematics
Abstract/Summary:
In the information age, the data produced in people's daily lives are growing explosively. These massive data sets contain a great deal of information, from which potentially valuable knowledge can be extracted. Cluster analysis, an unsupervised learning method in machine learning, is widely used in data mining. It partitions physical or abstract objects into several clusters according to some similarity rule, such that the similarity between objects in the same cluster is high and the similarity between objects in different clusters is low. This condition makes measuring the similarity between objects one of the core problems of any clustering algorithm.

Cluster analysis can be applied to different data sets, each containing different types of attributes. Data attributes can be divided into three categories: numerical, nominal and ordinal. Some data sets contain a single type of attribute, while others contain two or all three types; the latter are called mixed-type data sets. For mixed-type data, choosing a reasonable similarity measure is both the key point and the main difficulty. Existing distance measures for mixed-type data mainly address mixtures of numerical and nominal attributes, and few studies address mixtures of ordinal and nominal attributes.

This paper focuses on clustering algorithms for data with mixed ordinal and nominal attributes. To construct the distance formula for ordinal attributes, it first determines the distance formula for nominal attributes, which is the prerequisite for establishing the ordinal one. The essential difference between ordinal and nominal attributes lies in the comparative relationship between ordinal attribute values. This relationship can be characterized by the distance between two adjacent attribute values. Based on the distance between the values of a nominal attribute, a reasonable range for the distance between the values of an ordinal attribute can be determined. Secondly, an ordinal difference function describing the order difference between two attribute values is given for ordinal attributes. Thirdly, the distance formula for ordinal attributes is constructed from this range of distance values and the ordinal difference function. Finally, when calculating the distance between a sample point and a centroid, the proportion of each attribute value within the cluster is introduced. With the new distance formula, the original clustering algorithm is extended to data sets with mixed ordinal and nominal attributes. Experiments on multiple mixed-attribute data sets, evaluated with the ACC evaluation index, show that the proposed distance formula is effective and that the improved algorithm performs well.
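The abstract does not state the thesis's actual formulas, so the following Python sketch only illustrates the general idea of a point-to-cluster distance for mixed ordinal and nominal data. Everything in it is an assumption made for illustration: simple matching (0 or 1) as the nominal distance, a rank difference scaled to the range [0, 1] as the ordinal distance, and a cluster represented by the proportions of its attribute values rather than by a single mode. The function names and the `ordinal_levels` mapping are hypothetical.

```python
from collections import Counter


def nominal_distance(a, b):
    """Simple matching distance (assumed here): 0 if equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def ordinal_distance(a, b, levels):
    """Assumed ordinal distance: rank difference scaled to [0, 1].

    `levels` lists the attribute values in increasing order. Adjacent
    values are 1 / (len(levels) - 1) apart, so no ordinal distance
    exceeds the nominal mismatch distance of 1.
    """
    if len(levels) < 2:
        return 0.0
    return abs(levels.index(a) - levels.index(b)) / (len(levels) - 1)


def value_proportions(cluster, j):
    """Proportion of each value of attribute j among the cluster members."""
    counts = Counter(row[j] for row in cluster)
    return {value: count / len(cluster) for value, count in counts.items()}


def distance_to_cluster(x, cluster, ordinal_levels):
    """Distance from sample x to a cluster described by value proportions.

    `ordinal_levels` maps the index of each ordinal attribute to its
    ordered list of levels; all other attributes are treated as nominal.
    For each attribute, the distance from x to every value present in the
    cluster is weighted by that value's proportion.
    """
    total = 0.0
    for j, xj in enumerate(x):
        for value, proportion in value_proportions(cluster, j).items():
            if j in ordinal_levels:
                total += proportion * ordinal_distance(xj, value, ordinal_levels[j])
            else:
                total += proportion * nominal_distance(xj, value)
    return total


# Hypothetical toy data: attribute 0 is nominal, attribute 1 is ordinal.
cluster = [("red", "low"), ("red", "medium"), ("blue", "low"), ("red", "low")]
ordinal_levels = {1: ["low", "medium", "high"]}
d = distance_to_cluster(("red", "high"), cluster, ordinal_levels)
```

In a k-modes style loop, each sample would be assigned to the cluster with the smallest such distance and the proportions would then be recomputed. Under these assumptions, the nominal part reduces to one minus the proportion of the sample's own value in the cluster.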
Keywords/Search Tags: Mixed-type Data, Ordinal Attribute, Categorical Attribute, Clustering Algorithm, Rough Set