
Design And Implementation Of K-modes Type Algorithms Based On R For Categorical Data

Posted on: 2018-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: H N Li
GTID: 2348330521951744
Subject: Computer technology
Abstract/Summary:
Clustering, an unsupervised learning method, is a major field of data mining that can effectively explore and extract the inherent relationships among objects. A large body of clustering theory and methods now exists; however, traditional clustering algorithms take as input a matrix of single-valued attributes. In many real applications, a feature may take more than one value for a single object. For example, one user may give different ratings to different movies, and the weather attribute of a given area takes multiple values at different times. Such data is called set-valued attribute data. Categorical data is also widespread in the real world. Because categorical data lacks inherent geometric properties, a distance function between data points cannot be defined naturally, so clustering models and algorithm design for categorical data differ from those for numerical data. These new data structures call for new clustering algorithms.

Domain-specific problems also require appropriate data mining tools. As an open-source environment for data analysis, R makes data manipulation and visualization convenient for most users, and its large software ecosystem has attracted growing attention from scientists. Owing to the flexibility and continued development of the R language, more and more scientists have begun to use R for cluster analysis. Many R implementations of clustering algorithms for numerical data already exist, and they have promoted the development of clustering algorithms.

The k-modes algorithm, a partition-based clustering algorithm, and its extensions are widely used in many applications because of their high efficiency. However, to the best of our knowledge, no comprehensive open-source package exists for this problem. In this thesis, we design a package of k-modes type clustering algorithms for categorical data. The main contributions are as follows:

(1) We designed a package of k-modes type clustering algorithms for categorical data, written in R, which is the first comprehensive open-source library for clustering categorical data. We hope it will advance research on categorical data clustering and encourage researchers to share their algorithms.

(2) We designed and implemented k-modes type clustering algorithms for categorical data, and verified the validity and practicality of the R package through experiments and visualization techniques.

This work on a k-modes clustering package broadens the research scope of clustering algorithms and provides a new direction for clustering categorical data.
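The abstract does not spell out the algorithm itself. As a point of reference only, the following is a minimal Python sketch of the standard k-modes procedure as commonly described: simple matching dissimilarity in place of a geometric distance, and per-attribute modes in place of means. It is not the thesis's R package; the function names, the random initialization, and the optional `init` argument are illustrative assumptions.

```python
from collections import Counter
import random

def matching_dissimilarity(a, b):
    # Simple matching dissimilarity: count of attributes whose categories differ.
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    # Most frequent category per attribute; Counter breaks ties by first occurrence.
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def k_modes(data, k, init=None, max_iter=100, seed=0):
    # Hypothetical interface: `init` optionally supplies the k starting modes;
    # otherwise k objects are drawn at random from the data.
    modes = list(init) if init is not None else random.Random(seed).sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest mode (lowest index on ties).
        new_labels = [min(range(k), key=lambda j: matching_dissimilarity(x, modes[j]))
                      for x in data]
        if new_labels == labels:
            break  # no object changed cluster, so the partition is stable
        labels = new_labels
        # Update step: replace each non-empty cluster's mode with its per-attribute modes.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                modes[j] = column_modes(members)
    return labels, modes

if __name__ == "__main__":
    data = [("red", "small", "round"), ("red", "small", "square"),
            ("red", "large", "round"), ("blue", "large", "square"),
            ("blue", "large", "round"), ("blue", "small", "square")]
    labels, modes = k_modes(data, 2, init=[data[0], data[3]])
    print(labels)  # -> [0, 0, 0, 1, 1, 1]
    print(modes)
```

Because no arithmetic mean exists for categories, the sketch counts mismatching attributes and takes the most frequent category per attribute. This mirrors the point above that categorical data lacks the geometric structure numerical clustering relies on. Extensions for set-valued attributes would need a different dissimilarity and are not shown here.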
Keywords: Clustering Algorithm, Categorical Data, R Package