Research On Hierarchical Clustering Methods For Categorical Set-valued Data

Posted on: 2016-07-07  Degree: Master  Type: Thesis
Country: China  Candidate: F F Guo
GTID: 2308330482451150  Subject: Software engineering
Abstract/Summary:
Clustering, an unsupervised learning method, is a major field of data mining that can effectively explore and extract the inner relationships between different things. A large body of clustering theory and methods now exists, and cluster analysis is widely used in data mining, pattern recognition, image processing, market analysis, and other fields.

Most clustering algorithms use a feature vector to describe an object. However, in real applications, an object can take multiple values on some attributes. For example, one user can give different ratings to different movies, and within the same area the weather attribute takes multiple values at different times. This data structure calls for a new clustering algorithm.

This thesis defines an object that has multiple values on some attributes as a set-valued object, and focuses on the research and analysis of hierarchical clustering algorithms for categorical set-valued data. The main work of this thesis is as follows:

(1) We give a definition of the distance between two set-valued objects and present a new hierarchical clustering algorithm for categorical set-valued data, named SV_clustering. Experimental results show that SV_clustering achieves a better clustering effect and higher clustering accuracy than traditional k-means and k-modes.

(2) Based on the clustering results for set-valued data, we put forward a new time series analysis algorithm. The distance between two clusters is defined, and this distance measure is geared toward a detailed analysis of element composition. A new algorithm is presented to obtain the representative attribute values of objects, and finally an algorithm is implemented to graph the path of cluster evolution.

(3) We design and implement a clustering system based on the MATLAB GUI. The system consists of three main functions: data pre-processing, the hierarchical clustering algorithm for categorical set-valued data, and the time evolution analysis algorithm. This experimental system provides a friendly interactive interface for users.

The above work expands the application fields of clustering algorithms and provides a valuable reference for clustering methods for set-valued data.
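
To make the idea of clustering set-valued objects concrete, the following is a minimal Python sketch, not the thesis's SV_clustering algorithm. The abstract does not state the distance definition, so the sketch assumes a mean per-attribute Jaccard distance between objects and average linkage between clusters. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def object_distance(x, y):
    """Assumed measure: mean per-attribute Jaccard distance between two objects."""
    return sum(jaccard_distance(a, b) for a, b in zip(x, y)) / len(x)


def cluster_distance(c1, c2, data):
    """Average linkage: mean distance over all cross-cluster object pairs."""
    total = sum(object_distance(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(data, k):
    """Repeatedly merge the two closest clusters until only k remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        # combinations yields index pairs with p < q, so deleting q is safe.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: cluster_distance(clusters[pq[0]], clusters[pq[1]], data),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Hypothetical toy data: each object is (movie genres rated, weather values observed).
data = [
    ({"comedy", "drama"}, {"sunny"}),
    ({"comedy"}, {"sunny", "cloudy"}),
    ({"horror", "thriller"}, {"rainy"}),
    ({"horror"}, {"rainy", "cloudy"}),
]
clusters = agglomerative(data, 2)
print(clusters)
```

On this toy data the first two objects and the last two objects end up in separate clusters, because each pair shares values on both attributes. A different set distance or linkage rule could be swapped in without changing the overall merge loop.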
Keywords/Search Tags: Set-valued data set, Hierarchical clustering analysis, Time evolution analysis