Research On Subspace Clustering Algorithm On High-dimensional Categorical Datasets

Posted on: 2009-12-01  Degree: Master  Type: Thesis
Country: China  Candidate: X Y Wang  Full Text: PDF
GTID: 2178360272970833  Subject: Software engineering
Abstract/Summary:
Clustering is an important task in data mining, and also a difficult one, especially for large, high-dimensional datasets. Because of the curse of dimensionality, the objects in such a dataset are often nearly equidistant from one another, which completely masks the clusters. Distance-based similarity then fails to differentiate data points, so traditional clustering methods perform poorly. Subspace clustering is currently an effective way to handle large, high-dimensional datasets.

In the study of high-dimensional data, categorical datasets pose a major challenge. Traditional subspace clustering algorithms mainly target lower-dimensional continuous datasets and have difficulty with categorical data. An analysis of the subspace clustering algorithms in common use shows that they must scan the database several times to determine the subspaces of the clusters, so their time efficiency is very low. We observe that determining the subspaces is similar to mining frequent patterns, and we use Frequent Pattern-Growth, which needs to scan the database only twice, to find all frequent patterns.

A new subspace clustering algorithm, FPSUB, is proposed. It stores the dataset's information in a Frequent Pattern-Tree structure, which turns the search for clusters into the search for frequent patterns, and then uses those patterns to cluster the objects. FPSUB can also adjust the clustering result to the user's requirements without the number of clusters being specified in advance.

We evaluate FPSUB on real datasets and compare it with other algorithms. The experimental results demonstrate the validity, feasibility, and robustness of the algorithm on high-dimensional categorical datasets.
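
The two-scan idea behind Frequent Pattern-Growth can be illustrated with a minimal sketch. This is not the FPSUB algorithm itself, whose details the abstract does not give; it only shows generic FP-tree construction over categorical records, treating each (attribute, value) pair as an item. The names `FPNode`, `build_fp_tree`, and `min_support`, and the toy records in the usage note, are assumptions made for illustration.

```python
from collections import Counter


class FPNode:
    """One node of a frequent-pattern tree (hypothetical helper class)."""

    def __init__(self, item, parent=None):
        self.item = item        # an (attribute, value) pair, or None for the root
        self.count = 0          # number of records whose path passes through here
        self.parent = parent
        self.children = {}      # item -> FPNode


def build_fp_tree(records, min_support):
    """Build an FP-tree from categorical records using two passes over the data.

    records: list of dicts mapping attribute name -> categorical value.
    min_support: minimum number of records an item must appear in (assumed
    to be an absolute count in this sketch).
    """
    # Scan 1: count how many records contain each (attribute, value) item.
    counts = Counter(item for rec in records for item in rec.items())
    frequent = {item: c for item, c in counts.items() if c >= min_support}

    # Scan 2: insert each record's frequent items, most frequent first,
    # so that records sharing common items share a prefix path in the tree.
    root = FPNode(None)
    for rec in records:
        items = sorted(
            (item for item in rec.items() if item in frequent),
            key=lambda item: (-frequent[item], item),  # ties broken by item
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent
```

As a usage example, calling `build_fp_tree` on four toy records with the attributes `color`, `shape`, and `size` and `min_support=2` keeps only the items that occur in at least two records. Records that share `("color", "red")` then share a single branch under the root, whose count records how many objects fall in that one-dimensional subspace. A subspace clustering method in the spirit of the abstract would go on to mine frequent patterns from such a tree and group objects by them; that step is left out here because the source does not describe it.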
Keywords/Search Tags: Clustering Analysis, Categorical Attribute, Subspace Clustering, Frequent Pattern, Frequent Pattern Tree