Research On Subspace Clustering Algorithm On High-dimensional Categorical Datasets

Posted on:2009-12-01

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Wang

Full Text:PDF

GTID:2178360272970833

Subject:Software engineering

Abstract/Summary:

Clustering is an important task in data mining, and also a difficult one, especially for large, high-dimensional datasets. Because of the curse of dimensionality, all objects in such a dataset are often nearly equidistant from one another, which completely masks the clusters. Distance-based similarity then fails to differentiate data points, so traditional clustering methods perform poorly. Subspace clustering is currently an efficient approach to clustering large, high-dimensional datasets.

Categorical data poses a major challenge for researchers studying high-dimensional datasets. Traditional subspace clustering algorithms mainly target lower-dimensional continuous data and have difficulty handling categorical attributes. Our analysis of commonly used subspace clustering algorithms shows that they must scan the database several times to determine the subspaces of the clusters, so their time efficiency is very low. We observe that determining these subspaces is similar to mining frequent patterns, and we therefore use Frequent Pattern-Growth (FP-Growth), which needs only two database scans to gather the information required to find all frequent patterns.

A new subspace clustering algorithm, FPSUB, is proposed. It stores the dataset in a Frequent Pattern-Tree (FP-tree) structure, which transforms the search for clusters into the search for frequent patterns, and then uses these patterns to cluster the objects. FPSUB can also adjust the clustering result according to the user's requirements without the number of clusters being specified in advance.

We evaluate FPSUB on real datasets, showing its validity and feasibility for clustering high-dimensional categorical data, and compare it with other algorithms on the same datasets. The experimental results demonstrate the feasibility and robustness of the algorithm.
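The abstract does not spell out FPSUB's exact procedure, so the following is only a minimal Python sketch, under our own assumptions, of the general idea it describes. Each categorical record is treated as a transaction of (attribute index, value) items; the FP-tree is built in two scans (one to count item supports, one to insert the records); FP-growth mines the frequent patterns; and each pattern spanning at least `min_dims` attributes is read as a cluster living in the subspace of those attributes. The names `build_fp_tree`, `fp_growth`, `subspace_clusters`, `min_support` and `min_dims`, as well as the rule that assigns each record to the longest matching pattern, are hypothetical illustrations, not the thesis's method.

```python
from collections import defaultdict


class Node:
    """One FP-tree node; `item` is an (attribute index, value) pair."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (items, weight) transactions in two passes."""
    # Scan 1: count the (weighted) support of every item.
    support = defaultdict(int)
    for items, weight in transactions:
        for item in items:
            support[item] += weight
    frequent = {i: s for i, s in support.items() if s >= min_support}
    # Global item order: descending support, ties broken by the item itself
    # (assumes values are mutually comparable, e.g. all strings).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    # Scan 2: insert each transaction's frequent items in rank order,
    # so records sharing a prefix share a path in the tree.
    root = Node(None, None)
    header = defaultdict(list)  # item -> every node carrying that item
    for items, weight in transactions:
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (pattern, support) for every frequent itemset via FP-growth."""
    header, frequent = build_fp_tree(transactions, min_support)
    for item, count in frequent.items():
        pattern = suffix + (item,)
        yield pattern, count
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recurse on the conditional database to extend `pattern`.
        yield from fp_growth(base, min_support, pattern)


def subspace_clusters(records, min_support, min_dims=2):
    """Hypothetical assignment rule: label each record with the longest
    (then most supported) frequent pattern it contains; None means noise."""
    transactions = [(list(enumerate(r)), 1) for r in records]
    patterns = sorted(
        (tuple(sorted(p)), s)
        for p, s in fp_growth(transactions, min_support)
        if len(p) >= min_dims
    )
    labels = []
    for record in records:
        items = set(enumerate(record))
        matches = [(p, s) for p, s in patterns if items.issuperset(p)]
        best = max(matches, key=lambda m: (len(m[0]), m[1]), default=None)
        labels.append(best[0] if best else None)
    return labels


if __name__ == "__main__":
    records = [
        ("a", "x", "p"),
        ("a", "y", "p"),
        ("a", "x", "p"),
        ("b", "z", "q"),
        ("b", "z", "r"),
        ("c", "w", "s"),
    ]
    for record, label in zip(records, subspace_clusters(records, min_support=2)):
        print(record, "->", label)
```

On these toy records, the first and third records fall in the cluster defined by values a, x, p on attributes 0, 1 and 2; the second, whose attribute 1 differs, falls back to the two-attribute subspace {0, 2}; the fourth and fifth share the subspace {0, 1}; and the last matches no pattern and is reported as noise (`None`). Because a record holds at most one value per attribute, every mined pattern automatically spans distinct attributes, which is what lets a frequent pattern double as a subspace description.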

Keywords/Search Tags:

Clustering Analysis, Categorical Attribute, Subspace Clustering, Frequent Pattern, Frequent Pattern Tree

Related items

1	Algorithm For Mining Association Rules Based On Clustering
2	Study And Application Of Frequent Pattern And Multi-modalities Data Clustering Algorithm
3	The Research On The Related Problems Of Association Rule Mining
4	A Study On Algorithms Of Weighted Frequent Pattern Mining
5	Research On Mining Algorithms Of Maximal Frequent Item Sets
6	The Research Of Association Rules Algorithm Based On Frequent Pattern Tree
7	The Analysis, Based On Data Mining Algorithms For Frequent Pattern Tree
8	Study And Design On The Algorithms Of Mining Association Rules
9	The Study Of Maximum Frequent Itemsets Algorithm Based On Frequent Pattern Tree
10	The Research And Implementation Of An XML Document Structural Clustering Algorithm Using Frequent Path Pattern