Research On Algorithms Of Subspace Clustering Based On Pattern Similarity

Posted on:2007-12-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

Full Text:PDF

GTID:2178360182472136

Subject:Communication and Information System

Abstract/Summary:

Clustering analysis is an important data mining problem. It groups data into classes, or clusters, so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. It has been widely used in numerous applications, including pattern recognition, data analysis, image processing, recommender systems and electronic commerce.

This thesis first discusses the general categories of clustering methods, then analyzes the main methods in detail and compares them. Traditional clustering methods work efficiently on low-dimensional data. On high-dimensional data, however, both their efficiency and the quality of their results degrade, because the data are sparse, distances between objects become nearly indistinguishable, and outliers are more numerous. Techniques for clustering high-dimensional data include feature transformation (dimensionality reduction) and subspace clustering (feature selection). Chapter 2 describes and compares several subspace clustering algorithms proposed in recent years.

pCluster, a clustering algorithm based on pattern similarity, is one such subspace clustering method. Unlike distance-based clustering models, the pCluster model defines two objects as similar if they exhibit a coherent pattern on a subset of dimensions. The purpose of the pCluster algorithm is to reveal this kind of pattern similarity among objects. Chapter 3 discusses and implements the algorithm and proposes an improved algorithm that addresses its shortcomings. Invalid MCASs (Maximum Coherent Attribute Sets) are pruned by object block instead of by symmetric MCAS pruning. At the same time, attribute pairs are enumerated on every branch of the prefix tree, and the intersection of the MCOSs (Maximum Coherent Object Sets) is computed over the objects on that branch.

Experiments show that the improved algorithm is better than the original algorithm in efficiency, space overhead and clustering quality. Finally, a prototype recommender system based on the improved pCluster algorithm is designed, and the system validates the efficiency of the algorithm.
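
The notion of a coherent pattern on a subset of dimensions can be illustrated with the pScore test from the original pCluster model: for two objects x, y and two attributes a, b, the score is |(x_a - x_b) - (y_a - y_b)|, and a set of objects and attributes is treated as a cluster when every such 2x2 score stays within a threshold delta. The Python sketch below is only a brute-force check of that definition, not the algorithm or the improvement described in this abstract; the function names, the sample data and the threshold values are assumptions made for illustration.

```python
from itertools import combinations


def p_score(x, y, a, b):
    """pScore of the 2x2 submatrix formed by objects x, y on attributes a, b."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_delta_pcluster(objects, attrs, delta):
    """Brute-force check: every pair of objects and every pair of
    attributes must have a pScore no larger than delta."""
    return all(
        p_score(x, y, a, b) <= delta
        for x, y in combinations(objects, 2)
        for a, b in combinations(attrs, 2)
    )


# Hypothetical data: rows are objects, columns are attributes.
# On attributes 0-2 the rows are shifted copies of one another, so they
# are far apart in distance yet follow the same pattern.
data = [
    [1.0, 5.0, 3.0, 9.0],
    [11.0, 15.0, 13.0, 2.0],
    [21.0, 25.0, 23.0, 40.0],
]

coherent = is_delta_pcluster(data, [0, 1, 2], delta=0.0)
not_coherent = is_delta_pcluster(data, [0, 1, 2, 3], delta=1.0)
```

Here the three objects form a cluster on the subspace of attributes 0 to 2 even though their values differ by 10 or 20, while adding attribute 3 breaks the pattern. Checking all pairs this way is quadratic in both objects and attributes, which is why practical algorithms rely on pruning.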

Keywords/Search Tags:

Data Mining, Clustering Analysis, Subspace Clustering, Pattern Similarity, Recommender System

Related items

1	The Research On Subspace Clustering For High Dimensional Data
2	Research On Web Log And Subspace Clustering Mining Algorithms
3	A Study Of The Pattern-Based Clustering Theories
4	Research On Improved Subspace Clustering Algorithm
5	Research On Clustering Algorithms For High-Dimensional Data
6	The Research And Application Of Subspace Clustering Algorithms
7	Research And Application Of Collaborative Filtering Algorithm Based On Clustering And Pattern Mining
8	Research On Subspace Clustering Algorithm On High-dimensional Categorical Datasets
9	Research On Algorithms For Subspace Clustering And Outlier Mining Based-on Information-entropy
10	Research Of Subspace Clustering Algorithm Based On Self-Representation