Research On Algorithms Of Subspace Clustering Based On Pattern Similarity

Posted on: 2007-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2178360182472136
Subject: Communication and Information System
Abstract/Summary:
Clustering analysis, an important data mining problem, groups data into classes or clusters so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. It has been widely used in numerous applications, including pattern recognition, data analysis, image processing, recommender systems, and electronic commerce.

In this thesis, the general categories of clustering methods are discussed, and the main clustering methods are then analyzed in detail and compared. Traditional clustering methods work efficiently on low-dimensional data. On high-dimensional data, however, both their efficiency and the quality of their results degrade because of data sparsity, distances between objects that become similar to one another, and a larger number of outliers. Techniques for clustering high-dimensional data include both feature transformation (dimensionality reduction) and subspace clustering (feature selection). Chapter 2 describes and compares several subspace clustering algorithms proposed in recent years.

The clustering algorithm based on pattern similarity, pCluster, is one such subspace clustering algorithm. Unlike clustering models based on distance, the pCluster model defines two objects as similar if they exhibit a coherent pattern on a subset of dimensions. The purpose of the pCluster algorithm is to reveal this kind of pattern similarity among objects. Chapter 3 discusses and implements the algorithm and proposes an improved algorithm that addresses its shortcomings. MCAS (Maximum Coherent Attribute Set) pruning by object block is used to prune invalid MCASs instead of symmetric MCAS pruning. At the same time, attribute pairs are enumerated on every branch of the prefix tree, and the intersection of the MCOSs (Maximum Coherent Object Sets) is calculated over the objects on that branch.

Experiments show that the improved algorithm is better than the original algorithm in efficiency, space overhead, and effectiveness. Finally, a prototype recommender system based on the improved pCluster algorithm is designed, and the system validates the efficiency of the algorithm.
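The abstract does not spell out how a "coherent pattern" is measured. As a minimal sketch, assuming the pScore definition from the original pCluster model (for objects x, y and attributes a, b, the score is |(d_xa - d_xb) - (d_ya - d_yb)|, and a set of objects and attributes forms a δ-pCluster when every such 2x2 submatrix scores at most δ), the check could look like the following. The function names `pscore` and `is_delta_pcluster` and the sample data are illustrative and are not taken from the thesis; this brute-force check is not the pruning-based algorithm the thesis improves.

```python
from itertools import combinations


def pscore(d_xa, d_xb, d_ya, d_yb):
    """pScore of the 2x2 submatrix formed by objects x, y on attributes a, b."""
    return abs((d_xa - d_xb) - (d_ya - d_yb))


def is_delta_pcluster(data, objects, attributes, delta):
    """Brute-force test: every pair of objects on every pair of attributes
    must have pScore <= delta. `data` is a list of rows (one per object)."""
    for x, y in combinations(objects, 2):
        for a, b in combinations(attributes, 2):
            if pscore(data[x][a], data[x][b], data[y][a], data[y][b]) > delta:
                return False
    return True


# Hypothetical data: on attributes 0..2 the three rows are shifted copies of
# one another, so they are far apart in distance but share the same pattern.
data = [
    [1, 5, 3, 9],
    [11, 15, 13, 2],
    [21, 25, 23, 7],
]

print(is_delta_pcluster(data, [0, 1, 2], [0, 1, 2], delta=0))     # True
print(is_delta_pcluster(data, [0, 1, 2], [0, 1, 2, 3], delta=0))  # False
```

The example shows why a distance-based model would miss this group: the rows differ by offsets of 10 and 20, yet their rise and fall across the first three attributes is identical, which is the kind of similarity a subspace model based on pattern coherence is meant to capture.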
Keywords/Search Tags: Data Mining, Clustering Analysis, Subspace Clustering, Pattern Similarity, Recommender System