Attribute Relevancy-based Subspace Clustering Algorithm

Posted on:2015-01-01

Degree:Master

Type:Thesis

Country:China

Candidate:H L Kang

Full Text:PDF

GTID:2298330467984624

Subject:Computer application technology

Abstract/Summary:

As one of the top ten challenges of data mining, cluster analysis is a central research area in the field. Within cluster analysis, clustering high-dimensional data is a particularly active topic. Recent work is hampered by the curse of dimensionality: as the number of dimensions grows, the distances from a given point to its nearest and to its farthest neighbor become hard to tell apart, so discovering meaningful, well-separated clusters is challenging. Subspace clustering addresses the curse of dimensionality and makes it possible to turn the information in high-dimensional data into valuable knowledge.

Subspace clustering differs from conventional clustering algorithms, which look for clusters in the full feature space; its goal is to find clusters embedded in different subspaces. A study of existing subspace clustering algorithms shows that they must scan the database repeatedly and require the user to supply parameters, which limits their efficiency and accuracy. A study of frequent patterns shows that determining subspaces can be converted into a frequent pattern mining problem: with two scans of the database, all the required information is stored in a frequent pattern tree (FP-tree), from which the frequent patterns can then be found.

This paper presents a new subspace clustering algorithm, ARSUB. The algorithm borrows the idea of grid clustering and models the clustering problem as a frequent itemset mining problem. ARSUB builds a relevancy matrix from the strongly correlated item pairs to evaluate relevance, and then generates strongly correlated candidate subspaces. Finally, the clusters that exist in different subspaces are obtained by clustering the strongly correlated candidate subspaces. The algorithm uses the FP-tree structure to store the information of the entire data set, so it can mine subspaces efficiently.

Experiments were carried out separately on synthetic and real data sets. The results show that ARSUB achieves higher accuracy than the other subspace clustering algorithms it was compared with, which indicates that it is effective and feasible. The running time of the algorithm was also compared with that of the other algorithms, and ARSUB was more efficient.
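The pipeline described in the abstract (grid discretization, frequent item pairs, relevancy matrix, candidate subspaces) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not define the relevancy measure or the thresholds, so the sketch assumes an all-confidence test for "strongly correlated" item pairs, counts such pairs per attribute pair as the relevancy value, and uses plain counters in place of the FP-tree. The function names `discretize`, `relevancy_matrix` and `candidate_subspaces` and the parameters `n_bins`, `min_support` and `min_all_conf` are hypothetical.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def discretize(data, n_bins):
    """Map every value to an equal-width grid interval index, per attribute."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    # Guard against constant attributes (zero range).
    width = np.where(hi > lo, (hi - lo) / n_bins, 1.0)
    cells = np.floor((data - lo) / width).astype(int)
    return np.clip(cells, 0, n_bins - 1)


def relevancy_matrix(cells, min_support, min_all_conf):
    """Count strongly correlated item pairs for each pair of attributes.

    An item is an (attribute, interval) pair. Assumption: an item pair is
    'strongly correlated' when it is frequent and its all-confidence,
    supp(x, y) / max(supp(x), supp(y)), reaches min_all_conf.
    """
    n_points, n_attrs = cells.shape
    item_count = Counter()
    pair_count = Counter()
    for row in cells:
        items = [(attr, int(interval)) for attr, interval in enumerate(row)]
        item_count.update(items)
        # Items are ordered by attribute, so every pair has one canonical key.
        pair_count.update(combinations(items, 2))

    rel = np.zeros((n_attrs, n_attrs), dtype=int)
    for (x, y), count in pair_count.items():
        if count < min_support * n_points:
            continue  # pair is not frequent
        all_conf = count / max(item_count[x], item_count[y])
        if all_conf >= min_all_conf:
            rel[x[0], y[0]] += 1
            rel[y[0], x[0]] += 1
    return rel


def candidate_subspaces(rel):
    """Group attributes linked by at least one strongly correlated pair."""
    n_attrs = rel.shape[0]
    seen, groups = set(), []
    for start in range(n_attrs):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            a = stack.pop()
            component.append(a)
            for b in range(n_attrs):
                if rel[a, b] > 0 and b not in seen:
                    seen.add(b)
                    stack.append(b)
        if len(component) > 1:  # a single attribute is not a subspace here
            groups.append(sorted(component))
    return groups
```

Given a numeric array `data` with one row per point, calling `discretize`, then `relevancy_matrix`, then `candidate_subspaces` yields groups of attribute indices; each group would then be clustered separately, a step this sketch leaves out.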

Keywords/Search Tags:

Data mining, Subspace clustering, FP-Tree, Relevancy

Related items

1	The Research On Subspace Clustering For High Dimensional Data
2	Research On Improved Subspace Clustering Algorithm
3	Research On Algorithms For Subspace Clustering And Outlier Mining Based-on Information-entropy
4	The Research And Application Of Subspace Clustering Algorithms
5	Research On Data Stream Clustering Algorithm Based On Sliding Windows And Subspace Partition
6	Research On Data Mining Algorithm Based On Low-rank Sparse Subspace
7	Research On Density-based Subspace Clustering Algorithm For Data Streams
8	Research On Density-Based Subspace Clustering Algorithm For Data Streams
9	Research On Algorithms Of Subspace Clustering Based On Pattern Similarity
10	Research On Web Log And Subspace Clustering Mining Algorithms