Attribute Relevancy-based Subspace Clustering Algorithm

Posted on: 2015-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: H L Kang
Full Text: PDF
GTID: 2298330467984624
Subject: Computer application technology
Abstract/Summary:
Cluster analysis, regarded as one of the top ten challenges in data mining, is a central research topic in the field, and clustering high-dimensional data is an especially active part of it. Existing approaches suffer from the curse of dimensionality: as the number of dimensions grows, the distance from a given point to its nearest neighbor becomes hard to distinguish from the distance to its farthest neighbor, so discovering meaningful, well-separated clusters is difficult. Subspace clustering addresses the curse of dimensionality and allows the information in high-dimensional data to be converted into valuable knowledge.

Subspace clustering differs from conventional clustering algorithms, which detect clusters in the full feature space; its goal is to find clusters embedded in different subspaces. A study of existing subspace clustering algorithms shows that they scan the database repeatedly and require the user to supply parameters, which limits their efficiency and accuracy. A study of frequent pattern mining shows that determining the subspaces can be converted into a frequent pattern mining problem: by scanning the database twice, all information is stored in a frequent pattern tree (FP-tree), from which the frequent patterns can then be found.

This thesis presents a new subspace clustering algorithm, ARSUB, which adopts the idea of grid-based clustering and models the clustering problem as a frequent itemset mining problem. ARSUB builds a relevancy matrix from strongly correlated item pairs to evaluate relevance and then generates strongly correlated candidate subspaces. Finally, the clusters embedded in different subspaces are obtained by clustering within the strongly correlated candidate subspaces. The algorithm uses the FP-tree structure to store the information of the entire data set, so it can mine subspaces efficiently.

Experiments are carried out separately on synthetic and real data sets. The results show that ARSUB achieves higher accuracy than other subspace clustering algorithms, which indicates its effectiveness and feasibility. A comparison of running time also shows that ARSUB is more efficient than the other algorithms.
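
The pipeline described in the abstract (grid discretization, item-pair relevancy, candidate subspaces, clustering inside a subspace) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The abstract does not give the relevancy formula, the thresholds, or the way subspaces are assembled, so everything below is an assumption made for illustration: the names `to_transactions`, `relevancy_matrix`, `candidate_subspaces`, and `dense_cells`, the score (pair support divided by the larger single-item support), the connected-components rule for grouping attributes, and the toy data. Plain pair counting is also used here in place of the FP-tree that ARSUB relies on.

```python
from collections import Counter
from itertools import combinations


def to_transactions(points, n_bins):
    """Map each point to a list of (attribute, bin) items on an equal-width grid."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    transactions = []
    for p in points:
        items = []
        for d in range(dims):
            width = (highs[d] - lows[d]) / n_bins or 1.0  # guard constant attributes
            b = min(int((p[d] - lows[d]) / width), n_bins - 1)
            items.append((d, b))
        transactions.append(items)
    return transactions


def relevancy_matrix(transactions, dims, min_support):
    """Attribute-by-attribute matrix holding the best score of any frequent item pair.

    Assumed score: support(a, b) / max(support(a), support(b)).
    """
    n = len(transactions)
    item_count, pair_count = Counter(), Counter()
    for items in transactions:
        item_count.update(items)
        pair_count.update(combinations(items, 2))  # items are ordered by attribute
    rel = [[0.0] * dims for _ in range(dims)]
    for (a, b), c in pair_count.items():
        if c / n < min_support:
            continue  # skip infrequent item pairs
        score = c / max(item_count[a], item_count[b])
        i, j = a[0], b[0]
        if score > rel[i][j]:
            rel[i][j] = rel[j][i] = score
    return rel


def candidate_subspaces(rel, min_relevancy):
    """Group attributes linked by relevancy >= min_relevancy (connected components)."""
    dims, seen, subspaces = len(rel), set(), []
    for start in range(dims):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(dims):
                if j not in seen and rel[i][j] >= min_relevancy:
                    seen.add(j)
                    stack.append(j)
        if len(comp) > 1:
            subspaces.append(sorted(comp))
    return subspaces


def dense_cells(transactions, subspace, min_support):
    """Count grid cells restricted to one subspace and keep the dense ones."""
    n = len(transactions)
    cells = Counter(tuple(items[d][1] for d in subspace) for items in transactions)
    return {cell: c for cell, c in cells.items() if c / n >= min_support}


# Toy data (hypothetical): attributes 0 and 1 move together, 2 and 3 are unrelated.
points = [
    (float(k % 2), float(k % 2), float((k // 2) % 5), float((k // 10) % 5))
    for k in range(200)
]
transactions = to_transactions(points, n_bins=5)
rel = relevancy_matrix(transactions, dims=4, min_support=0.05)
subspaces = candidate_subspaces(rel, min_relevancy=0.8)
clusters = dense_cells(transactions, subspaces[0], min_support=0.05)
print(subspaces, clusters)
```

On this toy data, attributes 0 and 1 form the only candidate subspace, and the two dense grid cells inside it correspond to the two planted groups. A real data set would need the thresholds tuned, and an FP-tree would avoid enumerating every item pair explicitly.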
Keywords/Search Tags: Data mining, Subspace clustering, FP-Tree, Relevancy