Research On Subspace Clustering Algorithms For High-dimensional Data

Posted on:2013-04-03

Degree:Master

Type:Thesis

Country:China

Candidate:J Zhang

Full Text:PDF

GTID:2268330392470525

Subject:Information management and information systems

Abstract/Summary:

With the development of information technology and the Internet, high-dimensional data on the Internet, such as multimedia data and gene microarray data, is growing exponentially, and the number of attributes (dimensions) can reach several hundred. Under such circumstances, high-dimensional data clustering is one of the most important methods for analyzing high-dimensional data.

The characteristics of high-dimensional data differ greatly from those of low-dimensional data. For instance, the similarity measures commonly used in low-dimensional data clustering no longer yield good clustering results in high-dimensional space, some attributes are correlated with one another to some extent, and the subspaces may be spanned by different combinations of attributes. These features make high-dimensional data clustering a challenging task. Studying high-dimensional data clustering techniques on the basis of the well-developed theory of data mining is therefore critically important for effectively guiding the new direction of Internet development.

This thesis focuses on high-dimensional data clustering techniques. We first summarize the prevalent methods and the current state of high-dimensional data analysis, and categorize the existing high-dimensional data clustering techniques, such as dimension reduction, manifold learning, distance metric learning, and subspace clustering. We then concentrate on subspace clustering methods. After studying and improving the bottom-up subspace clustering methods in depth, we propose a novel subspace clustering method based on kernel density estimation; extensive experiments show the superior effectiveness and efficiency of the proposed method. The main contents and contributions are summarized as follows:

1. We first introduce the subspace clustering problem for high-dimensional data and then study the bottom-up subspace clustering algorithms in depth. At the end of Chapter 2, the density divergence problem is introduced for further study.
2. We propose a subspace clustering algorithm based on kernel density estimation to effectively address the dilemma of grid partition and the density divergence problem. Related techniques are introduced first and the basic terms and definitions are given. The algorithm is then described in detail at the end of Chapter 3.
3. We conduct extensive experiments on both synthetic and real datasets. Performance comparisons of scalability, accuracy, and efficiency with existing subspace clustering algorithms show the superiority of the proposed algorithm.
4. Finally, the conclusion presents our vision for a distributed concurrency framework and for extending the algorithm to combine numerical and categorical attributes.
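
As a rough illustration of why kernel density estimation can stand in for a fixed grid partition, the sketch below estimates the density along a single attribute and reports the intervals where it exceeds a threshold. It is not the algorithm proposed in the thesis, which the abstract does not specify. The Gaussian kernel, the function names `gaussian_kde_1d` and `dense_intervals`, the bandwidth, the threshold, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def gaussian_kde_1d(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x.

    Assumption: a Gaussian kernel with a fixed, hand-picked bandwidth.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def dense_intervals(samples, bandwidth, threshold, steps=200):
    """Scan one attribute and return (start, end) intervals whose estimated
    density is at least `threshold`. The interval boundaries follow the data
    rather than a predefined grid of equal-width cells."""
    density = gaussian_kde_1d(samples, bandwidth)
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / steps
    intervals, start = [], None
    for i in range(steps + 1):
        x = lo + i * step
        if density(x) >= threshold:
            if start is None:
                start = x  # entering a dense region
        elif start is not None:
            intervals.append((start, x - step))  # leaving a dense region
            start = None
    if start is not None:
        intervals.append((start, hi))
    return intervals


# Hypothetical data: one attribute with two well-separated groups of values.
random.seed(0)
values = [random.gauss(2.0, 0.2) for _ in range(200)]
values += [random.gauss(8.0, 0.2) for _ in range(200)]

intervals = dense_intervals(values, bandwidth=0.3, threshold=0.1)
for start, end in intervals:
    print(f"dense interval: [{start:.2f}, {end:.2f}]")
```

In a bottom-up scheme, dense one-dimensional intervals of this kind would serve as the starting units that are combined into higher-dimensional subspaces. How the thesis performs that combination, and how it handles the density divergence problem, is not stated in the abstract.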

Keywords/Search Tags:

High-dimensional data, Clustering analysis, Subspace clustering, Kernel density estimation

Related items

1	Research Of Subspace-clustering Algorithms Based On Density Over High-dimensional Data
2	Research On Subspace Clustering Algorithms Based On Density
3	Study On High-dimensional Data Subspace Clustering Analysis And Application
4	Research On High Dimensional Data Clustering Algorithm Based On Subspace And Density Peak
5	Research On Subspace Clustering Algorithm For High Dimensional Data
6	Research On Key Technologies Of Clustering High-dimensional Data Based On Sparse Subspace And Their Applications
7	Study On Clustering For Large Data Sets And Its Applications
8	Research On Improved Subspace Clustering Algorithm
9	Research On Clustering Algorithms For High-Dimensional Data
10	Application Of Grid And Density Based Clustering Algorithm In Data Mining