Constraint Based Subspace Clustering For High Dimensional Uncertain Data

Posted on:2017-11-17

Degree:Master

Type:Thesis

Country:China

Candidate:L Gao

Full Text:PDF

GTID:2348330533950669

Subject:Software engineering

Abstract/Summary:

With the development of society and the progress of science, data are becoming larger, higher dimensional, and more diverse, which makes it increasingly difficult to extract information from them. Clustering uncertain data is a major challenge in data mining research. Conventional algorithms, which rely on precise data, cannot deliver high-quality patterns for uncertain data, because an uncertain object is usually represented as a distribution described by a probability density function rather than as a single point. Clustering high dimensional data is another major problem for data mining algorithms: its two main difficulties, sparsity and the curse of dimensionality, make conventional algorithms inefficient. Subspace clustering was proposed to solve this problem. It differs from conventional clustering in that it finds both the clusters and the relevant dimensions of each cluster.

Clustering high dimensional uncertain data is even more challenging, and few algorithms exist for it. Moreover, to our knowledge, only one algorithm exists that builds on a bottom-up subspace clustering algorithm. In this thesis, building on FINDIT (a fast and intelligent subspace clustering algorithm using dimension voting), a classical subspace clustering algorithm for high dimensional data, we propose UFINDIT, a constraint based semi-supervised subspace clustering algorithm for high dimensional uncertain data. Our major contribution is a top-down uncertain subspace clustering algorithm that solves high dimensional uncertain data clustering problems effectively, with high accuracy and good scalability.

The details are as follows. We extend both the distance functions and the dimension voting rules of FINDIT to handle high dimensional uncertain data. Because the soundness criterion of FINDIT fails for uncertain data, we introduce constraints to solve the problem. We also use the constraints to eliminate the effect of parameters on FINDIT's medoid merging process. Furthermore, we propose techniques such as sampling to make the algorithm more efficient. Experimental results on synthetic and real data sets show that UFINDIT outperforms the existing subspace clustering algorithm for uncertain data.
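
The abstract does not give the extended distance function itself, so the following is only an illustrative sketch of what extending a distance to uncertain data can look like. It assumes a simplified dimension-oriented distance (the number of dimensions in which two points differ by more than a threshold eps) and assumes that each uncertain object is represented by a finite set of samples drawn from its probability density function. The names dod, expected_dod and sample_object are hypothetical, and averaging over sample pairs is one possible choice, not necessarily the one used in UFINDIT.

```python
import random


def dod(p, q, eps):
    """Simplified dimension-oriented distance between two precise points:
    the number of dimensions in which they differ by more than eps."""
    return sum(1 for a, b in zip(p, q) if abs(a - b) > eps)


def expected_dod(samples_x, samples_y, eps):
    """Estimate the expected dod between two uncertain objects.

    Assumption: each object is given as a list of sample points drawn
    from its pdf; the estimate averages dod over all sample pairs.
    """
    total = 0
    for x in samples_x:
        for y in samples_y:
            total += dod(x, y, eps)
    return total / (len(samples_x) * len(samples_y))


def sample_object(center, spread, n, rng):
    """Hypothetical helper: draw n samples uniformly from a box of
    half-width spread around center (a stand-in for a real pdf)."""
    return [[c + rng.uniform(-spread, spread) for c in center]
            for _ in range(n)]


rng = random.Random(0)
# Two uncertain objects that agree on the first two dimensions
# and are far apart on the third.
a = sample_object([0.0, 0.0, 5.0], 0.1, 50, rng)
b = sample_object([0.05, 0.0, 9.0], 0.1, 50, rng)

print(expected_dod(a, b, eps=0.5))  # prints 1.0
```

In this example every sample pair lies within eps on the first two dimensions and differs by more than eps on the third, so the estimate is exactly 1.0. With wider distributions the estimate becomes fractional, which is the kind of behavior a voting rule on uncertain data would have to account for.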

Keywords/Search Tags:

Subspace, Constraints, Clustering, Uncertain, High Dimension

Related items

1	Subspace Clustering Method Of High Dimensional Data
2	Subspace Clustering Based On Dimension-oriented Distance And Its Applications
3	Semi-supervised Subspace Clustering Based On Space-level Constraint
4	Research On Subspace Clustering Algorithm Based On Finding Effective Dimension
5	Research On Clustering Algorithm Based On Irregular Grid And Subspace Of Descending Dimension
6	Research On Improved Sparse Subspace Clustering Algorithm
7	Unsupervised Clustering Algorithm Based On Dimension Reduction
8	Using A Weighted Network Graph Clustering And Subspace Ensemble Approach For High-dimension Data Classification
9	Clustering Algorithms Analysis On Data Dimension
10	Research And Application Of Soft Subspace Clustering Algorithms