Constraint Based Subspace Clustering For High Dimensional Uncertain Data

Posted on: 2017-11-17 | Degree: Master | Type: Thesis
Country: China | Candidate: L Gao | Full Text: PDF
GTID: 2348330533950669 | Subject: Software engineering
Abstract/Summary:
As society develops and science advances, data is becoming larger, higher dimensional, and more diverse, which makes it increasingly difficult to extract information from it. Clustering uncertain data is a major challenge in data mining research. Conventional algorithms, which rely on precise data, cannot deliver high-quality patterns for uncertain data, because an uncertain object is usually represented as a distribution described by a probability density function rather than as a single point. Clustering high dimensional data is another major problem for data mining algorithms. Its two main difficulties, data sparsity and the curse of dimensionality, make conventional algorithms inefficient. Subspace clustering was proposed to solve this problem: it finds both the clusters and the dimensions relevant to each cluster. Clustering high dimensional uncertain data is even more challenging, and few algorithms exist for it. Moreover, to our knowledge, the only existing algorithm is based on a bottom-up subspace clustering algorithm. In this paper, building on the classical subspace clustering algorithm for high dimensional data, FINDIT (a fast and intelligent subspace clustering algorithm using dimension voting), we propose UFINDIT, a constraint based semi-supervised subspace clustering algorithm for high dimensional uncertain data. Our major contribution is a top-down uncertain subspace clustering algorithm that solves high dimensional uncertain data clustering problems effectively, with high accuracy and good scalability.
The details are as follows. We extend both the distance functions and the dimension voting rules of FINDIT to handle high dimensional uncertain data. Since the soundness criterion of FINDIT fails for uncertain data, we introduce constraints to solve this problem. We also use the constraints to improve FINDIT by eliminating the effect of parameters on the medoid-merging process. Furthermore, we propose methods such as sampling to make the algorithm more efficient. Experimental results on synthetic and real data sets show that our UFINDIT algorithm outperforms the existing subspace clustering algorithm for uncertain data.
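The abstract does not give UFINDIT's actual definitions, so the following is only an illustrative sketch of what extending a dimension-oriented distance and dimension voting to uncertain data could look like. It assumes each uncertain object is represented by a list of sampled points, and that the per-dimension gap between two objects is the expected absolute difference over all sample pairs. The names `expected_dim_gap`, `dod`, `vote_dimensions`, `eps`, and `min_votes` are hypothetical and are not taken from the thesis.

```python
from itertools import product


def expected_dim_gap(a, b):
    """Expected absolute gap per dimension between two uncertain objects.

    a, b: lists of sampled points (tuples of equal length); this sampled
    representation is an assumption, not the thesis's definition.
    """
    n_dims = len(a[0])
    pairs = list(product(a, b))  # every sample of a against every sample of b
    return [
        sum(abs(x[d] - y[d]) for x, y in pairs) / len(pairs)
        for d in range(n_dims)
    ]


def dod(a, b, eps):
    """Count the dimensions in which the expected gap exceeds eps."""
    return sum(1 for gap in expected_dim_gap(a, b) if gap > eps)


def vote_dimensions(medoid, neighbors, eps, min_votes):
    """Each neighbor votes for every dimension where it stays within eps
    of the medoid; dimensions with at least min_votes votes are kept."""
    n_dims = len(medoid[0])
    votes = [0] * n_dims
    for nb in neighbors:
        for d, gap in enumerate(expected_dim_gap(medoid, nb)):
            if gap <= eps:
                votes[d] += 1
    return [d for d in range(n_dims) if votes[d] >= min_votes]


# Toy data: three uncertain objects, two samples each, three dimensions.
medoid = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
near_1 = [(0.1, 5.0, 0.1), (0.1, 5.0, 0.1)]
near_2 = [(0.1, -4.0, 3.0), (0.1, -4.0, 3.0)]

selected = vote_dimensions(medoid, [near_1, near_2], eps=0.5, min_votes=2)
```

In this toy example only dimension 0 receives votes from both neighbors, so it is the only dimension treated as relevant to the medoid. A real implementation would also need the constraint handling and sampling steps mentioned above, which the abstract does not specify.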
Keywords/Search Tags:Subspace, Constraints, Clustering, Uncertain, High Dimension