
Research on High-Dimensional Data Tree Index Based on Soft Subspace

Posted on: 2016-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Shen
Full Text: PDF
GTID: 2208330470970581
Subject: Computer software and theory
Abstract/Summary:
High-dimensional indexing is a fundamental problem in content-based similarity retrieval and has received considerable attention from researchers for years. To reduce query overhead and accelerate similarity search in high-dimensional space, the common approach is to design a high-dimensional index that supports this type of query. Much of the research on indexing focuses on appropriate dimension reduction and on establishing a good index method. Clustering is a commonly used technique for processing queries and statistics over huge amounts of data, and soft subspace clustering is an important branch of work on feature selection and feature transformation that can improve the clustering efficiency of high-dimensional data sets. This thesis designs and implements a subspace-based clustering algorithm for high-dimensional data and, on this basis, creates an index tree suited to subspace clustering. First, the subspace clustering problem is introduced; the theory of feature selection is then studied in order to improve an established soft subspace clustering algorithm, so that the different clusters hidden in different subspaces can be found. Second, based on the spatial distribution of the subspaces and clusters, a partitioning strategy for high-dimensional space is proposed; combined with regional coverage, this strategy helps build the index tree structure for subspace clustering.
Finally, to improve the query efficiency of high-dimensional data, a filter query algorithm that operates on the index tree structure is put forward. Experiments on synthetic data sets of different sizes and on real data sets show that the high-dimensional index tree built on the soft subspace clustering algorithm can improve the query efficiency of high-dimensional data.
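The abstract does not give the details of the clustering, partitioning, or filter query algorithms, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes that each cluster carries its own per-dimension weight vector (the "soft subspace"), that the index stores one covering radius per cluster, and that a range query can skip a cluster by the triangle inequality. All names (`weighted_dist`, `build_index`, `range_query`) and the flat, single-level index are hypothetical, and the uniform weights in the demo are placeholders for weights that a soft subspace clustering algorithm would learn.

```python
import math
import random

def weighted_dist(x, y, w):
    """Weighted Euclidean distance; w holds one non-negative weight per dimension."""
    return math.sqrt(sum(wj * (xj - yj) ** 2 for xj, yj, wj in zip(x, y, w)))

def euclid(x, y):
    """Plain full-space Euclidean distance."""
    return math.sqrt(sum((xj - yj) ** 2 for xj, yj in zip(x, y)))

def build_index(points, centers, weights):
    """Assign each point to the cluster with the smallest weighted distance,
    then record a covering radius (full-space Euclidean) for each cluster."""
    clusters = [{"center": c, "members": [], "radius": 0.0} for c in centers]
    for p in points:
        k = min(range(len(centers)),
                key=lambda i: weighted_dist(p, centers[i], weights[i]))
        node = clusters[k]
        node["members"].append(p)
        node["radius"] = max(node["radius"], euclid(p, node["center"]))
    return clusters

def range_query(clusters, q, eps):
    """Return the points within Euclidean distance eps of q, plus the number
    of clusters actually scanned. A cluster is skipped when the triangle
    inequality shows that none of its members can lie within eps of q."""
    hits, visited = [], 0
    for node in clusters:
        if euclid(q, node["center"]) - node["radius"] > eps:
            continue  # filtered out: no member can qualify
        visited += 1
        hits.extend(p for p in node["members"] if euclid(p, q) <= eps)
    return hits, visited

# Demo on synthetic data (assumed setup: 200 random 8-dimensional points,
# 4 centers taken from the data, uniform placeholder weights).
random.seed(0)
points = [tuple(random.random() for _ in range(8)) for _ in range(200)]
centers = points[:4]
weights = [[1 / 8] * 8 for _ in centers]
index = build_index(points, centers, weights)
query, eps = points[10], 0.5
hits, visited = range_query(index, query, eps)
```

In this sketch the subspace weights influence only which cluster a point joins, while the filter step uses full-space distances so that pruning never discards a true result. A tree index of the kind the abstract describes would presumably apply the same kind of test recursively at each level.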
Keywords/Search Tags: High-dimensional data, Soft subspace clustering, Feature selection, Index tree