Research Of High Dimensional Retrival Algorithms Based On Adaptive Cluster Distance Bounding

Posted on:2013-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:J Guo

Full Text:PDF

GTID:2248330395473355

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

With the fast development of the high dimensional databases, the size of the high dimensional database is becoming increasing. Now, reasearchers have proposed a lot of high dimensional index structures in order to accelerate the query efficiency. But most of these structures have two defects as follows:first, most of them are based on an unreasonable assumption that data are uniformly distributed and have no connections among dimensions. Second, the query efficiency based on these structures are becoming lower with the increase of the dimension size and a research has improved that the "curse of dimensionality" occurs when the dimensionaliry is more than10. A popular and effective technique to overcome the curse of dimensionality is the vector approximation file, but the acceleration performance of it is limited in a small range, moreover, this structure is also based on the assumption that the data are uniformaly distributed, with independent attributes. All these cannot satisfy the practical use.The adptive cluster distance bounding (ACDB) method for high-dimensional indexing is based on the assumption that the data are non-uniformly distributed and have some connections among dimensions for real datasets. Comparied with other high dimensional index structures, the method can solve practical problems better. Moreover, the indexing method is using the Voronoid clustering, which allows for more general convex polygon structures that are able to efficiently bound the cluster surface. In2011, some researchers proposed the KNN algorithm based on ACDB. This algorithm has reduced the I/O times by filtering out irrelevant clusters though a lower value of the query-cluster distance. Reasearchers have proved that this algorithm based on ACDB is better at reducing the I/O numbers and the query performance than the sequencial query, such as VA-File, iDistance, LDC and so on. But with the increasement of the dimensionality and data size, the query efficiency is becoming lower and lower, so the KNN algorithm based on ACDB is still needed to be optimized. Hence, our main work and results are as follows: First, we investigated the-state-of-art algorithms about the high-dimensional indexing in the field of ACDB, and found the defects in the KNN algorithm based on ACDB.Second, with extra storage requirements and pretreatment, we proposed an IV-KNN search algorithm by reducing unnecessary distance calculations and CPU costs according to the triangle inequality formula. Experiment results based on real data sets proved that the efficiency of IV-KNN search algorithm is much better than V-KNN.Finally, to analyze the current approximate similarity retrival methods, we propose a RDFV-KNN algorithm by both applying the PCA and LLE dimension reduction technology. From experiments of real datasets, the efficiency and accuracy of RDIV-KNN algorithm has been improved straightforward.

Keywords/Search Tags:

high-dimensional indexing structure, adaptive cluster distance bounding, KNNquery, approximate similarity search, dimension reduction

PDF Full Text Request

Related items

1	Research Of Indexing Technology In Medical Image Retrieval Platform
2	Search and indexing of high-dimensional feature spaces for similarity retrieval
3	Representing And Indexing Trajectories For Efficient Similarity Search With TRI Framework
4	Hash-based Approximate Nearest Neighbor Search For High-dimensional Data
5	A Research On The Dimension Reduction And Similarity Search For The Interval Time Series
6	Research On Dimensionality Reduction And Quantification Methods Of Approximate Nearest Neighbor Query For Streaming Data
7	Dimension Reduction And Clustering For High-Dimensional Data
8	Research On Vector Approximation Method In High-dimensional Index Technology
9	Similarity Search On Large-scale High-dimensional Data
10	Research On Dimension Reduction Methods Of High Dimensional Data