Research Of High-Dimensional Space Join And Query Algorithms Based On Main-Memory

Posted on:2012-04-12

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Liu

Full Text:PDF

GTID:1268330428960965

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

Many emerging database applications such as CAD, multimedia, medical image, time series, molecular biology and scientific databases represent their data as multi/high-dimensional feature vectors. Each feature vector consists of d values which can be interpreted as coordinates in a d-dimensional space plus some associated content data. The distance between two feature vectors is commonly used to measure the degree of (dis-)similarity between the original high-dimensional objects (with regard to the feature represented). There is increasing need to combine similar tuples (points) of two datasets for many applications.The operation of generating all pairs is in essence high-dimensional join (high-dimensional join is a general designation of all kinds of similarity join).So far most of researches focus on the execution of high-dimensional joins over large amounts of disk based data. The increasing sizes and lower price of main memory available on current computers, and the need for efficient processing of high-dimensional joins suggest that high-dimensional joins for a large class of problems can be processed in main memory. Î”-tree is a novel multi-level index structure, it can speed up the high-dimensional query in main memory environment and has been proven to be an efficient index method. Therefore, in order to meet different application needs, several kinds of high-dimensional joins and queries were designed based on index structure Î”-tree. The main work of this topic can be summarized as follows:Firstly, the efficient main-memory index Î”-tree and the characteristics of similarity join were studied, the pruning strategies of the nodes in Î”-tree were analyzed. Made full use of the properties of Î”-tree, A novel main-memory similarity self-join algorithm Î”-tree-Join was presented for a single dataset. Used the properties of cluster overlap and taked the center of nodes in Î”-tree as the center of clusters, main-memory index structure Î”-tree-S was designed.A main- memory similarity join algorithm called Î”-tree-Join*adopted the top-down join scheme for combining two different database sets was presented based on A-tree-S. The two algorithms computed the distances between clusters and between point and cluster with fewer number of dimensions, so as to filter unnecessary nodes or points, reduce computations and improve main-memory join efficiency fundamentally. Experimental results indicate that these two join algorithms have high join efficiency.Secondly, the feature that the searching radius of kNN query is unknown apriori was analyzed, the algorithm of high-dimensional kNN(k Nearest Neighbor) query in main-memory was studied. Three kinds of main-memory kNN query algorithms based on A-tree for high dimensional data were developed. Preformace analysis and experiments prove that these three algorithms improve query efficiency compare to existing algorithm, among them the third query algorithm named BU_DF_knn_Search has the best efficiency. This algorithm determines the membership leaf node of query point in A-tree through the definition of membership node firstly, and searches kNN in this node, narrows the pruning distance quickly using near optimal kNN candidates, then traverses the subtrees used the ancestors of membership leaf nodes as root nodes by depth first manner from bottom to top.Thirdly, the properties of kNN join was analyzed, the algorithm of kNN join in main-memory was studied. According to requirements index structure A-tree-R for main-memory kNN join was proposed, every generated node in this index was coded. A main-memory kNN join algorithm was presented based on Î”-tree-R and Î”-tree-S, this algorithm decodes in Î”-tree-S by using the code of A-tree-R, fast locates positional correspondence nodes, uses the near optimal kNN candidates in them and adopts the traversal order of combining bottom up and depth first search, narrows kNN pruning distances rapidly. Experiments results illustrate that Î”-tree-KNN-Join is an efficient kNN-join method in main memory.Fourthly, the solutions of RkNN (Reverse k Nearest Neighbor) query and the characteristics of high-dimensional RkNN query were analyzed, the conclusion that choose self-pruning approaches to process high-dimensional RkNN query was drawed. An index structure called Î”-Rdknn-tree was proposed based on Î”-tree-R, precomputed kNN distances of points in the dataset by main-memory kNN self-join algorithm Î”-tree-KNN-Join based on this index and propagated these distances to higher level index nodes. Main-memory RkNN query algorithm based on this index was proposed.Fifthly, the problem of set-oriented RkNN queries in main-memory enviroment namely main-memory RkNN join was studied. Constructed indexes Î”-Rdknn-tree and Î”-tree for the two datasets that were going to join, used the RkNN searching distance information of nodes in Î”-Rdknn-tree,main-memory RkNN join algorithm was designed based on the similarity join algorithm Î”-tree-Join*, performance analysis was conducted for the effectiveness of this algorithm.All the main-memory query and join algorithms in this paper were certificated by theories and experiments, have practical value and lay the theoretical and practical foundation of the research for all kinds of high-dimensional queries and joins algorithms in main memory.

Keywords/Search Tags:

main-memory high-dimensional join, similarity join, k nearestneighbor query, k nearest neighbor join, reverse k nearest neighbor join

PDF Full Text Request

Related items

1	Research On Spatial Join And Variants Of Nearest Neighbor Query
2	LSH-based Similarity Search Of Web Data
3	Optimization Of Database Join Algorithms On DRAM/NVM-Based Hybrid Memory
4	Aggregate Queries On Constrained Probabilistic Similarity Join Pairs
5	Research On Spatio-textual K Nearest Neighbor Joins
6	Research Of Interval Join Method Base On MapReduce
7	Research On Improvement Of Similarity Join In MapReduce
8	Research Of Query Optimization Based On Join Index
9	Research On Optimization For Multi-way Join In A Map-Reduce Environment
10	Research And Implementation Of High Efficiency Set Similarity Join Algorithm Based On Overlap Similarity