Two Ball-tree Based Optimizations To Speed Up Maximum Inner Product Search

Posted on:2016-04-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y K Zhang

Full Text:PDF

GTID:2308330479982179

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

Maximum inner product search is a relatively new problem, which, however, plays a very important role in many real applications, such as collaborative filter-ing based on matrix factorization, document retrieval, maximum kernel search. Formally, maximum inner product search problem is defined as follows:Given a dataset D(>)Rd of n vectors, a query vector q ∈Rd, efficiently find a vector v*∈D such that where <*,*> denotes the inner product between two vectors.Due to its importance, large amount of literature has focused on addressing the problem efficiently. Among them, ball-tree, which is efficient, simple, and easy to implement, is a popular structure for maximum inner product search. By profiling the ball-tree search routine, we find that more than 90% of search time is cost by inner product computations, which is surely an obstacle to improv-ing the performance of the ball-tree. There are two sources where inner product computations come from:(1) An inner product evaluation is required to compute an upper bound for a node. We denote # of inner product evaluations for upper bound computation as XN. (2) For each point in a visited leaf, we need to evaluate an inner product between the point and the query. We denote # of inner product evaluations of this kind as Xv. Obviously, if we are able to reduce XN and Xv,we then are able to speed up the maximum inner product search.By taking advantage of some properties of inner product, we propose two simple but efficient optimizations, reducing XN and Xv respectively thus speed-ing up the ball-tree.1. Point-level Pruning (PLP). The ball-trees compute an upper bound for each visited node and use this bound to prune unnecessary points. This is called node-level pruning. In this paper, we propose the point-level pruning, which computes an upper bound for each visited point before evaluating the exact inner product between the point and the query, attempting to prune the point. Besides the natural point-level ball bound (PLP-Ball), we further propose a point-level LA bound (PLP-LA) and prove theoretically that PLP-LA is a tighter bound than PLP-Ball. This optimization reduces Xv a lot.2. Collaborative Upper Bound Computing (CUBC). In this optimization, we apply the linearity of inner products to the process of computing upper bounds for nodes, reducing XN by half.Besides the two optimizations mentioned above, based on PLP-LA, we in-troduce the concept of shift-sensitivity property. For PLP-LA, after we shift the reference set D(i.e., add to each point in D a fixed shift vector s), the gap be-tween the upper bound and the exact inner product will change. The result of the MIP problem, however, remains the same after the shifting. This property can help optimize the PLP-LA upper bounds and thus speed up the maximum inner product search. We prove that PLP-Ball is shift-insensitive and provide a corollary that the original ball-trees are shift-insensitive, too.We experimentally verify the effectiveness and efficiency of our optimiza-tions on two large public collaborative filtering datasets-Netflix and Yahoo! Mu-sic. The result shows that a ball-tree integrated with our optimizations performs much better than other exact methods, including the original ball-tree, cover-tree.

Keywords/Search Tags:

Maximum inner product search, Ball-trees, Shift-sensitivity

PDF Full Text Request

Related items

1	Research On Approximate Nearest Neighbor Search And Maximum Inner Product Search For High-dimensional Dataset
2	Concurrent binary search trees: Design and optimizations
3	Analytic and adaptive techniques for improving nearest-neighbor search performance in k-dimensional trees
4	A Fixed-parameter Algorithm For The Maximum Agreement Forest Problem On Multifurcating Trees
5	Improved Maximum Doppler Frequency Shift Estimation Method And Its Application
6	The Design And Implementation Of Product Name Analysis Subsystem Of Product Search System
7	Concurrency in B-trees and tries: Search and insert
8	Improvement And Optimization Of Algorithms For Constructing Phylogenetic Trees Based On Maximum Parsimony
9	Design And Implementation On Vertical Product Search System For Portal
10	The Design And Implementation Of Personalized Search System Of Product