Font Size: a A A

Two Ball-tree Based Optimizations To Speed Up Maximum Inner Product Search

Posted on:2016-04-28Degree:MasterType:Thesis
Country:ChinaCandidate:Y K ZhangFull Text:PDF
GTID:2308330479982179Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Maximum inner product search is a relatively new problem, which, however, plays a very important role in many real applications, such as collaborative filter-ing based on matrix factorization, document retrieval, maximum kernel search. Formally, maximum inner product search problem is defined as follows:Given a dataset D(>)Rd of n vectors, a query vector q ∈Rd, efficiently find a vector v*∈D such that where <*,*> denotes the inner product between two vectors.Due to its importance, large amount of literature has focused on addressing the problem efficiently. Among them, ball-tree, which is efficient, simple, and easy to implement, is a popular structure for maximum inner product search. By profiling the ball-tree search routine, we find that more than 90% of search time is cost by inner product computations, which is surely an obstacle to improv-ing the performance of the ball-tree. There are two sources where inner product computations come from:(1) An inner product evaluation is required to compute an upper bound for a node. We denote # of inner product evaluations for upper bound computation as XN. (2) For each point in a visited leaf, we need to evaluate an inner product between the point and the query. We denote # of inner product evaluations of this kind as Xv. Obviously, if we are able to reduce XN and Xv,we then are able to speed up the maximum inner product search.By taking advantage of some properties of inner product, we propose two simple but efficient optimizations, reducing XN and Xv respectively thus speed-ing up the ball-tree.1. Point-level Pruning (PLP). The ball-trees compute an upper bound for each visited node and use this bound to prune unnecessary points. This is called node-level pruning. In this paper, we propose the point-level pruning, which computes an upper bound for each visited point before evaluating the exact inner product between the point and the query, attempting to prune the point. Besides the natural point-level ball bound (PLP-Ball), we further propose a point-level LA bound (PLP-LA) and prove theoretically that PLP-LA is a tighter bound than PLP-Ball. This optimization reduces Xv a lot.2. Collaborative Upper Bound Computing (CUBC). In this optimization, we apply the linearity of inner products to the process of computing upper bounds for nodes, reducing XN by half.Besides the two optimizations mentioned above, based on PLP-LA, we in-troduce the concept of shift-sensitivity property. For PLP-LA, after we shift the reference set D(i.e., add to each point in D a fixed shift vector s), the gap be-tween the upper bound and the exact inner product will change. The result of the MIP problem, however, remains the same after the shifting. This property can help optimize the PLP-LA upper bounds and thus speed up the maximum inner product search. We prove that PLP-Ball is shift-insensitive and provide a corollary that the original ball-trees are shift-insensitive, too.We experimentally verify the effectiveness and efficiency of our optimiza-tions on two large public collaborative filtering datasets-Netflix and Yahoo! Mu-sic. The result shows that a ball-tree integrated with our optimizations performs much better than other exact methods, including the original ball-tree, cover-tree.
Keywords/Search Tags:Maximum inner product search, Ball-trees, Shift-sensitivity
PDF Full Text Request
Related items