
Learning And Indexing Structural Representations Of Large Scale High Dimensional Data

Posted on: 2020-04-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S C Liu
Full Text: PDF
GTID: 1368330623463941
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, multimedia information on the Internet has grown exponentially, and a huge number of applications have been built around this massive multimedia data. Fast indexing techniques are essential to these applications, which at their core search large-scale high-dimensional data. Such data contains semantically rich information but exhibits little structure that could be used to organize it, which makes traditional search methods less effective. Establishing an effective search method for large-scale high-dimensional data has therefore become an active topic in recent years. Among these efforts, hashing and quantization methods are the most promising techniques. However, real-life datasets can easily reach billions of items, while existing hashing and quantization methods do not allow effective hierarchical data structures for indexing, which inherently lowers memory efficiency and search performance. In addition, search in multimedia data usually aims at retrieving semantically relevant items; when large-scale data is combined with search methods that are not optimized for semantic search, the search results quickly degenerate and become useless for applications.

To this end, we propose structural embeddings, which use quantization encodings to build a tree for searching. Building on existing quantization methods, we propose generalized residual vector quantization. Building on deep neural networks, we propose three methods for simultaneous deep representation and quantization codebook learning. Finally, we present a toolset for high-dimensional data. Our contributions are as follows:

1. We propose the concept of structural encoding. We use quantization encodings to build a tree for searching, and we use information theory to analyse how the structure of the tree influences the search efficiency on high-dimensional data. Based on this study, we propose guidelines for designing quantization methods for very large-scale data.

2. We propose generalized residual vector quantization. Traditional residual vector quantization suffers from an increasing number of outliers as quantization is applied repeatedly. We propose to generate an intermediate dataset from the residual vectors and selected codebooks, and to apply quantization to it. We also design a multi-stage clustering algorithm. Experimental results show that generalized residual vector quantization outperforms the state of the art in both quantization error and search results. The method can easily be extended to an online version for large-scale or ever-changing data.

3. We propose the aggregating tree for non-exhaustive search. We use encodings generated with generalized residual vector quantization to build a radix-tree-like data structure, and use an efficient beam search algorithm to retrieve the nearest neighbors. We perform an empirical analysis of parameter selection. Search performance tests show that the aggregating tree achieves better trade-offs between search time and search accuracy.

4. We propose three methods for simultaneous deep representation and quantization codebook learning. Recent advances in deep learning show that deep representations trained with a siamese network architecture outperform traditional hand-crafted visual features for semantic search. However, simultaneous deep representation and quantization codebook learning remains a challenging task. We propose a novel loss function, named diode loss, that thresholds back-propagated gradients; a novel training method, named the space shuttle model, that applies momentum to the output values; and a novel network layer, named the gradient snapping layer, that performs gradient transformation. We analyse the methods through experiments and present practical tricks for network design.
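For readers unfamiliar with the baseline that contribution 2 builds on, the following is a minimal sketch of plain (traditional) residual vector quantization, not the generalized variant proposed in the dissertation. Each stage clusters the residuals left by the previous stage, so a vector is encoded as one codeword index per stage. The function names (`kmeans`, `train_rvq`, `encode`, `decode`), the simple k-means routine, and all parameters are illustrative assumptions and do not come from the dissertation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codeword in `codebook` closest to `v`."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))

def kmeans(points, k, iters=10, seed=0):
    """Basic Lloyd's k-means (an assumed stand-in for any clustering step)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(centers, p)].append(p)
        for i, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centers

def train_rvq(points, stages, k):
    """Learn one codebook per stage, each fitted to the previous residuals."""
    codebooks = []
    residuals = [list(p) for p in points]
    for stage in range(stages):
        codebook = kmeans(residuals, k, seed=stage)
        codebooks.append(codebook)
        # Subtract the chosen codeword; the remainder feeds the next stage.
        residuals = [
            [x - c for x, c in zip(r, codebook[nearest(codebook, r)])]
            for r in residuals
        ]
    return codebooks

def encode(codebooks, v):
    """Greedy encoding: one codeword index per stage."""
    codes, residual = [], list(v)
    for codebook in codebooks:
        i = nearest(codebook, residual)
        codes.append(i)
        residual = [x - c for x, c in zip(residual, codebook[i])]
    return codes

def decode(codebooks, codes):
    """Reconstruct a vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, i in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out
```

Calling `train_rvq(points, stages=3, k=8)` on a list of equal-length float vectors returns three codebooks, and `encode` then maps any vector to a list of three indices. Because the code is an ordered sequence of indices, vectors that share a prefix of indices can share a path in a tree, which is the kind of structure the radix-tree-like index in contribution 3 is described as exploiting.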
Keywords/Search Tags:Large scale data, high dimensional data, large scale search, quantization, nearest neighbor search, deep representation learning, multitask learning, semantical search