Data Structure Complexity For Big Data:Parallel And Succinct Data Structures

Posted on:2021-10-30

Degree:Doctor

Type:Dissertation

Country:China

Candidate:M M Liu

Full Text:PDF

GTID:1488306500467604

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Data structure is a cornerstone of computer science.In reality,data structures support the operations of various computing devices;on the other hand,data structure is a basic course for computer science.Nowadays,the big data puts forward higher requirements for data structures.We not only want the data structure algorithms to be faster,but also hope the data structures consume less space.This thesis studies the data structure from two aspects.For the easier low-dimensional data structure problems,we consider succinct data structures,that is to say,to solve the problem with the space usage close to the information-theoretical lower bound.For the harder high-dimensional data structure problems,we study how to accelerate the oper-ations with parallel computing;we also try to prove lower bound for high-dimensional data structure problem to show that there does not exist efficient data structure algo-rithm,approaching the proof of“curse of dimensionality”.The thesis consists of two parts.The first part focuses on the succinct data structures for low-dimensional data structure problems.First,we prove lower bound for the range minimum query problem.Our lower bound is the first lower bound for the problem,and the lower bound shows that the state-of-the-art algorithm is optimal in the regime of constant time.We refine the state-of-the-art lower bound technique,so that we not only can prove the average case lower bound,but also can prove the higher worst-case lower bound.Then we consider the dynamic approximate set membership query（Bloom filter）problem.We consider a setting that is important in practice and is interesting but rarely considered by the theory community:the space usage of the dynamic data structure can be dynamically resized,and is required to depend only on the size of the current database,rather than the maximum size estimated in advance.On the one hand,it saves considerable space;on the other hand,it represents the important idea of“scalability”in practice.In this setting,we present the best data structure algorithm in all senses for the dynamic approximate set membership query:In terms of space,it is the only succinct data structure that solves this problem in the natural case of n?logu;in terms of time,all the update and query operations can be done in the worst-case constant time.Our data structure is a positive answer for an open problem left by Pagh,Segev and Wieder（FOCS 2013）.Finally,we consider another fundamental data structure problem,the dictionary problem.We come up with a brand-new idea to deal with the“overflow”issue in the classic dictionary data structures which causes waste of space.With this strategy,in the commonplace case u=poly n,our extra space overhead is reduced from≈O(n（log n）^2/3)bits to O（n log...log n）bits,while also ensuring that both update and query operations can be done in the worst-case constant time.Our data structure is a positive answer for an open problem left by Arbitman,Naor and Segev（FOCS 2010）.The second part focuses on the nearest neighbor search problem,the central prob-lem of high-dimensional data structure problems.We first consider the randomized approximate nearest neighbor search in parallel,and prove the asymptotically optimal trade-off between parallelism and memory ac-cess in the polynomial space for this problem.In particular,we present efficient data structure algorithms and prove lower bound.The data structure upper bounds use the classical linear dimensionality reduction method,and the multi-way search.Our lower bound is proven by constructing a communication protocol with the data structure,then using the round elimination technique to prove the lower bound in the communication model.Then we try to prove lower bound for theλ-near neighbor problem with the state-of-the-art data structure lower bound technique.We first construct a communication protocol that characterize the cell-sampling technique,showing that we can apply cell-sampling technique in a simpler way with this protocol.Then we present a counter-example,a communication protocol,to show that it is not an easy task to prove lower bound for theλ-near neighbor problem with the state-of-the-art technique,the cell-sampling.Finally,we consider a computational model which can be thought of comprising all hashing-based algorithms.By applying the analysis of boolean functions in a simpler and straightforward way,we simplify the proof of the lower bound forλ-near neighbor search problem.

Keywords/Search Tags:

Data Structure, Range Minimum Query, Bloom Filters, Dictionaries, Nearest Neighbor Search

PDF Full Text Request

Related items

1	Research On K Nearest Neighbor Query Based On Data Transformation
2	Study On K Nearest Neighbor Query Based On Road Networks In Mobile Database
3	The Research Of K-nearest Neighbor Query Of Spatial Data Based On Voronoi Diagram
4	Studies On Probabilistic Nearest Neighbor Queries Over Uncertain Data
5	LSH-based Similarity Search Of Web Data
6	Research On The Visual Group K-Nearest Neighbor And Group Inverse K-Nearest Neighbor Query Of Multi-Source Objects In Three-Dimensional Space
7	Efficient Algorithms For Approximate Aggregation And Nearest Neighbor Queries Over Multi-Dimensional Data
8	Research On Spatial Join And Variants Of Nearest Neighbor Query
9	Research On Spatial Data Index Based On Nearest Neighbor Distance And Query Algorithms
10	Research Of Reverse Nearest Neighbor Query In Spatial Database