Design And Implementation Of Optimizer For Distributed Query Processing On Graph Store

Posted on:2020-04-02

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Yao

Full Text:PDF

GTID:2428330620958861

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

Graph structure is useful for relationship mining of big data.It becomes more and more popular to use graph structure to represent all kinds of data.RDF systems are widely used to store public knowledge bases and to process SPARQL queries.A large number of such systems have been proposed in the recent literature to provide low latency and high throughput for concurrent query processing over large RDF data.The query optimizer is the critical component of RDF systems.Advanced software and hardware techniques pose new challenges for designing and implementing an efficient optimizer.The query optimizer consists of three components:cardinality estimation,cost model and plan enumeration.The three components work together to provide complete service for clients,each of one play an important role in the optimizer.We find many deficiencies of traditional scan-join based optimizer through detailed evaluation:First,in the cardinality estimation,scan-join approach adopts estimation method based on two-correlated predicates'relationship and ignores the dependencies between multiple predicates,and also the cost model of scan-join no longer works for graph-exploration based systems.Second,it is quite difficult or impossible to attach scan-join's cost model to the graph-exploration approach directly.Targeting these problems,we propose Wukong+Tr,a full-history graph-exploration based optimizer,which is more suitable for new system Wukong^[1]and improve the estimation accuracy and overall performance.However,we find some remaining problems of Wukong+Tr by testing the three components of it.We revealed the common deficiencies of two-correlated based optimizer:In cardinality estimation,the two-correlated based optimizer could produce large errors due to ignoring multiple relationships.In cost model,fine-grained cost model is needed to fit on full-history based graph-exploration.In plan enumeration,with the dramatical performance improvement by leveraging advanced software and hardware techniques,the optimization time may occupy a notable fraction of the overall query time for selective queries.To solve the above problems,we introduce a type-centric approach called Wukong+P,which is a further optimization on Wukong+Tr.Based on the observation that the same type of vertices commonly has a similar combination of predicates,Wukong+P embeds the lineage of correlated predicates'dependencies?triple patterns?into existing type system of RDF data to enhance the accuracy of cardinality estimation.Further,Wukong+P uses a fine-grained cost model to consider more details in the graph-exploration process.Last,Wukong+P adopts differentiated optimization methods for light and heavy queries to largely reduce the optimization time.Evaluation shows that Wukong+P greatly improves the accuracy of cardinality estimation by several magnitudes compared to state-of-the-art approaches.In cost model,the trend of estimated cost and real time is quite similar,which means the high accuracy of cost estimation.In plan enumeration,Wukong+P reduce the optimization time of selective queries to 0.01milliseconds.In the overall performance,Wukong+P improves the overall performance by up to 2.32 times.The average gap between optimal and selected plans is limited to about 2%.

Keywords/Search Tags:

Optimizer, Graph Query Processing, Type-centric Estimation, Cost Model

PDF Full Text Request

Related items

1	Research On Query Processing Of Graph-Structured XML Data
2	Research And Realization Of Some Critical Technologies For Interactive Query Optimizer
3	Research On Data Query Processing And Optimization In Distributed Database
4	Cost-Based Optimizbation Model For Stream Computing Language
5	Design And Implementation Of Cost Estimation Model In Da Meng DBMS
6	Graph-based data analysis: Tree-structured covariance estimation, prediction by regularized kernel estimation and aggregate database query processing for probabilistic inference
7	Research On Key Techniques Of Query Processing Over Large-scale Graph Data
8	Research On Query Processing Technologies Over Large Scale Knowledge Graphs
9	Data Flow In Adaptive Query Processing Mechanism
10	Subgraph Query Process On Graph-Structure Data