Font Size: a A A

Design And Implementation Of Optimizer For Distributed Query Processing On Graph Store

Posted on:2020-04-02Degree:MasterType:Thesis
Country:ChinaCandidate:Y Y YaoFull Text:PDF
GTID:2428330620958861Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Graph structure is useful for relationship mining of big data.It becomes more and more popular to use graph structure to represent all kinds of data.RDF systems are widely used to store public knowledge bases and to process SPARQL queries.A large number of such systems have been proposed in the recent literature to provide low latency and high throughput for concurrent query processing over large RDF data.The query optimizer is the critical component of RDF systems.Advanced software and hardware techniques pose new challenges for designing and implementing an efficient optimizer.The query optimizer consists of three components:cardinality estimation,cost model and plan enumeration.The three components work together to provide complete service for clients,each of one play an important role in the optimizer.We find many deficiencies of traditional scan-join based optimizer through detailed evaluation:First,in the cardinality estimation,scan-join approach adopts estimation method based on two-correlated predicates'relationship and ignores the dependencies between multiple predicates,and also the cost model of scan-join no longer works for graph-exploration based systems.Second,it is quite difficult or impossible to attach scan-join's cost model to the graph-exploration approach directly.Targeting these problems,we propose Wukong+Tr,a full-history graph-exploration based optimizer,which is more suitable for new system Wukong[1]and improve the estimation accuracy and overall performance.However,we find some remaining problems of Wukong+Tr by testing the three components of it.We revealed the common deficiencies of two-correlated based optimizer:In cardinality estimation,the two-correlated based optimizer could produce large errors due to ignoring multiple relationships.In cost model,fine-grained cost model is needed to fit on full-history based graph-exploration.In plan enumeration,with the dramatical performance improvement by leveraging advanced software and hardware techniques,the optimization time may occupy a notable fraction of the overall query time for selective queries.To solve the above problems,we introduce a type-centric approach called Wukong+P,which is a further optimization on Wukong+Tr.Based on the observation that the same type of vertices commonly has a similar combination of predicates,Wukong+P embeds the lineage of correlated predicates'dependencies?triple patterns?into existing type system of RDF data to enhance the accuracy of cardinality estimation.Further,Wukong+P uses a fine-grained cost model to consider more details in the graph-exploration process.Last,Wukong+P adopts differentiated optimization methods for light and heavy queries to largely reduce the optimization time.Evaluation shows that Wukong+P greatly improves the accuracy of cardinality estimation by several magnitudes compared to state-of-the-art approaches.In cost model,the trend of estimated cost and real time is quite similar,which means the high accuracy of cost estimation.In plan enumeration,Wukong+P reduce the optimization time of selective queries to 0.01milliseconds.In the overall performance,Wukong+P improves the overall performance by up to 2.32 times.The average gap between optimal and selected plans is limited to about 2%.
Keywords/Search Tags:Optimizer, Graph Query Processing, Type-centric Estimation, Cost Model
PDF Full Text Request
Related items