Font Size: a A A

Research On Top-k Query Algorithms On Graph Data

Posted on:2017-05-10Degree:DoctorType:Dissertation
Country:ChinaCandidate:C LanFull Text:PDF
GTID:1318330536458714Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Graph is a widely used data structure for its flexibility to model complex relations between objects.As more and more graph data appear in both research and industrial area,especially with the blooming of social network and knowledge graph,how to efficiently manage and query graph data has drawn a lot of attentions in data management area.Top-k querying on graph data is one of the most important topics among them because querying the most relevant k subgraph-matches or vertexes and edges sets from graph data is the basic operation of many related applications.Based on the fact that current graph data often have large volume,numerous kinds of vertexes and edges,constantly changing contents,etc.,we focus on three key problems:top-k graph querying on distributed graph data,top-k graph querying from multiple graph data sources,and consistently relevant top-k query on dynamic data.We summarize the existing works,reveal their strengths and weaknesses,and propose some new algorithms for top-k querying on graph data.The main contributions of the dissertation are as follows:· We propose a star-query decomposition based algorithm for top-k subgraph query on distributed graph data.Querying on distributed graph using existing methods can cause a lot of network communication costs due to inter-sites data transferring.We leverage the idea of 'divide and conquer' and propose a algorithm consisting of three stages:star query decomposition,sub-query execution,and results joining to reduce the cost of querying and data transferring,as well as to guarantee the correctness of results.We propose a heuristic star-query decomposition method and three optimizations for the star-query algorithm to achieve lower costs in both local querying and results joining processes.The result of experiments shows that the algorithm we proposed can solve the top-k subgraph query on distributed graph data efficiently with good scalability.And the optimizations techniques can further reduce the latency by 30%to 40%.· We propose a rank join algorithm for top-k subgraph query from multiple graph data sources.Existing relational data based methods such as HRJN are not designed for graph data.We propose a new rank join algorithm for top-k graph query,which consists of three major steps:constructing input object lists from each data source,accessing objects from the input lists and joining them to form complete results with scores,and checking if the stopping condition of the algorithm is satisfied.We further propose a optimized ranking function for generating the input lists in the first step to achieve a tighter bound of the unseen results.We reveal the optimal condition of parameters in the optimized ranking function and theoretically prove its correctness.We propose a heuristic strategy to generate input lists to overcome the difficulties to select the optimal parameters in real applications.We also propose a optimal input lists selection strategy based on the idea of minimum dropping score.The experiment result shows that the algorithm we propose achieves lower costs than the existing methods.The minimum dropping score strategy outperforms the baseline method about 10%to 20%.The cost of the algorithms using optimized ranking function with arbitrary fixed parameter combinations are about 20%to 30%lower than the baseline and the heuristic strategy can reduce the cost about 50%to 70%.· We propose interval-window based algorithm for consistently relevant top-k query on dynamic data.We define the consistently relevant and the problem of consistently relevant top-k query.Then we propose the unit-time interval based algorithm that divide the problem space in the time dimension to unit-time intervals.Then the algorithm finds top-k results in each unit-time windows and aggregates results to find the top-k consistently relevant results based on the problem definition.However,the unit-time interval based algorithm has a poor scalability.We further propose interval-window based algorithm which improves the scalability in two aspects.Firstly the algorithm reduces the size of problem space in the time dimension by dynamically maintaining the time windows.Secondly the algorithm merges the common parts of computing the lower bounds in each time window to reduce the unnecessary repeated operations.The experiment result shows that the interval-window based algorithm achieves better performance in both time and space aspects than the baseline method in all test aspects,uses less memory and over 50%less querying latency.
Keywords/Search Tags:Top-k Query, Graph Query, Distributed Query, Temporal Query
PDF Full Text Request
Related items