Font Size: a A A

Research On Key Techniques Of Query Processing Over Large-scale Graph Data

Posted on:2018-12-27Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y R ChengFull Text:PDF
GTID:1368330572465457Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the development of Internet and Databases,graphs,as a general data structure,have been widely used in many real applications,such as biological networks,social networks,and knowledge graphs,etc.Queries over graph data,such as shortest path,reachability,and keyword search,are the most fundamental problems in the field of database.Especially in the big data environment nowadays,how to efficiently process different queries over large-scale graphs is becoming more and more important.Although researchers have made great contributions on the techniques of graph queries,when graphs are applied in different applications,they are combined with a variety of information,such as uncertainty,spatial-temporal information,etc.In order to deal with different requests in different environments,queries over graph data need to be modeled more reasonably.On the other hand,as graphs inherently have complex topologic information,the computational complexity of graph queries is usually very high.Thus,through analyzing real applications' requests,this dissertation sets up more reasonable models,and proposes specific algorithms to efficiently answer queries over large-scale graph data.(1)Shortest path query over large correlated uncertain graphs.It analyzes that in the real applications,the uncertain information in graph data usually has correlations.Thus,this dissertation proposes a correlated uncertain graph model based on Markov Networks,which can overcome the shortcomings of the existing independent uncertain graph model.Since computing the shortest path probability over correlated uncertain graphs is#P-hard,this dissertation proposes a filtering-and-verification framework to solve this problem.In the filtering phase,this dissertation calculates a series of upper bounds of the shortest path probability,and designs an index to help to prune the vertices and edges that have no contributions to the final results.As constructing the optimal index is still NP-hard,this dissertation provides an O(logn)-approximation algorithm that has a polynomial time complexity to construct the index.After filtering,only a small subgraph is left as candidate.In the verification phase,an efficient sampling algorithm is designed to get the final result over the candidate.(2)Reachability query over distributed uncertain graphs.It analyzes that graphs in real applications are usually distributedly stored.However,existing techniques for distributed uncertain graph queries are centralized ones.Moreover,since this problem is#P-complete,even calculating small graphs in centralized environment is time costly.This dissertation finds that although calculating the reachability probability is#P-complete,the calculation over a large subgraph are polynomial.Thus,this dissertation proposes a distributed reduction-and-consolidation method to solve this problem.In the distributed reduction step,all subgraphs whose reachability probability can be calculated in polynomial time are reduced into an edge.In distributed consolidation step,this problem is changed into an efficient table join problem,and an efficient approximation algorithm is proposed to solve the corresponding problem.(3)Keyword query over large error-tolerant knowledge base.It analyzes that in real applications,knowledge bases are usually error-tolerant.However,directly applying current keword query definitions into knowledge bases may result in confusing results.Thus,this dissertation defines a new "confident-r clique" query for knowledge base environment,such that it can return a more reasonable answer.On the other hand,since calculating the confidence of r clique is#P-hard,a filtering-and-verification method is designed.The filtering phase calculates the candidate structure and confidence upper bound of confident-r clique,so as to prune the useless vertices and edges.In the verification phase,an efficient sampling algorithm with accuracy guarantees is designed.Extensive experiments over real datasets prove that the proposed confident-r clique query can provide a more satisfactory answer,and the proposed algorithms are effective and efficient.(4)Event-participant planning query over event based social networks.It considers the situation that the bipartite graph matching problem combined with spatial-temporal information in the real world,and proposes an event-participant planning query for the users in an event based social network platform.Unfortunately,existing techniques either ignore the lower bound of participants of each event,or assume that once an event is published,nothing can be changed,which is not practical in the real world.Thus,this dissertation proposes a problem of global event-participant planning with constraints and its incremental variant,and proves that these problems are NP-hard.To solve these problems,this dissertation design approximate algorithms with approximate ratio guarantees.Extensive experiments over real datasets verify the effectiveness and efficiency of the proposed algorithms.The studies in this dissertation can solve different requests of users in different applications,and provide a new view of graph data processing for researchers.
Keywords/Search Tags:large-scale graph data, query processing, shortest path query, reachability query, event-participant planning query
PDF Full Text Request
Related items