Font Size: a A A

Research On Query Processing On Large Scale Graph Database

Posted on:2019-10-27Degree:DoctorType:Dissertation
Country:ChinaCandidate:B Q LvFull Text:PDF
GTID:1368330563955314Subject:Software engineering
Abstract/Summary:PDF Full Text Request
In recent years,graph query processing has attracted much attention due to the increasing popularity of graph data in various application domains.Graph query pro-cessing is mainly conducted on two types of graph databases:the first one is large graphs such as social networks,and the second one is transaction graph databases that consist of a set of relatively smaller graphs,such as chemical compound databases.In this dis-sertation,we focus on the techniques for fundamental graph query processing problems on these two types of graph databases.By reviewing the current research works,we formulate several new research problems and propose novel algorithms to improve the efficiency and scalability of query processing.The main contributions of the dissertation are listed as below:1.The Maximum Edge Biclique(MEB)problem on large scale bipartite graphs.The MEB problem is a fundamental problem in large graphs that is widely applied in many applications,such as risk management,machine learning,biology and so on.Giv?en a bipartite graph G =(U?V,E),a biclique C=(L ? R)is a subgraph of G induced by L and R,where L(?)U,R(?)V,and(?)u ? L,(?)v R,(u,v)? E.The maximum edge biclique on G is the one with the largest number of edges among all bicliques.S-ince the MEB problem is NP complete,the existing solutions are neither efficient nor scalable on large bipartite graphs.In this work,we propose a layer-by-layer approach,that divides the MEB problem into several sub-problems,and the maximum edge bi-clique is obtained by progressively refined in all layers of sub-problems.In each layer,we set constraints on vertex partition sizes of the biclique,to guarantee that the output bicliques,if existed,must be larger than the biclique obtained so far,so as to save unnec-essary computations.We also introduce two heuristics to condense the bipartite graph before searching for bicliques,which saves enormous search space.Furthermore,we propose a greedy strategy to obtain a large biclique in the initiation stage,where a good initiation can not only be used to prune search branches in biclique expansion,but also help improve the constraints in sub-problems.We conduct extensive experiments on large real bipartite graphs,and the results show that our approach is orders of magnitude faster than the state-of-the-art methods.2.The supergraph search problem on transaction databases.Supergraph search is a fundamental problem in graph databases that is widely applied in many application scenarios.Given a graph database and a query-graph,supergraph search retrieves all data-graphs contained in the query-graph from the graph database.Most existing solu-tions for supergraph search follow the pruning-and-verification framework,but are not scalable to handle large-sized data-graphs and query-graphs.In this work,we propose new indexing and query processing algorithms.In indexing,we select features directly from the data-graphs without expensive frequent subgraph mining.The features form a feature-tree that contains all-sized features and both the cost sharing and pruning power of the features are considered.In query processing,we propose a new algorithm,where the order to process features is query-dependent by considering both the cost sharing and the pruning power.We further explore two optimization strategies to further improve the algorithm efficiency.The first strategy applies a lightweight graph compression technique and the second strategy optimizes the inclusion of answers.Finally,we con-duct extensive performance studies on two real large datasets to demonstrate the high scalability of our algorithms.3.The approximate supergraph search on dynamic transaction databases.Trans-action database consists of a collection of graphs.To accelerate supergraph search on graph databases,existing solutions always process queries based on index.However,most of them do not adapt or scale in dynamic graph database,in which graphs are fre-quently inserted,deleted and/or modified over time.Users can re-mine indexes on the updated graph database,but it is time consuming.Besides,as graph database scales up when large amount of new graphs inserted over time,exact query processing may face the efficiency problem.Based on these two observations,we propose approximate supergraph search on dynamic transaction databases.We introduce how to efficient-ly maintain the index incrementally when the graph database is updated dynamically.When updating index,the features are selected with both the cost sharing and pruning power considered.Moreover,we propose an approximation approach to significantly reduce the computational cost for large data-graphs and/or query-graphs while preserv-ing a high result quality.Finally,we conduct extensive performance studies on two real large datasets to demonstrate the efficiency and scalability of our algorithms.
Keywords/Search Tags:Query processing, Maximum edge biclique, Supergraph search
PDF Full Text Request
Related items