Key Algorithms And Application Research Of Subgraph Cohesiveness Analysis And Processing In Large-Scale Property Graphs

Posted on:2024-02-07

Degree:Doctor

Type:Dissertation

Country:China

Candidate:P Y Lin

Full Text:PDF

GTID:1528307334478204

Subject:Computer Science and Technology

Abstract/Summary:

Graphs are an important data structure widely used in various fields.However,as the need for richer applications grows,simple graphs are no longer sufficient,and property graphs have emerged as the most popular graph data model in recent years.The study of subgraph cohesiveness,which involves analyzing and processing closely connected vertices,is a crucial research area in property graphs.It plays a key role in recommender systems and distributed parallel graph computing systems.Research on subgraph cohesiveness can be categorized into two main types:local and global subgraph cohesiveness analysis and processing.Cohesive subgraph query research and graph partition research fall under these two categories,respectively.Despite the increasing complexity of big data and applications,the research on subgraph cohesiveness analysis and processing continues to face significant challenges at large scales.Cohesive subgraph query poses more complex processing challenges due to its indi-vidual requirements.This type of query aims to identify the connected subgraph formed by a closely related vertex set S of a given query vertex q or query vertices set in a given query graph G.By combining the position and semantic attributes of vertices,the query results can be more concise and interpretable,which can enhance the accuracy of recommenda-tion systems.However,existing cohesive subgraph query research falls short in describing the attributes of complex scenes.As a result,relevant cohesive subgraph models and algo-rithms are still in their early stages of development.Graph partitioning involves dividing a given large-scale graph G into specified k subgraphs{S G₁,...,S G_k}with high internal vertex cohesiveness while satisfying other parameter requirements.Under the new appli-cation requirements,graph partitioning faces the dual challenges of large scale and com-plexity.For example,many parallel processing tasks are modeled as directed acyclic graphs（DAGs）,and the subgraphs composed of cohesive vertices are assigned to the same proces-sor to minimize the overall execution time of the task.This is known as DAG scheduling.However,many existing DAG parallel processing tasks are massive in scale.Using classic DAG scheduling methods results in high time complexity.Similarly,using classic graph partitioning algorithms with the objective of minimizing the number of cut edges fails to meet the requirements of minimizing the execution time of parallel tasks.Moreover,in the research on DAG scheduling for large-scale deep neural network（DNN）inference and parallel processing tasks,there has been a lack of in-depth analysis of the DNN model’s execution characteristics on processors,resulting in unsatisfactory scheduling results and low processor utilization rates.To address the challenges in existing large-scale attribute graph subgraph cohesiveness research,this paper delves into the topic and presents new models and algorithms.The main contributions and innovations of this paper can be summarized as follows:1)Large-scale cohesive subgraph queries based on semantic attributed graphs.To address the limitations of complex attribute modeling and processing in the existing at-tribute graph-based cohesive subgraph query research,this study proposes a new semantic graph-based attribute graph modeling method,the corresponding cohesive subgraph query problem modeling,and the design and experiment of the corresponding query algorithm.The proposed graph data model describes vertex attributes using directed acyclic seman-tic graphs,which offer greater accuracy,universality,and flexibility compared to existing methods based on keywords and profile trees.In addition,this study is based on the co-hesive subgraph query problem（CSSAG）based on semantic attribute graphs constructed by k-core.Through experimental comparison verification and case analysis,it is proved that the cohesive subgraph model in this study has good performance,i.e.,the attributes and topological are closeness,query results are more interpretable.To effectively obtain query results,three online query algorithms,namely the level-by-level expansion search al-gorithm,the branch backtracking expansion search algorithm,and the loss function-based greedy search algorithm,are proposed in this study along with their corresponding pruning strategies.Experimental results have shown that the proposed methods can efficiently ob-tain query results.Specifically,for a graph with a scale of millions,the query results can be obtained in a matter of seconds.2)Large-scale DAG partitioning that minimizes task scheduling time.In many paral-lel acceleration scenarios,the traditional graph partitioning method,which divides vertices with cohesive connecting edges in the same subgraph by reducing the number of connecting edges between subgraphs,cannot meet the requirements for minimizing the execution time.In the graph partition that minimizes the execution time,the graph is represented by a DAG with attributes,and the weight attributes of vertices and edges represent the execution time required by vertices and the communication time between vertices,respectively.Although there are many researches on DAG scheduling,these are typically aimed at small DAGs with high algorithm complexity,which makes it difficult to effectively deal with large-scale DAG partitions.In order to efficiently obtain a feasible solution with the shortest execution time as the goal in large-scale DAG partitioning,that is,the vertices that make the task execution time cohesive are divided into the same subgraph,a heuristic multi-level partition method based on path analysis with the goal of reducing critical path partition is designed in this study.In addition,in the experiment,the method proposed in this study is first verified on a simulated large-scale DAG graph,and then experimented in two scenarios of large-scale circuit hardware acceleration verification and large-scale DNN model distributed inference.The experimental results show that the method proposed in this study can reduce the critical path length by 11.35%and 22%compared with Metis in circuit and DNN data,which is beneficial to the execution of subsequent scheduling.3)Acceleration of DNN inference based on large-scale DAG scheduling in multi-GPU.This study has observed that in the DAG partitioning of the distributed inference of the large-scale DNN model in the second study of this paper,the execution of DNN in the GPU is simply abstracted as serial execution,and the parallel capability of GPU multi-stream is underutilized.Therefore,in this study,the execution of DNN model inference in GPU is firstly analyzed in depth,and two key features are found when multi-stream parallel execution is based on GPU:1)When multi-stream parallel execution of DNN model,the start-up time of each stream cannot be ignored,because its magnitude is comparable to the inference execution time of many DNN model operators（such as Sigmod）;2)A single stream execution of DNN model operators cannot fully utilize all computing resources of the GPU.Based on the above two characteristics and the stability of the DNN model in the GPU based on stream and cu DNN execution memory and time,this study models the DNN model as an attribute DAG and proposes the DeOrd scheduling algorithm for the inference acceleration of the DNN model in the GPU.Due to the memory limitation of the large DNN model,this study designed a cooperative DeOrd scheduling algorithm to distribute the DNN model operation operators among multiple GPUs and their multi-streams,which simplifies the order of first assigning GPUs and then scheduling operation operators in the second research point of this paper.Finally,the experiments on the real DNN model show that the method proposed in this study can improve the inference response time by an average of1.286 times on a single GPU,and by an average of 1.313 times on multiple GPUs.

Keywords/Search Tags:

Property Graph, Subgraph Cohesiveness, Cohesive Subgraph Query, Graph Partitioning, DAG Scheduling, GPU

Related items

1	Research On Distributed Subgraph Matching Algorithm For Large Scale Graph Data
2	Research On Exact Subgraph Search Technology In Graph Database
3	Research On Frequent Subgraph-based Graph Query Techniques
4	Research Of Query Technologies On Graph Data
5	Research On Algorithms For Subgraph Query Based On Graph Databases
6	Research On Subgraph Search Technology In Graph Database
7	Research On Key Techniques Of Query Processing And Mining Analysis Over Large Graph Data
8	Research On Techniques For Query Processing On Graph Databases
9	Design And Implementation Of Efficient Subgraph Matching Algorithm On Large
10	Research On Subgraph Query Method For Large-scale Dynamic And Directed Label Graph