
Research On GraphX Data Caching Technology For Convergent Graph Analytic Application

Posted on: 2021-01-10  Degree: Master  Type: Thesis
Country: China  Candidate: Y Ding  Full Text: PDF
GTID: 2518306470468234  Subject: Computer technology
Abstract/Summary:
GraphX is a distributed graph processing system that is widely adopted in industry and academia. Convergent graph analysis is a typical class of graph processing application in which the number of active vertices shrinks gradually as the iterations proceed, and once a vertex changes from active to inactive it stays inactive until the application ends. The existing data caching mechanism of GraphX keeps large amounts of edge data that is no longer involved in the computation in memory during convergent graph analysis, which lowers the effective utilization of the GraphX cache. This leads to two problems. First, when the memory allocation is limited, the active graph data cannot be cached completely, which incurs extra recomputation. Second, the inactive edge data stays in the data cache, so the surplus allocated memory cannot be reclaimed and is wasted.

To solve these problems, this thesis proposes an optimized GraphX data caching mechanism for convergent graph analysis. Its essential idea is to filter out the inactive edge data that is no longer involved in the computation, and to migrate the scattered graph data so that the cache space in the task executors is fully used and surplus executors can be reallocated in a multiple-application environment. The main contributions of this thesis are as follows:

1) It proposes GDCO, a data filtering mechanism of GraphX for convergent graph analysis. GDCO defines the edge data that is no longer used while a convergent graph analysis runs as expired data. Based on real-time monitoring, GDCO identifies expired data during execution through a vertex-based expired data identification mechanism, and enables data filtering only when the number of active vertices has dropped significantly. This reduces the performance overhead of filtering expired data and ensures that the active graph data can be cached completely.

2) It proposes CGDM, a dynamic data migration mechanism of GraphX for convergent graph analysis. For scenarios with sufficient memory, CGDM uses particle swarm optimization to heuristically select a data migration scheme, with the goals of minimizing the size of the migrated graph data and balancing the amount of vertex and edge data in the data cache across nodes. This makes full use of the cache space in the task executors and releases the extra task executors to other applications in the GraphX system.

3) It implements GDCO and CGDM in GraphX and evaluates them with representative graph processing benchmarks. The experimental results show that GDCO reduces the execution time of typical convergent graph analysis on GraphX by up to 88.53% when the memory allocation is limited, and that CGDM lowers the cumulative memory usage of typical convergent graph analysis by up to 56.99% when the memory allocation is sufficient and reduces the average application turnaround time by up to 14.3% in multiple-application environments.
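The filtering idea behind GDCO can be illustrated with a minimal Python sketch. It is not the thesis implementation and does not use the GraphX API. It assumes, for illustration only, that an edge counts as expired when both of its endpoints are inactive, and that filtering is triggered once the active vertex count falls to a fixed fraction of its value at the last filtering pass. The names `is_expired`, `should_filter`, `filter_cache`, `run_iterations` and the parameter `shrink_ratio` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (source vertex id, destination vertex id)


def is_expired(edge: Edge, active: Set[int]) -> bool:
    """Assumption: an edge is expired when neither endpoint is active.

    In a convergent application an inactive vertex never becomes active
    again, so such an edge cannot take part in any later iteration.
    """
    src, dst = edge
    return src not in active and dst not in active


def should_filter(num_active: int, baseline: int, shrink_ratio: float) -> bool:
    """Hypothetical trigger: filter only after a significant shrink,
    so the cost of a filtering pass is not paid in every iteration."""
    return num_active <= shrink_ratio * baseline


def filter_cache(cached: Iterable[Edge], active: Set[int]) -> List[Edge]:
    """Drop expired edges and keep the rest in their original order."""
    return [e for e in cached if not is_expired(e, active)]


def run_iterations(
    edges: Iterable[Edge],
    active_per_iteration: Iterable[Set[int]],
    shrink_ratio: float = 0.5,
) -> List[int]:
    """Return the number of cached edges after each iteration."""
    cached = list(edges)
    baseline = None  # active vertex count at the last filtering pass
    sizes = []
    for active in active_per_iteration:
        if baseline is None:
            baseline = len(active)
        elif should_filter(len(active), baseline, shrink_ratio):
            cached = filter_cache(cached, active)
            baseline = len(active)
        sizes.append(len(cached))
    return sizes


if __name__ == "__main__":
    chain = [(1, 2), (2, 3), (3, 4), (4, 5)]
    actives = [{1, 2, 3, 4, 5}, {1, 2, 3, 4}, {1, 2}, {1}]
    print(run_iterations(chain, actives))
```

With the default `shrink_ratio`, the small chain graph in the usage example is left untouched while the active set shrinks only slightly and is trimmed once the active set has at least halved. A lower ratio filters less often, trading cache space for fewer filtering passes.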
Keywords/Search Tags: Graph Processing, GraphX, Cache Optimization, Data Filtering, Data Migration