
Performance- and Cost-efficient Graph Processing in Geo-distributed Datacenters

Posted on: 2020-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: B K Shen
GTID: 2428330590978654
Subject: Computer technology
Abstract/Summary:
With the rapid development of virtualization technology, cloud computing platforms built on it have become increasingly mature, and more and more people use cloud platforms to deploy their tasks. As user bases globalize, regionalization has become more prominent, and in recent years infrastructure providers have increasingly deployed datacenters in multiple countries and regions. With the rise of the mobile Internet and social networking sites, many service providers have moved away from the self-sufficient model of operating their own servers and instead rent virtual machines from infrastructure providers whose datacenters are distributed across multiple regions. Many graph applications, such as social networks, therefore store their data in geo-distributed datacenters (DCs) to provide low-latency, high-quality services worldwide.

Graph processing is an emerging computation model for a wide range of applications, and graph partitioning is important for optimizing the cost and performance of graph processing jobs. Geo-distributed DCs raise new challenges for existing graph partitioning methods because of heterogeneous graph traffic and the multi-level network heterogeneity among DCs. In this thesis, we first rent four types of machine instances in four different regions of the Microsoft Azure cloud platform, test each instance for a week, and study their price and bandwidth characteristics. We then propose Geo-Cut, a geo-aware graph partitioning method that minimizes the inter-DC data transfer time of graph processing jobs while satisfying a budget constraint on inter-DC data communication cost. Geo-Cut works in two optimization stages. First, a cost-aware streaming heuristic uses one-pass streaming graph partitioning to quickly assign edges to DCs, minimizing inter-DC data communication cost while keeping vertex replication low. Second, two partition refinement heuristics identify the performance bottlenecks of geo-distributed graph processing and refine the first-stage partitioning to reduce inter-DC data transfer time while still satisfying the budget constraint. Because its runtime overhead is light, Geo-Cut can also be applied to partition dynamic graphs.

Finally, we evaluate the effectiveness and efficiency of Geo-Cut on real-world graphs, using both real geo-distributed DCs and simulations. We compare Geo-Cut with four other algorithms in terms of data communication time and data communication cost. The results show that Geo-Cut outperforms the other algorithms: compared with state-of-the-art graph partitioning methods, it reduces inter-DC data transfer time by up to 78% and monetary cost by up to 70%, with low overhead.
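The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. This is not Geo-Cut itself: the abstract gives no formulas for its heuristics, so everything here is an assumption for illustration. That includes the cost model (per-DC upload/download bandwidth and upload price; each vertex's master is the lexicographically smallest DC holding it, and each mirror exchanges one value with its master per iteration), the scoring function, the region names, and all numbers. The hypothetical `stream_partition` stands in for the cost-aware one-pass streaming stage. The hypothetical `refine` stands in for a bottleneck-driven refinement that accepts a move only if it lowers transfer time and keeps upload cost within the budget.

```python
from collections import defaultdict

# Hypothetical per-DC profile: upload/download bandwidth (MB/s) and upload
# price ($/GB). Region names and numbers are illustrative, not measured.
DCS = {
    "us-east": {"up": 100.0, "down": 500.0, "price": 0.02},
    "eu-west": {"up": 80.0, "down": 400.0, "price": 0.02},
    "ap-southeast": {"up": 50.0, "down": 300.0, "price": 0.09},
}


def replicas_of(assignment):
    """Map each vertex to the set of DCs holding at least one of its edges."""
    reps = defaultdict(set)
    for (u, v), dc in assignment.items():
        reps[u].add(dc)
        reps[v].add(dc)
    return reps


def stream_partition(edges, dcs, balance_weight=2.0):
    """Stage 1 sketch (hypothetical): one-pass greedy edge assignment.

    Each edge goes to the DC that creates the fewest new vertex replicas,
    weighting replicas in pricier DCs more, plus a load-balance penalty.
    """
    assignment, reps = {}, defaultdict(set)
    load = {d: 0 for d in dcs}
    max_price = max(p["price"] for p in dcs.values()) or 1.0
    for u, v in edges:
        def score(d):
            new = (d not in reps[u]) + (d not in reps[v])
            comm = new * (1.0 + dcs[d]["price"] / max_price)
            return comm + balance_weight * load[d] / (1 + max(load.values()))
        best = min(dcs, key=score)  # first DC wins ties
        assignment[(u, v)] = best
        reps[u].add(best)
        reps[v].add(best)
        load[best] += 1
    return assignment


def transfer_time_and_cost(reps, dcs, value_mb=1.0):
    """Per-iteration inter-DC transfer time (s), upload cost ($), bottleneck DC.

    Assumed model: master = lexicographically smallest DC holding the vertex;
    each mirror sends one value to the master and receives one back.
    """
    up = {d: 0.0 for d in dcs}
    down = {d: 0.0 for d in dcs}
    for dset in reps.values():
        master = min(dset)
        for r in dset - {master}:
            up[r] += value_mb        # mirror -> master (gather)
            down[master] += value_mb
            up[master] += value_mb   # master -> mirror (sync)
            down[r] += value_mb
    per_dc = {d: max(up[d] / dcs[d]["up"], down[d] / dcs[d]["down"]) for d in dcs}
    bottleneck = max(per_dc, key=per_dc.get)
    cost = sum(up[d] / 1024 * dcs[d]["price"] for d in dcs)  # MB -> GB
    return per_dc[bottleneck], cost, bottleneck


def refine(assignment, dcs, budget, max_moves=50):
    """Stage 2 sketch (hypothetical): move edges off the bottleneck DC while
    this lowers transfer time and keeps upload cost within `budget`."""
    assignment = dict(assignment)
    best_t, best_c, bottleneck = transfer_time_and_cost(replicas_of(assignment), dcs)
    for _ in range(max_moves):
        moved = False
        for edge in [e for e, d in assignment.items() if d == bottleneck]:
            for target in dcs:
                if target == bottleneck:
                    continue
                assignment[edge] = target  # tentative move
                t, c, b = transfer_time_and_cost(replicas_of(assignment), dcs)
                if t < best_t and c <= budget:
                    best_t, best_c, bottleneck, moved = t, c, b, True
                    break
                assignment[edge] = bottleneck  # undo the move
            if moved:
                break
        if not moved:
            break
    return assignment, best_t, best_c


edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 6), (6, 4)]
parts = stream_partition(edges, DCS)
t, c, b = transfer_time_and_cost(replicas_of(parts), DCS)
refined, t2, c2 = refine(parts, DCS, budget=c * 1.5)
print(f"stage 1: {t:.4f} s, ${c:.8f}, bottleneck {b}")
print(f"stage 2: {t2:.4f} s, ${c2:.8f}")
```

Restricting refinement to edges on the current bottleneck DC reflects the idea of targeting performance bottlenecks. Recomputing all traffic after every tentative move keeps the sketch short but is quadratic; an incremental update of per-DC traffic would be the natural optimization.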
Keywords/Search Tags: Azure Cloud, Graph Processing, Wide Area Network, Geo-distributed Datacenters