
Multi-Copy Data Placement For Distributed Data Centers

Posted on: 2020-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhang
Full Text: PDF
GTID: 2428330575496982
Subject: Computer Science and Technology
Abstract/Summary:
Large-scale data-intensive applications serve end users by routing service requests to data centers distributed across different geographic locations. When massive amounts of data are placed in these data centers, each data item often has multiple copies, so service providers must run a large number of servers to store the copies, which results in huge electricity bills. Meanwhile, copies placed in different data centers must be synchronized over the network between data centers to ensure data consistency, which incurs a high network transmission cost. This thesis therefore investigates multi-copy data placement for distributed data centers, aiming to achieve good data access latency while reducing the operational costs of data storage and synchronization, namely electricity and network communication costs. The main research contents are as follows:

(1) Multi-copy data placement optimization for power consumption and network synchronization cost. When providing services to users, service providers must minimize the data placement cost, namely the power consumption and network synchronization cost, while meeting the users' data access latency requirements. Assuming that each data item has K copies, we formulate the data placement problem with the objective of minimizing the data placement cost, and propose an efficient data placement algorithm, LCDP (Latency-aware and operational Cost minimization Data Placement). The algorithm partitions the data into multiple data groups and divides the data centers into multiple subsets according to the data access latency requirements; each subset contains K data centers. Each data item in each data group is placed into the data center subset that minimizes the placement cost while satisfying the user access latency requirements. Finally, users accessing the data are assigned to appropriate data centers to balance the load between data centers. Simulation results show that the algorithm effectively reduces the power consumption cost and network synchronization cost of the data centers.

(2) Multi-copy data placement optimization for SLA violation penalty and network synchronization cost. For service providers, response latency is an important metric of the service delivered to end users. If the latency exceeds the SLA requirement of the user accessing the data, an SLA violation penalty cost is incurred. We take both the SLA violation penalty cost and the network synchronization cost between data centers into account, and propose a data placement algorithm, K-CDP (K-level and Cluster based Data Placement), with the objective of minimizing the total cost of SLA violation penalties and network synchronization. The algorithm clusters the data by solving the linear programming problem corresponding to the latency penalty cost generated by placing a single copy of the data, and the data in each cluster are then placed as a whole, one cluster at a time. For each cluster, we repeatedly select the data center that yields the smallest increase in placement cost, until the cluster's data copies have been placed into K different data centers. Simulation results show that the proposed algorithm effectively reduces the latency penalty cost and network synchronization cost.
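The greedy selection step described for K-CDP (add, one at a time, the data center with the smallest placement cost increase until K distinct data centers hold the cluster) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `greedy_place_cluster`, the input dictionaries, and the toy cost model (a per-center latency penalty plus pairwise synchronization costs to the centers already chosen) are all assumptions made for illustration, and the linear-programming clustering step is not shown.

```python
from typing import Dict, List, Set


def greedy_place_cluster(
    latency_penalty: Dict[str, float],
    sync_cost: Dict[str, Dict[str, float]],
    k: int,
) -> List[str]:
    """Choose k distinct data centers for one data cluster, greedily.

    Hypothetical cost model (an assumption, not taken from the thesis):
    adding a data center costs its latency penalty plus the
    synchronization cost to every data center already chosen.
    """
    if k > len(latency_penalty):
        raise ValueError("k exceeds the number of data centers")

    chosen: List[str] = []
    remaining: Set[str] = set(latency_penalty)

    while len(chosen) < k:
        def cost_increase(dc: str) -> float:
            # Marginal cost of adding `dc` given the current selection.
            return latency_penalty[dc] + sum(sync_cost[dc][c] for c in chosen)

        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = min(sorted(remaining), key=cost_increase)
        chosen.append(best)
        remaining.remove(best)

    return chosen


# Toy inputs: three data centers with made-up costs.
penalty = {"A": 1.0, "B": 2.0, "C": 1.5}
sync = {
    "A": {"A": 0.0, "B": 5.0, "C": 1.0},
    "B": {"A": 5.0, "B": 0.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "C": 0.0},
}
placement = greedy_place_cluster(penalty, sync, k=2)
print(placement)
```

With these made-up numbers, the first pick is the center with the lowest latency penalty, and the second pick favors the center that is cheap to synchronize with the first, which is the trade-off between latency penalty and synchronization cost that the abstract describes.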
Keywords/Search Tags: Latency, Power Consumption, Network Transport, Data Placement, Data Center