Research On Workload-aware Data Partition And Replication

Posted on: 2021-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: X L Zhang
GTID: 2428330620468182
Subject: Software engineering
Abstract/Summary:
With the explosive growth of data and the need to process queries efficiently over massive datasets, distributed database systems and parallel data processing platforms have emerged. In a distributed environment, analytical queries can generate a large volume of data transmission when the data partitioning scheme does not match the data access pattern of the workload, which slows query processing. Data partitioning and replication is a general technique for alleviating this problem: by preserving data locality, it minimizes the amount of data transmitted during workload execution, which reduces network transmission cost and improves processing efficiency. Existing data partitioning and replication techniques either struggle to support complex query workloads or consume excessive storage resources. This thesis designs and implements a workload-aware data partitioning and replication tool for distributed databases that supports efficient query processing while minimizing remote data transmission. The main contributions of this thesis are as follows:

1. This thesis designs two workload-aware heuristic partitioning algorithms for analytical workloads in a distributed environment. They effectively reduce network data transmission between distributed nodes and improve query execution efficiency.

2. To further reduce network transmission, this thesis designs a resource-sensitive data replication strategy that works alongside data partitioning, designs a mixed cost model based on network transmission cost and data storage cost, and proposes an extended partitioning and replication algorithm.

3. The partitioning tool has been implemented on top of the proposed heuristic partitioning and replication algorithms. Its main modules include statistics collection, the optimization algorithm, the greedy partitioning algorithm, the genetic partitioning algorithm, and cost model construction. Given a detailed workload, the tool outputs the corresponding partition configuration and replication strategy. It can be embedded as a component in the database storage layer or used as an external recommendation tool.
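The idea of a mixed cost model that weighs network transmission against storage can be illustrated with a small sketch. This is not the thesis's algorithm: the abstract does not give the cost formulas or the search procedure, so everything below is an assumption made for illustration. In particular, the names `Join`, `REPLICATE`, `mixed_cost`, `greedy_layout`, and the weight `alpha` are hypothetical, a join is assumed to be local only when both tables are hash-partitioned on the join columns or one of them is fully replicated, and a non-local join is assumed to ship the smaller table.

```python
from __future__ import annotations

from dataclasses import dataclass

REPLICATE = "*"  # hypothetical marker: the table is copied to every node


@dataclass(frozen=True)
class Join:
    left: tuple[str, str]   # (table, join column)
    right: tuple[str, str]  # (table, join column)
    freq: int               # how often this join occurs in the workload


def network_cost(layout, sizes, joins):
    """Assumed model: a non-local join ships the smaller table once per execution."""
    total = 0
    for j in joins:
        (lt, lc), (rt, rc) = j.left, j.right
        local = (
            layout[lt] == REPLICATE
            or layout[rt] == REPLICATE
            or (layout[lt] == lc and layout[rt] == rc)  # co-partitioned on join keys
        )
        if not local:
            total += j.freq * min(sizes[lt], sizes[rt])
    return total


def storage_cost(layout, sizes, nodes):
    """Assumed model: a replicated table costs (nodes - 1) extra copies."""
    return sum(sizes[t] * (nodes - 1) for t, c in layout.items() if c == REPLICATE)


def mixed_cost(layout, sizes, joins, nodes, alpha):
    """Network cost plus alpha-weighted storage cost (alpha is an assumed knob)."""
    return network_cost(layout, sizes, joins) + alpha * storage_cost(layout, sizes, nodes)


def greedy_layout(sizes, candidates, joins, nodes, alpha):
    """Greedy local search: apply the single best change until none lowers the cost."""
    layout = {t: cols[0] for t, cols in candidates.items()}
    best = mixed_cost(layout, sizes, joins, nodes, alpha)
    while True:
        move = None
        for t, cols in candidates.items():
            for choice in [*cols, REPLICATE]:
                if choice == layout[t]:
                    continue
                cost = mixed_cost({**layout, t: choice}, sizes, joins, nodes, alpha)
                if cost < best:
                    best, move = cost, (t, choice)
        if move is None:
            return layout, best
        layout[move[0]] = move[1]


# Made-up example data: three tables and two join patterns.
sizes = {"orders": 1000, "customer": 100, "nation": 1}
candidates = {
    "orders": ["o_id", "o_custkey"],
    "customer": ["c_custkey", "c_nationkey"],
    "nation": ["n_nationkey"],
}
joins = [
    Join(("orders", "o_custkey"), ("customer", "c_custkey"), freq=10),
    Join(("customer", "c_nationkey"), ("nation", "n_nationkey"), freq=5),
]
layout, cost = greedy_layout(sizes, candidates, joins, nodes=4, alpha=1.0)
print(layout, cost)
```

On this made-up input the search co-partitions `orders` and `customer` on the customer key and replicates the tiny `nation` table, because three extra copies of a one-unit table cost less than repeatedly shipping it. Raising `alpha` makes replication less attractive, which is the trade-off a resource-sensitive strategy has to balance.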
Keywords/Search Tags: distributed database, workload-awareness, data partitioning, data replication, heuristic algorithm