
Accelerated Prototype Of Distributed Big Data Based On Divisible Load Scheduling

Posted on: 2021-03-15  Degree: Master  Type: Thesis
Country: China  Candidate: W N Dai  Full Text: PDF
GTID: 2518306017955199  Subject: Computer technology
Abstract/Summary:
The vigorous development of science and technology has brought about a blossoming of application types, but it has also led to explosive growth in data scale. How to process big data has become a key issue that urgently needs to be solved in the information age. Because traditional centralized processing systems are increasingly inadequate in the face of new data with high concurrency and large traffic volume, distributed computing systems have gradually replaced them as the mainstream approach to big data processing. This paper therefore proposes a distributed big data acceleration prototype based on divisible load scheduling, to solve the problems of network bandwidth and computing resource allocation in a distributed processing system under an arbitrary network topology.

This paper is divided into two main parts. The first part is the theoretical research and method demonstration stage. First, by analyzing the advantages and disadvantages of mainstream distributed processing frameworks, and combining them with divisible load scheduling (a linear, continuous model of a load whose computation and communication are both divisible), two parallel processing models based on arbitrary network topologies are proposed. According to the differences in how their processors operate, they are divided into the Parallel Synchronous Processing Model (PSP) and the Parallel Asynchronous Processing Model (PAP). Each model is then formulated mathematically; the dynamic sequence method is used to decompose the model into sub-problems for analysis, formulas are derived to transform the two models from non-optimization problems into optimization problems, and constraints are used to find the optimal solution of each sub-problem within its effective domain and extend it to the entire model. Finally, in the experimental demonstration stage, the PSP and PAP models are compared with state-of-the-art algorithms in the DLS field through simulation experiments, with processing completion time used as the performance metric. The two parallel processing models show more stable and better performance, are insensitive to the coupling between nodes, and are better suited to large-scale network environments.

The second part is the practical application stage. First, with the above work as the theoretical basis, Docker container technology is used in an actual project to build a distributed cluster, and SDN technology is used to construct the network topology among the cluster nodes, enabling communication, load transmission, and control between nodes. Then, virtualization technology is used to run parallel computing tasks on the deployed distributed cluster to verify the impact of the number of processors on performance. The experimental results show that, in a single-host environment, cluster performance increases as the number of processors increases. Finally, the distributed acceleration prototype is applied in a distributed machine learning framework: training parameters are optimized under multi-machine, multi-GPU conditions combined with the divisible load scheduling algorithm, and the results are compared with single-machine, single-GPU training. Analysis of the experimental results shows that distributed machine learning with the acceleration prototype improves training speed, and it is predicted that training efficiency will improve further as the training data scale continues to grow far beyond the communication cost.
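The divisible-load idea underlying both models can be illustrated with the classic single-level-tree closed form from divisible load theory, in which a master feeds workers sequentially and the schedule is optimal when all workers finish simultaneously. The sketch below is a simplified illustration under linear computation and communication costs, not the thesis's PSP/PAP formulation for arbitrary topologies; the function and parameter names are illustrative.

```python
def dlt_fractions(w, z):
    """Load fractions alpha_i for n workers fed sequentially by a master.

    w[i] -- time to compute one load unit on worker i
    z[i] -- time to transmit one load unit to worker i

    Under linear cost models the optimal schedule has all workers
    finishing at the same instant, which yields the recursion
    alpha_{i+1} = alpha_i * w_i / (w_{i+1} + z_{i+1}).
    """
    a = [1.0]                          # unnormalized fraction for worker 0
    for i in range(1, len(w)):
        a.append(a[-1] * w[i - 1] / (w[i] + z[i]))
    total = sum(a)
    return [x / total for x in a]      # normalize so the fractions sum to 1
```

For two identical workers (`w = [1, 1]`, `z = [1, 1]`) this gives fractions [2/3, 1/3], and both finish at time 4/3: the first worker receives more load because it starts computing earlier, which is the intuition the optimization in both models generalizes to arbitrary networks.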
Keywords/Search Tags: Divisible Load Scheduling, Optimization Theory, Big Data Processing, Parallel Computing, Arbitrary Network, Distributed Machine Learning