Research On Resource Optimization Of Distributed Cloud Platform

Posted on: 2020-06-30
Degree: Master
Type: Thesis
Country: China
Candidate: S M Ning
Full Text: PDF
GTID: 2428330599475641
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the mobile internet and computer technology, society has gradually become an information society built on massive data. In recent years, therefore, the storage and processing of big data have become a major concern in both academia and industry. In particular, cloud computing ecosystem components represented by Hadoop and Spark are widely applied across thousands of business scenarios. However, as data centers grow, the costs of operation, maintenance, and resources keep increasing, and optimizing the resource usage of large clusters while preserving computing performance has received much attention. Against this background, this thesis focuses on resource consumption optimization in Hadoop and Spark. The work comprises the following four aspects.

1. A resource consumption prediction model for Hadoop 2.0 is proposed. The model is built in three steps. First, a single-task model is built by fitting the relationship between resource consumption and data volume, based on MapReduce process simulation and benchmark log analysis. Second, the Pearson hypothesis test is used to estimate the distribution of task runtime in each MapReduce stage under parallel batch processing, so that delays and other phenomena seen in real parallel scenarios are reflected. Finally, a task regeneration and scheduling strategy based on the mean-field (average field) model is proposed: the single-task model, given the estimated runtime, is added laterally and accumulated vertically along the time axis. This strategy replaces the superposition effect with the average effect and predicts the resource indicators (CPU, memory, disk read and write, network read and write) and the application runtime of a Hadoop 2.0 cluster. The validity of the model is verified on a real production cluster.

2. A benchmark cost model for a generalized cloud computing environment is designed. First, the cost of each resource indicator per unit time is defined, which eliminates the dimensional differences between resource instances. Then, a time-based integration method is introduced to calculate the overall resource consumption cost of an application. The model unifies the calculation of resource consumption and can be plugged into any cost optimization algorithm.

3. A cost optimization method for resource consumption in Spark, based on data persistence, is constructed. First, the Spark computation flow is represented as a directed acyclic graph whose directed edges are weighted, and a cost optimization problem is defined on it. Next, the cost optimization strategy is designed. A local pre-optimization strategy directly compares the computing resource cost with the storage resource cost. Because of the data dependencies in Spark chain computing, a predecessor data collection optimization strategy is proposed to update the resource consumption cost of the predecessor RDDs. In addition, the model includes a persistent data release strategy for the single-task, long-chain scenario that introduces failure rate parameters and further reduces the overall cost. The feasibility and effectiveness of the method are verified by coarse-grained and fine-grained experiments.

4. A system integration architecture for the distributed cloud platform resource optimization model is presented. The architecture consists of a data layer, a core decision layer, and an application layer, and is highly scalable. It takes program logs as input and returns the corresponding resource consumption optimization result. It also avoids excessive manual intervention and meets the efficiency requirements of industrial production environments. Two case studies and their analysis show that the system is usable and intuitive in both its operation and its output forms...
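
The cost model and the local pre-optimization step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the unit prices, the trapezoidal integration rule, the `RddNode` fields, and the decision rule in `should_persist` are all assumptions chosen for illustration, since the abstract does not give the actual formulas.

```python
from dataclasses import dataclass

# Hypothetical unit prices: cost per resource unit per second (illustrative values).
UNIT_PRICE = {"cpu": 0.02, "memory": 0.005, "disk_io": 0.001, "network_io": 0.002}


def cost_rate(usage, unit_price=UNIT_PRICE):
    """Convert one usage sample {resource: amount} into a single cost per second.

    Pricing every resource per unit time puts all indicators on one scale.
    """
    return sum(unit_price[name] * amount for name, amount in usage.items())


def integrated_cost(samples, unit_price=UNIT_PRICE):
    """Integrate the cost rate over time with the trapezoidal rule.

    samples: list of (timestamp_seconds, {resource: amount}), sorted by time.
    """
    total = 0.0
    for (t0, u0), (t1, u1) in zip(samples, samples[1:]):
        total += 0.5 * (cost_rate(u0, unit_price) + cost_rate(u1, unit_price)) * (t1 - t0)
    return total


@dataclass
class RddNode:
    """Hypothetical per-RDD cost summary used for the persist decision."""
    name: str
    compute_cost: float       # cost of computing this RDD once
    storage_cost_rate: float  # cost per second of keeping it persisted
    reuse_count: int          # how many downstream computations read it
    lifetime: float           # seconds it would stay persisted


def should_persist(node):
    """Assumed local rule: persist when storage is cheaper than recomputation.

    Persisting saves (reuse_count - 1) recomputations and costs
    storage_cost_rate * lifetime.
    """
    recompute_cost = node.compute_cost * max(node.reuse_count - 1, 0)
    storage_cost = node.storage_cost_rate * node.lifetime
    return storage_cost < recompute_cost
```

In this sketch, `integrated_cost` could be fed usage samples parsed from application logs, and `should_persist` could be evaluated for each node of the Spark DAG before any dependency-aware or release strategy is applied.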
Keywords/Search Tags:Hadoop, Spark, Persistence, Directed acyclic graph, Resource consumption prediction, Cost optimization