The Research On Optimization Methods Of High Energy-efficiency Resource Management In Cluster System For Big Data Application

Posted on: 2019-09-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S M Chen
GTID: 1368330545472901
Subject: Computer Science and Technology
Abstract/Summary:
Big data is currently a hot research topic in the mobile Internet, the Internet of Things, parallel cluster computing, data science and machine learning. Its technological development has moved from "concept" to "value". How to extract valuable knowledge cost-effectively from massive and growing multi-source heterogeneous data is one of the core issues, and it has attracted widespread attention from academia and industry. The cluster computing system that supports big data analysis is one of the important foundations. Traditional cluster computing systems (e.g., high-availability, load-balancing and high-performance computing clusters) mostly serve a single type of application. In contrast, cluster computing systems for big data applications feature the integration of multiple computing models, heterogeneous job resource requests and SLA constraints, and multi-dimensional resource management modes. Proposing resource management strategies based on the application, computing and service characteristics of this kind of cluster system, and improving the self-learning ability of the resource management system, may be an effective way to achieve high energy-efficiency computing in the cluster system. Motivated by this, the dissertation studies the problem at three levels: dynamic resource provisioning, efficient resource sharing and energy-aware task scheduling.

First, by analyzing trace data from a real cluster system, we observe that heterogeneous workloads and the limitations of existing online prediction methods are the main challenges for dynamic server provisioning in big data application cluster systems. This part of the study proposes optimization methods to address both challenges. On the one hand, we propose an adaptive pause strategy. The strategy alleviates the problem that a low-utilization server cannot enter the idle state because of the long service times of big data applications, which hinders the scale contraction of the cluster. On the other hand, we present a reinforcement-learning-based online hyper-heuristic prediction method. The method is inspired by the hyper-heuristic idea of using heuristics to generate heuristics: the high-level heuristic constructs a high-quality heuristic for the optimization scenario at hand in order to improve the efficiency of the algorithm. The proposed algorithm improves the accuracy of online prediction when resource requests fluctuate frequently and the request trend changes.

Second, efficient and fair sharing of cluster resources is an effective way to improve the utilization of system resources. In a big data application cluster system that supports multi-dimensional resource management, the efficiency and the fairness of resource sharing are equally important. This part of the study explores the importance of discrete multi-dimensional resource allocation in dynamic scenarios. In cluster computing, most submitted jobs are small jobs (i.e., jobs with a small amount of work), but most cluster resources are used to serve large jobs (i.e., jobs with many tasks). Motivated by this characteristic, we propose an efficiency-aware multi-resource allocation algorithm. In the algorithm, if the selected user's job is a large job, the system allocates resources to the user on the server that is expected to have the fewest surplus resources; otherwise, the resources of the first server that satisfies the user's demand are assigned to the user.

Third, big data applications contain a large number of periodic jobs (e.g., log analysis, daily data processing and big data machine learning). Detailed runtime information about these jobs can be obtained through existing techniques and used to optimize their scheduling. In this part of the study, energy-aware task scheduling is studied based on DAG modeling. To manage job completion time and energy consumption cost flexibly, we define the problem as a multi-objective optimization problem. In addition, to improve search efficiency and reduce computational overhead, we introduce the memetic optimization idea of evolving individuals with domain-specific knowledge. In the proposed algorithm, we present a memetic local search that optimizes makespan on critical tasks and energy consumption on non-critical tasks.

Finally, in view of the increased computational complexity caused by multi-objective optimization, the problem in this part is defined as energy-constrained performance optimization and performance-constrained energy optimization. The problem definition also takes into account cluster systems with dynamically variable voltage. The multi-objective memetic algorithm proposed above has limitations when optimizing diverse DAGs. This part of the study therefore introduces a selection-based hyper-heuristic optimization idea, which automatically selects an appropriate scheduling strategy for each situation, to improve the search efficiency and generality of the algorithm. We propose a quantum-inspired hyper-heuristic framework and method for energy-aware scheduling. In addition, because the problem is defined as a constrained optimization problem and the hyper-heuristic method places higher demands on constraint handling, we propose a constraint handling technique guided by fuzzy search biases.

With the further development of big data applications, high energy-efficiency computing in big data application cluster systems will become increasingly important. This research explores the problem from three aspects: the provisioning, sharing and scheduling of cluster system resources. The study analyzes the computing characteristics of big data application cluster systems in depth, introduces several advanced optimization concepts and puts forward related methods to enhance the self-learning ability of system resource management. This work may serve as a reference for research in this field.
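
The server-selection rule described for the efficiency-aware multi-resource allocation algorithm (large jobs go to the server that would be left with the fewest surplus resources, small jobs go to the first server that fits) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the dissertation's algorithm: the function names, the two-dimensional (CPU, memory) resource vectors, the normalized-sum definition of "surplus", and the `is_large_job` flag are all hypothetical, and the abstract does not say how a user or job is selected or how fairness is enforced.

```python
from typing import List, Optional, Sequence


def fits(free: Sequence[float], demand: Sequence[float]) -> bool:
    """True if every resource dimension of the demand fits in the free capacity."""
    return all(f >= d for f, d in zip(free, demand))


def surplus(free: Sequence[float], demand: Sequence[float],
            capacity: Sequence[float]) -> float:
    """Leftover resources after placement, summed over dimensions.

    Assumption: each dimension is normalized by the server capacity so that
    CPU and memory are comparable. The abstract does not define this measure.
    """
    return sum((f - d) / c for f, d, c in zip(free, demand, capacity))


def pick_server(free_by_server: List[List[float]], capacity: Sequence[float],
                demand: Sequence[float], is_large_job: bool) -> Optional[int]:
    """Return the index of the chosen server, or None if no server fits."""
    candidates = [i for i, free in enumerate(free_by_server) if fits(free, demand)]
    if not candidates:
        return None
    if not is_large_job:
        # Small job: take the first server that satisfies the demand (first fit).
        return candidates[0]
    # Large job: take the server expected to have the fewest surplus resources.
    return min(candidates, key=lambda i: surplus(free_by_server[i], demand, capacity))


def allocate(free_by_server: List[List[float]], capacity: Sequence[float],
             demand: Sequence[float], is_large_job: bool) -> Optional[int]:
    """Pick a server and deduct the demand from its free resources in place."""
    idx = pick_server(free_by_server, capacity, demand, is_large_job)
    if idx is not None:
        free_by_server[idx] = [f - d for f, d in zip(free_by_server[idx], demand)]
    return idx
```

For example, with free resources `[[8.0, 16.0], [2.0, 4.0], [4.0, 8.0]]`, identical server capacity `[8.0, 16.0]` and a demand of `[2.0, 4.0]`, a small job is placed on server 0 (the first fit), while a large job is placed on server 1, which it fills exactly. Packing large jobs tightly in this way keeps the emptier servers available, which is one plausible reading of why the rule favors efficiency.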
Keywords/Search Tags:Big Data Application, Cluster computing, High energy-efficiency, Dynamic Server Provisioning, Multi-dimensional resource sharing, DAG Scheduling