The Research Of Scheduling Algorithms For Performance And Energy Consumption Under The Condition Of Data Skew

Posted on: 2017-01-19    Degree: Master    Type: Thesis
Country: China    Candidate: L Qi
GTID: 2428330488979845    Subject: Communication and Information System
Abstract/Summary:
With the rapid development of Internet technology, information technology has penetrated all walks of life and become closely tied to everyday life. The sharp growth in the number of Internet users has led directly to explosive growth in data volume, which in turn has given distributed computing and cloud computing a platform for development. How to use more efficient computing frameworks to mine useful information from massive data sets and promote business growth is a long-term research subject for enterprises. As one of the most prominent components of cloud computing platforms, MapReduce has attracted the attention of companies and research institutions. MapReduce is a distributed computing framework introduced by Google in 2004, and it has matured over more than a decade of evolution. Its simple programming model lets users implement their own requirements by writing only a Map function or a Reduce function, without dealing with underlying concerns such as fault tolerance, redundancy, and inter-node communication.

When massive data sets are unevenly distributed, however, the hash allocation strategy in MapReduce cannot meet users' needs. Because the hash algorithm assigns tasks to Reducers according to the keys of the <key,value> pairs, it leads to data skew across Reducers and leaves the system with many straggler tasks. This paper presents CSRA, an efficient resource allocation algorithm that aims to reduce the running time and the coefficient of variation by reordering the task list and splitting large clusters. By taking the actual status of tasks into account, the method largely balances resource utilization. Experiments with CSRA implemented on the Hadoop platform show that it has negligible overhead and noticeably reduces the execution time of some popular applications.

Building on CSRA, this paper also studies energy consumption in parallel and distributed systems. Heterogeneous distributed systems, with their low cost, good scalability, and fault tolerance, have led many companies to build their distributed systems on this kind of platform. As Internet companies grow, data centers are springing up like mushrooms, and how to manage resource utilization in data centers efficiently and schedule tasks reasonably has become an urgent problem in green computing. This paper proposes DEWTS, an energy-saving task scheduling algorithm based on DVFS technology. DEWTS first uses the heuristic task time estimation method proposed in CSRA to estimate task execution times, determines task priorities, and schedules the tasks accordingly. After this initial scheduling, it decides which processors to shut down according to the number of tasks on each processor and each processor's resource utilization, and then reuses CSRA to schedule the tasks again. Once scheduling is complete, it uses DVFS to adjust the voltage and frequency of the processors. The experimental results show that DEWTS can guarantee the task completion time while reducing the total energy consumption of the system.
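
The data skew problem described in the abstract can be illustrated with a small Python sketch. It is not the CSRA implementation, whose details the abstract does not give. It only contrasts hash-based assignment of key clusters with a generic "split large clusters, then place the largest chunks first" heuristic, and compares the coefficient of variation of the resulting Reducer loads. The function names, the use of `zlib.crc32` as the hash, the chunk size, and the cluster sizes are all assumptions made for illustration.

```python
import heapq
import zlib
from statistics import mean, pstdev


def coefficient_of_variation(loads):
    """Population standard deviation of the loads divided by their mean."""
    return pstdev(loads) / mean(loads)


def hash_partition(cluster_sizes, num_reducers):
    """Hash-style assignment: a whole key cluster goes to hash(key) mod R."""
    loads = [0] * num_reducers
    for key, size in cluster_sizes.items():
        # crc32 stands in for the framework's hash so runs are repeatable.
        loads[zlib.crc32(key.encode()) % num_reducers] += size
    return loads


def split_and_balance(cluster_sizes, num_reducers, max_chunk):
    """Split clusters larger than max_chunk, then assign chunks
    largest-first to the currently least-loaded reducer."""
    chunks = []
    for size in cluster_sizes.values():
        while size > max_chunk:
            chunks.append(max_chunk)
            size -= max_chunk
        if size > 0:
            chunks.append(size)
    chunks.sort(reverse=True)  # reorder the task list: biggest first

    heap = [(0, r) for r in range(num_reducers)]  # (load, reducer id)
    heapq.heapify(heap)
    for chunk in chunks:
        load, reducer = heapq.heappop(heap)
        heapq.heappush(heap, (load + chunk, reducer))
    return [load for load, _ in sorted(heap, key=lambda item: item[1])]


# Hypothetical skewed workload: one key dominates the record count.
cluster_sizes = {"the": 900, "of": 300, "skew": 40,
                 "map": 30, "reduce": 20, "hadoop": 10}

hashed = hash_partition(cluster_sizes, num_reducers=3)
balanced = split_and_balance(cluster_sizes, num_reducers=3, max_chunk=100)

print("hash loads:    ", hashed, coefficient_of_variation(hashed))
print("balanced loads:", balanced, coefficient_of_variation(balanced))
```

With hash assignment the 900-record cluster always lands on a single Reducer, so that Reducer becomes the straggler no matter where the other keys fall. Splitting a key's cluster across Reducers is only valid when the partial results can be merged afterwards, which this sketch does not model.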
Keywords/Search Tags: MapReduce, data skew, energy scheduling, dynamic voltage and frequency scaling, cloud computing