| With the era of data explosion approaching, efficiently processing large-scale data sets at the TB or even PB level has become an urgent problem. Driven by application requirements and technology, cloud computing has been proposed as a new computing model and has become a main theme in the IT sector. MapReduce, a tool for processing massive data in cloud computing, is widely used by large commercial enterprises. However, MapReduce still has many deficiencies, especially in its scheduling mechanism, such as uneven task allocation and the re-execution of failed tasks. Moreover, the original scheduling approach is not well adapted to heterogeneous environments. The scheduling mechanism of MapReduce in heterogeneous environments is therefore the main research direction of this paper. Based on the features of heterogeneous environments, this paper summarizes the scheduling performance issues of the MapReduce framework and the deficiencies of mainstream scheduling algorithms, especially with regard to local execution and unbalanced data. In response to these issues, a Multiple-tasks Scheduling Based on Ant Colony algorithm (MSBACO) for heterogeneous environments is presented. By assessing the processing capacity of each node and proposing a new objective function, tasks can be quickly distributed to nodes according to the principle of local execution. In addition, a Decision Algorithm on Pre-failed Task (DAPT) based on MSBACO is presented. By prejudging tasks that are likely to fail, these tasks can be quickly transferred to other nodes. Based on these two algorithms, an Improved MapReduce Cluster Scheduling Scheme under Heterogeneous Network Environment (HNE-IMCSS) is also proposed. Finally, through comparison with mainstream algorithms on assessment indicators such as job execution time and load level, the validity and stability of the improved algorithms and scheduling model in heterogeneous environments are verified. |