| As the Internet continues to grow, enormous volumes of user data need to be processed and stored, and traditional server clusters can no longer meet users' needs. Cloud computing is becoming a leading solution to this problem. It provides users with massive data processing, mass data storage, on-demand access to computing power, and other services. Since the concept of cloud computing was introduced, it has attracted wide attention from academia and industry, and many companies have launched their own cloud computing platforms. Most of these platforms are built on Hadoop. Hadoop is an open-source distributed framework for cloud computing, used for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes and for storing users' massive data. In Hadoop, the underlying parallelism is transparent to application developers, who only need to implement code that follows the interface requirements. However, Hadoop is a relatively new platform, and many aspects of it still need improvement. The performance of a Hadoop system is closely tied to its job scheduler: selecting an appropriate scheduling algorithm has a significant impact on resource utilization and system throughput. Hadoop's existing scheduling algorithms have many shortcomings. By studying them, we can find ways to optimize and improve these algorithms, which is significant for improving the Hadoop platform's performance and system throughput.
This paper completes the following research: 1. We introduce cloud computing technology and analyze the technical background and composition of the Hadoop platform, then give a detailed analysis of the HDFS file system and the MapReduce programming framework. 2. We analyze the job scheduling process in the Hadoop platform and introduce several existing job scheduling algorithms: the FIFO, Fair, Capacity, and LATE scheduling algorithms. We analyze the main idea, advantages, and disadvantages of each. 3. Because the existing scheduling algorithms are not suited to heterogeneous environments, we propose a new scheduling algorithm that uses system information such as estimated job arrival rates and mean job execution times to make scheduling decisions, together with an optimization approach for finding an appropriate matching of jobs and resources. 4. Because the existing scheduling algorithms do not consider job workload types or node workload types, we propose a new resource-aware scheduling algorithm. It reasonably classifies job and node workload types and uses this information to make scheduling decisions. 5. To verify the performance of our algorithms, we built a Hadoop cluster. Experiments and performance analysis showed that the algorithm could significantly improve the system's throughput. |