Research On MapReduce Program Based On YARN

Posted on: 2018-12-28
Degree: Master
Type: Thesis
Country: China
Candidate: G Wang
GTID: 2428330569975195
Subject: Computer application technology
Abstract/Summary:
In the big data era, massive amounts of structured and semi-structured data are generated every day, and traditional relational databases cannot store or analyze data at this scale. The MapReduce framework was proposed in response. Hadoop, an open-source implementation of MapReduce, is widely used for its high efficiency, fault tolerance, and low cost. However, the master node in Hadoop carries too many responsibilities, which limits scalability and reliability, and Hadoop cannot support multiple computing frameworks. Hadoop 2.0 therefore introduced YARN, a resource management framework. YARN supports many computing frameworks and schedules resources in fine-grained units called Containers, which yields higher resource utilization. Thanks to its high scalability, high reliability, high resource utilization, and support for many computing frameworks, YARN has become one of the most popular resource management frameworks. Non-local map tasks and reduce tasks in a MapReduce application must fetch data from other nodes, so they consume network resources. A Container, however, encapsulates only CPU and memory, not network bandwidth. As a result, network load can become unbalanced when MapReduce applications run on YARN. To address this problem, this thesis proposes a network load-balancing scheduling policy based on the network flow of each node. The policy records the real-time network flow on each node and, when a NodeManager sends a heartbeat to the ResourceManager, selects a suitable application and task so as to balance the network load. In summary, the proposed scheduling policy allocates network resources more evenly, reduces job execution time, and improves resource utilization.
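
The heartbeat-driven policy described above can be illustrated with a small sketch. The abstract does not give the actual selection rule, so everything here is an assumption made for illustration: the names `Task` and `NetworkAwareScheduler`, the use of the cluster-wide mean flow as the "busy" threshold, and the rule that a busy node only receives tasks that add no network traffic. It is plain Python and does not use any YARN API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: str
    kind: str                            # "map" or "reduce"
    local_nodes: frozenset = frozenset() # nodes holding the input split (maps only)

    def needs_network(self, node: str) -> bool:
        # Reduce tasks always fetch map output from other nodes;
        # map tasks use the network only when they are not data-local.
        return self.kind == "reduce" or node not in self.local_nodes


@dataclass
class NetworkAwareScheduler:
    """Hypothetical scheduler: picks a task when a node reports its network flow."""
    pending: List[Task] = field(default_factory=list)
    flow: Dict[str, float] = field(default_factory=dict)  # node -> latest bytes/s

    def on_heartbeat(self, node: str, bytes_per_sec: float) -> Optional[Task]:
        # Record the real-time network flow reported with this heartbeat.
        self.flow[node] = bytes_per_sec
        mean_flow = sum(self.flow.values()) / len(self.flow)
        busy = bytes_per_sec > mean_flow  # assumed threshold: above cluster mean

        quiet = [t for t in self.pending if not t.needs_network(node)]
        heavy = [t for t in self.pending if t.needs_network(node)]

        # Assumed rule: a busy node only gets tasks that add no traffic;
        # an idle node gets network-heavy tasks first, then anything else.
        candidates = quiet if busy else (heavy or quiet)
        if not candidates:
            return None  # leave the container free until a better fit appears
        chosen = candidates[0]
        self.pending.remove(chosen)
        return chosen
```

In this sketch a node whose reported flow exceeds the cluster mean receives only data-local map tasks, while lightly loaded nodes absorb reduce tasks and non-local map tasks. A real policy would also have to choose among applications and respect the CPU and memory limits of each Container.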
Keywords/Search Tags: MapReduce Application, Load Balancing on Network, YARN