
Extension Of Hadoop Framework And Performance Tuning

Posted on: 2013-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2248330362472761
Subject: Computer application technology
Abstract/Summary:
Cloud computing emerged as a new concept and became a hot topic in 2007, and it has developed rapidly in recent years. In terms of computing model, cloud computing, distributed computing, and grid computing have much in common, and a study of its background shows that cloud computing developed on the basis of distributed computing and grid computing. Distributed computing and grid computing were originally used for scientific research, but with the rapid development of the Internet they have evolved into a computing model better suited to commercial use, which is called cloud computing.

This thesis first introduces the background of cloud computing and grid computing and analyzes the differences between them. It then examines the key technologies of the cloud computing platform Hadoop, which is composed of MapReduce and HDFS (Hadoop Distributed File System). Next, it introduces the system architecture of LSF (Load Sharing Facility), which comprises LSF Base and LSF Batch, and analyzes the LSF job execution process and the system's load balancing.

In-depth research and analysis of the Hadoop system reveal three shortcomings of Hadoop in enterprise applications: a single point of failure, a single scheduling algorithm, and limited compatibility with heterogeneous platforms. To address these shortcomings, the thesis integrates the Hadoop system with the LSF system to build a new system, LSH. The integration has two parts. First, the LSF job control mechanisms LIM (Load Information Manager), RES (Remote Execution Server), and SBD (the sbatchd daemon) are added to the HDFS layer and the MapReduce layer of the Hadoop system. Second, the LSF master node and the HDFS NameNode share information through an open interface between them. The integrated system can effectively prevent Hadoop's single point of failure, and it resolves both the single-scheduling-algorithm problem and Hadoop's compatibility problems on heterogeneous platforms.

Finally, experiments are designed for both the integrated system LSH and Hadoop to compare the performance of the two systems when the NameNode suffers a single point of failure and when the cluster is a heterogeneous platform. The results show that LSH fully compensates for these shortcomings of Hadoop and is able to adapt to enterprise-class applications.
Keywords/Search Tags: Cloud Computing, Grid Computing, LSF, Hadoop, MapReduce