
Job Scheduling Technologies In Data Intensive Supercomputing Systems

Posted on: 2012-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Chen
Full Text: PDF
GTID: 2218330362460363
Subject: Computer Science and Technology
Abstract/Summary:
Drawing on experience from industrial production design and an analysis of the shortcomings of conventional supercomputing, the academic community has proposed Data-Intensive Supercomputing, a new parallel approach to processing large-scale data. Data-Intensive Supercomputing has two defining features: first, computation time is proportional to the scale of the data; second, computation is sent to the data rather than data to the computation, a principle known as data locality. A data cluster built on the Data-Intensive Supercomputing model can offer services as the "cloud" in cloud computing.

One prototype of Data-Intensive Supercomputing proposed by academia is Google MapReduce, followed by Hadoop, an open-source implementation based on the MapReduce model. Since then, a great deal of research has been devoted to job scheduling in Hadoop clusters, mainly aimed at the straggler problem, in which some nodes take significantly longer to finish than others. The causes of stragglers are complex: they may stem from faulty machines or networks, or from how the dataset is partitioned.

We argue that imbalanced data partitioning over a key space with low entropy is a non-trivial cause of stragglers, and that no ideal solution exists so far. In this thesis, we propose a runtime load-balancing mechanism that rebalances computation while a job is running, lowering the probability that stragglers appear. Building on this mechanism, and following the principle of data locality, we develop a data locality enhancement mechanism to reduce a job's overall running time. We implement a prototype on iterative Hadoop, also known as HaLoop, and evaluate each mechanism. Experiments show that the runtime load-balancing mechanism balances computation effectively, and that the data locality enhancement mechanism significantly reduces job running time under favorable conditions.
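The abstract does not give implementation details of the runtime load-balancing mechanism. Purely as an illustration of the key-skew problem it identifies, and not as the thesis's own method, the sketch below shows a hypothetical custom Hadoop Partitioner that scatters a pre-identified hot key across reduce partitions instead of letting hash partitioning send all of its records to a single reducer; the class name and hot key are invented for the example.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

import java.util.Random;

// Illustrative sketch: when the key space has low entropy, one "hot" key can
// dominate a single reduce partition and create a straggler. This partitioner
// scatters that key's records across all reducers.
public class SkewAwarePartitioner extends Partitioner<Text, IntWritable> {
    // Hypothetical hot key assumed to be known before the job runs
    // (an assumption for the example, not part of the thesis's mechanism).
    private static final String HOT_KEY = "popular-key";
    private final Random random = new Random();

    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.toString().equals(HOT_KEY)) {
            // Spread the hot key over all reducers; a combiner or a second
            // aggregation pass would then merge the partial results.
            return random.nextInt(numPartitions);
        }
        // Default behaviour: hash-based partitioning, as in Hadoop's HashPartitioner.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

Such a static workaround only helps when the hot key is known in advance, which motivates the runtime, in-flight rebalancing approach the abstract describes.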
Keywords/Search Tags: Data-Intensive, Supercomputing, Cloud Computing, MapReduce, Hadoop