
Research and Design of a Distributed Processing Framework Based on Cloud Computing

Posted on: 2012-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Yang
Full Text: PDF
GTID: 2208330332986768
Subject: Software engineering

Abstract/Summary:
With the rapid development of e-commerce, the volume of data relating to international trade has grown explosively. To address several key technical problems in international trade, this thesis designs a distributed processing framework based on cloud computing that meets the requirements of mass data storage management, disaster recovery, information service quality assurance, efficient business information statistics, and electronic document conversion. The design not only makes good use of the storage and computing resources of a large-scale computer cluster, but also allows the cluster to be expanded at low cost, providing solid technical support for cooperation and information exchange between enterprises engaged in international trade.

The framework adopts a layered architecture consisting, from the bottom up, of a hardware resource management layer, a business logic layer, and a user interaction layer. The three layers are independent of one another: each lower layer exposes function-call interfaces to the layer above, and changes within one layer do not affect the others, so the framework can easily be improved and extended. The hardware resource management layer uses Hadoop, an open-source cloud computing framework, to manage the cluster's hardware resources. The business logic layer applies the Spring open-source framework and MapReduce programming techniques, using the interfaces of the underlying layer to implement user management, document management, business information statistics, electronic document conversion, programming interface calls, and other business functions (a minimal MapReduce example is sketched after this abstract). The user interaction layer, built on the Struts2 architecture together with a variety of advanced Web technologies, handles interaction between the cluster and the user, invoking the appropriate business logic modules according to user requests.

Considering the framework's multi-user requirement and the predominance of short jobs, this thesis conducts an in-depth study of job scheduling approaches for the MapReduce programming model. Based on extensive experiments, a study of existing job scheduling algorithms, and the characteristics of the project itself, this thesis designs a time-driven, two-queue job scheduling algorithm (a simplified sketch also follows this abstract). The algorithm divides jobs into a long-job queue and a short-job queue, which satisfies the needs of short jobs while avoiding the drawback of a long job occupying cluster resources for an extended time; to some extent it also enables parallel execution and improves cluster resource utilization. Each job is assigned an execution deadline, and jobs approaching their deadline receive the highest priority and more resources, which satisfies users' real-time requirements well. By delaying, for a short period, tasks that cannot immediately be assigned to a node holding their input data, the algorithm achieves better data locality and improves cluster efficiency.

Finally, extensive experiments validate and compare the optimizations and improvements proposed in this thesis. The results first confirm that rational and effective parameter configuration can enhance the performance of the framework, and then confirm that the scheduling algorithm designed in this thesis outperforms Hadoop's default algorithm.
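As an illustration of how the business logic layer's statistics functions could be expressed with MapReduce, the following is a minimal sketch of a Hadoop job that counts trade records per country code. The input format (CSV lines whose first field is a country code), the class names, and the paths are assumptions made for this example; the thesis's actual statistics code is not reproduced in this abstract.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TradeCountJob {
    // Mapper: each input line is assumed to be a CSV trade record whose
    // first field is a country code; emit (countryCode, 1).
    public static class TradeMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text country = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length > 0 && !fields[0].isEmpty()) {
                country.set(fields[0]);
                context.write(country, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each country code.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] = HDFS input directory of trade records, args[1] = output directory.
        Job job = Job.getInstance(new Configuration(), "trade record count");
        job.setJarByClass(TradeCountJob.class);
        job.setMapperClass(TradeMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}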
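The following is a minimal sketch of the two-queue, deadline-driven scheduling idea described above, together with the delay step used to improve data locality. All names (JobInfo, TwoQueueScheduler, LOCALITY_DELAY_MS) are hypothetical illustrations made for this sketch, not the thesis's actual scheduler implementation or Hadoop's scheduler API.

import java.util.ArrayDeque;
import java.util.Deque;

class JobInfo {
    String id;
    long deadlineMillis;   // user-specified execution deadline
    boolean isShort;       // short jobs and long jobs live in separate queues
    JobInfo(String id, long deadlineMillis, boolean isShort) {
        this.id = id;
        this.deadlineMillis = deadlineMillis;
        this.isShort = isShort;
    }
}

class TwoQueueScheduler {
    private final Deque<JobInfo> shortQueue = new ArrayDeque<>();
    private final Deque<JobInfo> longQueue = new ArrayDeque<>();
    // Maximum time to hold a task back while waiting for a data-local slot.
    private static final long LOCALITY_DELAY_MS = 3000;

    void submit(JobInfo job) {
        (job.isShort ? shortQueue : longQueue).addLast(job);
    }

    // When a task slot frees up, run the job with the earliest deadline across
    // both queues; ties go to the short queue so long jobs cannot monopolize
    // the cluster.
    JobInfo nextJob() {
        JobInfo s = earliestDeadline(shortQueue);
        JobInfo l = earliestDeadline(longQueue);
        if (s == null) return take(longQueue, l);
        if (l == null) return take(shortQueue, s);
        return (s.deadlineMillis <= l.deadlineMillis)
                ? take(shortQueue, s) : take(longQueue, l);
    }

    // Delay scheduling for data locality: if the free slot is not on a node
    // holding the task's input block, wait a short while before giving up
    // and launching the task remotely.
    boolean shouldLaunch(String freeNode, String preferredNode, long waitedMs) {
        return freeNode.equals(preferredNode) || waitedMs >= LOCALITY_DELAY_MS;
    }

    private static JobInfo earliestDeadline(Deque<JobInfo> queue) {
        JobInfo best = null;
        for (JobInfo j : queue)
            if (best == null || j.deadlineMillis < best.deadlineMillis) best = j;
        return best;
    }

    private static JobInfo take(Deque<JobInfo> queue, JobInfo job) {
        if (job != null) queue.remove(job);
        return job;
    }
}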
Keywords/Search Tags: Cloud Computing, MapReduce, Job Scheduling