
Research on a Localization Computing Strategy Based on the Hadoop Platform

Posted on: 2016-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: L J Ning
Full Text: PDF
GTID: 2308330479484893
Subject: Computer technology
Abstract/Summary:
Massive data processing is an indispensable component of any cloud computing platform. Hadoop is a distributed platform for processing large-scale data sets in parallel, and resource scheduling within the cluster is one of its core modules and a decisive factor in Hadoop's performance. With the explosive growth of data, data-intensive computing has become a common need that must be addressed urgently in Hadoop clusters. Because data storage is distributed, transferring data to the computation has become a bottleneck in the development of Hadoop clusters. Studies have shown that localized computing, that is, moving the computation to the data instead of moving the data, is an effective way to break through this bottleneck.

In a Hadoop cluster, data is stored on different nodes, and each rack consists of several nodes. In this thesis, localized computing is divided into node-local computing and rack-local computing. The completed work is briefly introduced as follows.

① The heterogeneous resource demands of computations are discussed, and a method of resource representation is presented. To distinguish the different computing demands for resources, we introduce the concept of the dominant resource.

② DRF, short for Dominant Resource Fairness, is analyzed. Unlike the Fair Scheduler and the Capacity Scheduler, which are slot-based, DRF can achieve better throughput; however, it does not support localized computing.

③ The delay scheduling strategy is studied, with a detailed analysis of the relationship between the maximum delay interval and localized computing.
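The delay rule studied in ③ can be sketched as a small decision function. This is a minimal illustration of the general delay-scheduling idea, not code from the thesis; the node names and the `max_delay` budget are assumptions chosen for the example.

```python
def delay_decision(offer_node, data_nodes, skipped, max_delay):
    """Delay scheduling: prefer a node that holds the task's input data;
    otherwise decline up to max_delay offers before running remotely."""
    if offer_node in data_nodes:
        return "launch-local"      # data-local: run immediately
    if skipped < max_delay:
        return "wait"              # decline this offer, hope for a local one
    return "launch-remote"         # delay budget exhausted, accept non-local
```

The larger `max_delay` is, the higher the fraction of data-local tasks, at the cost of longer task waiting times; this is exactly the trade-off between the maximum delay interval and locality analyzed above.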
The task waiting time under localized computing, the average task length, and the amount of resources across nodes are also analyzed.

④ Combining localized computing with scheduling fairness, a new scheduling algorithm called DDRF (Delay-Dominant Resource Fairness) is proposed. DDRF not only achieves a higher degree of localized computing but also guarantees fair resource scheduling. Through experimental data analysis of the DDRF algorithm's effect on the average extension time, the degree of influence of locality is calculated.

⑤ Implementing SaaS in a cloud environment on top of an SOA (service-oriented architecture) is studied, and an SOA-based application development process is given. Combined with practical problems, such as querying integers, points, words, and spanning trees, the serial and parallel implementations of Top-K and K-Means are analyzed. Following the proposed SOA-based development process, web services for Top-K and K-Means are developed using CXF.

In summary, this thesis presents a scheduling algorithm that satisfies both localized computing and resource fairness, and the resulting degree of computing locality is verified by theoretical analysis and experiments. An application development process combining SaaS and SOA is proposed, and two kinds of applications are implemented as service components. This work can serve as a reference for building cloud computing application systems in the future.
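The dominant-share bookkeeping that DDRF inherits from DRF can be sketched as follows. This is an illustrative outline of the standard DRF fairness rule under assumed resource names and capacities, not the thesis's implementation.

```python
def dominant_share(used, capacity):
    """A user's dominant share is the largest fraction it holds
    of any single resource type (CPU, memory, ...)."""
    return max(used[r] / capacity[r] for r in capacity)

def drf_next_user(used_by_user, capacity):
    """DRF rule: offer the next resources to the user
    with the smallest dominant share."""
    return min(used_by_user,
               key=lambda u: dominant_share(used_by_user[u], capacity))
```

For example, with a cluster capacity of 9 CPUs and 18 GB of memory, a user holding 3 CPUs and 3 GB has dominant share 1/3 (CPU-dominated), while a user holding 1 CPU and 4 GB has dominant share 2/9 (memory-dominated), so DRF offers resources to the second user. DDRF then applies the delay rule before actually launching that user's task, so as to favor data-local placement without breaking this fairness ordering.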
Keywords/Search Tags:Locality, Hadoop, Cloud Computing, SOA