
Study On The Robust Optimization Of Hadoop Under The Restriction Of Cluster Computing Efficiency

Posted on: 2015-09-23    Degree: Master    Type: Thesis
Country: China    Candidate: Z Chen    Full Text: PDF
GTID: 2298330431486352    Subject: Computer software and theory
Abstract/Summary:
With the continuous development of science and technology, the amount of data is increasing rapidly, and cloud storage and cloud computing are the development trend of the future. Data processing with traditional databases finds it more and more difficult to satisfy the requirements of individual and enterprise users. For big data, Hadoop is the most representative storage and distributed processing system. Hadoop has developed rapidly in recent years; it is an open-source software framework with the reliability, efficiency, and scalability needed to process big data in a distributed way.

Hadoop's original design assumes that all machines in the cluster are homogeneous. In reality, however, a Hadoop cluster consists of many cheap machines, which leads to diversity in the nodes' computing ability and makes node failures likely. Although Hadoop maintains multiple data replicas to prevent the failure of computing tasks and data storage, improving the fault tolerance and reliability of the cluster, its node failure prediction, data replica placement, and task scheduling still need to be completed and improved.

To improve the robustness of a Hadoop cluster whose nodes execute tasks with different performance under differing efficiency, this paper optimizes Hadoop as follows:

(1) We present a Hadoop node failure prediction model aimed at nodes that may fail but would otherwise not be considered when choosing a task node and placing a data replica. Through failure rate prediction across the cluster, we can predict the failure rate of each node (a minimal estimator sketch follows this list).

(2) Using the node failure prediction model, we optimize the Hadoop task scheduling strategy and present a node selection strategy for the data placement algorithm. This solves the default algorithm's failure to consider the differences in computing ability caused by node heterogeneity, and improves the robustness of the cluster (see the selection sketch after this list).

(3) For a node that has executed few tasks yet is judged by the failure prediction model to have a high failure rate, we establish a dormancy mechanism that settles how such nodes should be handled (see the dormancy sketch below).

(4) We validate the effectiveness of the node failure prediction model under the restriction of cluster computing efficiency by constructing a Hadoop cluster. The method presented in this paper improves the robustness of the Hadoop cluster.
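The abstract does not give the concrete form of the failure prediction model, so the sketch below is only an assumption-laden illustration in Java (Hadoop's implementation language): a hypothetical per-node estimator that smooths observed task outcomes with an exponentially weighted moving average, so recent failures raise a node's predicted failure rate and recent successes lower it. The class name NodeFailurePredictor, the smoothing factor ALPHA, and the recordTaskOutcome interface are all invented for illustration, not taken from the thesis.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical per-node failure-rate estimator (not the thesis's actual model).
    public class NodeFailurePredictor {
        private static final double ALPHA = 0.2; // smoothing factor (assumed)
        private final Map<String, Double> failureRate = new HashMap<>();

        // Update a node's estimated failure rate after one of its tasks finishes.
        public void recordTaskOutcome(String nodeId, boolean failed) {
            double previous = failureRate.getOrDefault(nodeId, 0.0);
            double observation = failed ? 1.0 : 0.0;
            failureRate.put(nodeId, ALPHA * observation + (1 - ALPHA) * previous);
        }

        // Estimated probability in [0, 1] that the next task on this node fails.
        public double predictedFailureRate(String nodeId) {
            return failureRate.getOrDefault(nodeId, 0.0);
        }
    }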
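A failure-aware node selection strategy for task scheduling and replica placement could then score each candidate by its computing capacity discounted by its predicted failure rate and pick the highest-scoring node. The Node type, the capacity metric, and the multiplicative score below are illustrative assumptions; the thesis's actual strategy is not spelled out in the abstract.

    import java.util.Comparator;
    import java.util.List;
    import java.util.function.ToDoubleFunction;

    // Hypothetical failure-aware selector for task nodes and replica targets.
    public class FailureAwareNodeSelector {
        public static final class Node {
            public final String id;
            public final double capacity; // relative computing ability (assumed metric)
            public Node(String id, double capacity) { this.id = id; this.capacity = capacity; }
        }

        private final ToDoubleFunction<String> failureRateOf; // e.g. the predictor above

        public FailureAwareNodeSelector(ToDoubleFunction<String> failureRateOf) {
            this.failureRateOf = failureRateOf;
        }

        // Prefer capable nodes that are unlikely to fail: score = capacity * (1 - failure rate).
        public Node select(List<Node> candidates) {
            return candidates.stream()
                    .max(Comparator.comparingDouble(
                            n -> n.capacity * (1.0 - failureRateOf.applyAsDouble(n.id))))
                    .orElseThrow(() -> new IllegalArgumentException("no candidate nodes"));
        }
    }

A scheduler or a replica placement policy could consult select(...) over the live candidate list before launching each task or placing each replica, skipping any node the dormancy mechanism (next sketch) has put to sleep.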
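Finally, the dormancy mechanism could hypothetically look like this: a node that has executed fewer than some minimum number of tasks yet already shows a high predicted failure rate is put to sleep, i.e. excluded from node selection, until its estimate improves. Both thresholds below are invented for illustration.

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical dormancy bookkeeping for sparsely observed, high-risk nodes.
    public class DormancyManager {
        private static final int MIN_TASKS = 10;          // assumed threshold
        private static final double RISK_THRESHOLD = 0.5; // assumed threshold

        private final Set<String> dormant = new HashSet<>();

        // Re-evaluate a node whenever its statistics change.
        public void review(String nodeId, int tasksExecuted, double predictedFailureRate) {
            if (tasksExecuted < MIN_TASKS && predictedFailureRate > RISK_THRESHOLD) {
                dormant.add(nodeId);    // too risky on too little evidence: sleep
            } else {
                dormant.remove(nodeId); // healthy or well observed: available again
            }
        }

        // Node selection should skip dormant nodes.
        public boolean isDormant(String nodeId) {
            return dormant.contains(nodeId);
        }
    }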
Keywords/Search Tags: Hadoop, failure rate prediction, node selection, data placement