
Study On The Robust Optimization Of Hadoop Under The Restriction Of Cluster Computing Efficiency

Posted on: 2015-09-23    Degree: Master    Type: Thesis
Country: China    Candidate: Z Chen    Full Text: PDF
GTID: 2298330431486352    Subject: Computer software and theory
Abstract/Summary:
With the continuous development of science and technology, the amount of data is increasing rapidly, and cloud storage and cloud computing are the development trend of the future. Data processing with traditional databases finds it more and more difficult to satisfy the requirements of individual and enterprise users. For big data, Hadoop is the most representative storage and distributed processing system. Hadoop has developed rapidly in recent years; it is an open-source software framework with the reliability, efficiency, and scalability needed to process big data in a distributed way.

Hadoop's original design assumes that all machines in the cluster are homogeneous. In reality, however, a Hadoop cluster consists of many cheap machines, which leads to diversity in the nodes' computing ability and makes node failures likely. Although Hadoop maintains multiple data replicas to prevent the failure of computing tasks and data storage, improving the fault tolerance and reliability of the cluster, its node failure prediction, data replica placement, and task scheduling still need to be completed and improved.

To improve the robustness of a Hadoop cluster whose nodes execute tasks with different performance under differing efficiency, this paper optimizes Hadoop as follows:

(1) We present a Hadoop node failure prediction model aimed at nodes that may fail but would otherwise not be considered when choosing a task node and placing a data replica. Through failure rate prediction across the cluster, we can predict the failure rate of each node (a minimal estimator sketch follows this list).

(2) Using the node failure prediction model, we optimize the Hadoop task scheduling strategy and present a node selection strategy for the data placement algorithm. This solves the default algorithm's failure to consider the differences in computing ability caused by node heterogeneity, and improves the robustness of the cluster (see the selection sketch after this list).

(3) For a node that has executed few tasks yet is judged by the failure prediction model to have a high failure rate, we establish a dormancy mechanism that settles how such nodes should be handled (see the dormancy sketch below).

(4) We validate the effectiveness of the node failure prediction model under the restriction of cluster computing efficiency by constructing a Hadoop cluster. The method presented in this paper improves the robustness of the Hadoop cluster.
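The abstract does not give the concrete form of the failure prediction model, so the sketch below is only an assumption-laden illustration in Java (Hadoop's implementation language): a hypothetical per-node estimator that smooths observed task outcomes with an exponentially weighted moving average, so recent failures raise a node's predicted failure rate and recent successes lower it. The class name NodeFailurePredictor, the smoothing factor ALPHA, and the recordTaskOutcome interface are all invented for illustration, not taken from the thesis.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical per-node failure-rate estimator (not the thesis's actual model).
    public class NodeFailurePredictor {
        private static final double ALPHA = 0.2; // smoothing factor (assumed)
        private final Map<String, Double> failureRate = new HashMap<>();

        // Update a node's estimated failure rate after one of its tasks finishes.
        public void recordTaskOutcome(String nodeId, boolean failed) {
            double previous = failureRate.getOrDefault(nodeId, 0.0);
            double observation = failed ? 1.0 : 0.0;
            failureRate.put(nodeId, ALPHA * observation + (1 - ALPHA) * previous);
        }

        // Estimated probability in [0, 1] that the next task on this node fails.
        public double predictedFailureRate(String nodeId) {
            return failureRate.getOrDefault(nodeId, 0.0);
        }
    }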
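A failure-aware node selection strategy for task scheduling and replica placement could then score each candidate by its computing capacity discounted by its predicted failure rate and pick the highest-scoring node. The Node type, the capacity metric, and the multiplicative score below are illustrative assumptions; the thesis's actual strategy is not spelled out in the abstract.

    import java.util.Comparator;
    import java.util.List;
    import java.util.function.ToDoubleFunction;

    // Hypothetical failure-aware selector for task nodes and replica targets.
    public class FailureAwareNodeSelector {
        public static final class Node {
            public final String id;
            public final double capacity; // relative computing ability (assumed metric)
            public Node(String id, double capacity) { this.id = id; this.capacity = capacity; }
        }

        private final ToDoubleFunction<String> failureRateOf; // e.g. the predictor above

        public FailureAwareNodeSelector(ToDoubleFunction<String> failureRateOf) {
            this.failureRateOf = failureRateOf;
        }

        // Prefer capable nodes that are unlikely to fail: score = capacity * (1 - failure rate).
        public Node select(List<Node> candidates) {
            return candidates.stream()
                    .max(Comparator.comparingDouble(
                            n -> n.capacity * (1.0 - failureRateOf.applyAsDouble(n.id))))
                    .orElseThrow(() -> new IllegalArgumentException("no candidate nodes"));
        }
    }

A scheduler or a replica placement policy could consult select(...) over the live candidate list before launching each task or placing each replica, skipping any node the dormancy mechanism (next sketch) has put to sleep.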
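Finally, the dormancy mechanism could hypothetically look like this: a node that has executed fewer than some minimum number of tasks yet already shows a high predicted failure rate is put to sleep, i.e. excluded from node selection, until its estimate improves. Both thresholds below are invented for illustration.

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical dormancy bookkeeping for sparsely observed, high-risk nodes.
    public class DormancyManager {
        private static final int MIN_TASKS = 10;          // assumed threshold
        private static final double RISK_THRESHOLD = 0.5; // assumed threshold

        private final Set<String> dormant = new HashSet<>();

        // Re-evaluate a node whenever its statistics change.
        public void review(String nodeId, int tasksExecuted, double predictedFailureRate) {
            if (tasksExecuted < MIN_TASKS && predictedFailureRate > RISK_THRESHOLD) {
                dormant.add(nodeId);    // too risky on too little evidence: sleep
            } else {
                dormant.remove(nodeId); // healthy or well observed: available again
            }
        }

        // Node selection should skip dormant nodes.
        public boolean isDormant(String nodeId) {
            return dormant.contains(nodeId);
        }
    }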
Keywords/Search Tags: Hadoop, failure rate prediction, node selection, data placement