The Analysis And Research Of The Storage Mechanism Of HDFS

Posted on:2015-10-29

Degree:Master

Type:Thesis

Country:China

Candidate:J H Lv

Full Text:PDF

GTID:2298330452450747

Subject:Computer application technology

Abstract/Summary:

The volume of data has grown sharply in recent years. Because such amounts of data cannot be stored in an ordinary file system, the concept of the distributed file system was introduced. Today, many well-known enterprises use Hadoop to process huge amounts of data. Hadoop is a cloud computing platform for processing and storing vast amounts of data. It includes its own distributed file system, HDFS, which has two kinds of servers: the NameNode, which stores the namespace of the entire cluster, and the DataNode, which stores the data. By default, each file in HDFS is kept in three copies to guard against data corruption, and the replication factor can be configured by the client.

This thesis mainly analyzes the storage mechanism of HDFS and points out that HDFS is not designed to handle huge numbers of small files and suffers from a single point of failure (SPOF). Chapter 2 first introduces Hadoop Archive and SequenceFile, both of which are implemented in Hadoop, then explores a new solution for handling small files and compares it with the earlier ones. A Multi-NameNode Cluster can eliminate the memory bottleneck of a single NameNode. Each NameNode in the cluster is independent of the others, so the failure of one does not affect the rest. The thesis runs a comparative experiment on Hadoop Archive and the Multi-NameNode Cluster and analyzes the results.

As for the SPOF problem, the thesis focuses on the Quorum Journal Manager (QJM) scheme in Hadoop 2.x, because the SPOF problem remains in Hadoop 1.x. The basic principle of QJM is that two metadata servers work in a cluster: one starts in active mode and the other in standby mode. Transaction logs are written on both the active NameNode and the log servers. The standby NameNode reads the logs from the log servers so that both NameNodes hold the same metadata. If the active NameNode fails, the standby NameNode can quickly take over. At the end of Chapter 3, a RAID solution is given to provide extra protection for log storage.
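
To make the small-file problem concrete, the following is a rough sketch, not taken from the thesis, of how NameNode memory grows with the number of files rather than with the volume of data. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (file or block) and a 128 MiB block size. Both constants, and the function name `namenode_heap_bytes`, are illustrative assumptions.

```python
# Rough NameNode heap estimate. The constants below are assumptions,
# not measurements from the thesis.
BYTES_PER_OBJECT = 150          # assumed heap cost per file or block object
BLOCK_SIZE = 128 * 1024 ** 2    # assumed HDFS block size: 128 MiB


def namenode_heap_bytes(num_files: int, file_size: int) -> int:
    """Estimate NameNode heap for num_files files of file_size bytes each."""
    # Integer ceiling division: number of blocks needed per file.
    blocks_per_file = -(-file_size // BLOCK_SIZE)
    # One file object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


TOTAL = 1024 ** 4                # store 1 TiB of data in both cases
SMALL_FILE = 100 * 1024          # 100 KiB per file
LARGE_FILE = 1024 ** 3           # 1 GiB per file

small = namenode_heap_bytes(TOTAL // SMALL_FILE, SMALL_FILE)
large = namenode_heap_bytes(TOTAL // LARGE_FILE, LARGE_FILE)

print(f"100 KiB files: {small / 1024 ** 2:.1f} MiB of NameNode heap")
print(f"1 GiB files:   {large / 1024 ** 2:.1f} MiB of NameNode heap")
```

Under these assumptions, the same 1 TiB of data costs the NameNode on the order of gigabytes of heap when stored as 100 KiB files, but only about a megabyte when stored as 1 GiB files. This is why approaches that pack small files together, or spread the namespace over several NameNodes, relieve the memory bottleneck.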

Keywords/Search Tags:

Distributed File System, Hot Standby System, Single point of failure, Huge amounts of small files

Related items

1	The Research And Design Of A Storage And Distribution Architecture For The Innovation Knowledge Based On HDFS
2	Huge Amounts Of Data Management Technology For Internet Of Things
3	Design And Implementation Of Metadata Server For Mass Stream Data Storage System
4	Design And Implementation Of "Single-Point Failure-Recovery Project" For System Of The Centralized Securities Exchange
5	Design And Implementation For Sfs-High Performance File System For Storing Small Files
6	Dynamic Huge/small-page Memory Adjustment Based On PHPA
7	The Design And Implementation Of Small File Storage System Based On FastDFS Architecture
8	Research And Optimization Of Reliability Of Hadoop Distributed File System
9	Study On Single Point Of Failure And Service Backup Mechanism In MAS-Based CSCW Architecture
10	Huge Amounts Of Digital Image Processing Platform Based On Hadoop