
The Analysis and Research of the Storage Mechanism of HDFS

Posted on: 2015-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: J H Lv
Full Text: PDF
GTID: 2298330452450747
Subject: Computer application technology
Abstract/Summary:
The volume of data has grown sharply in recent years. Because such amounts of data cannot be stored in an ordinary file system, the concept of the distributed file system was introduced. Today, many well-known enterprises use Hadoop to process huge amounts of data. Hadoop is a cloud computing platform for processing and storing vast amounts of data. It includes its own distributed file system, HDFS, which has two kinds of servers: the NameNode, which stores the namespace of the entire cluster, and the DataNode, which stores the data itself. By default, each file in HDFS is kept as three replicas to guard against data corruption, and the replication factor can be configured by the client.

This thesis mainly analyzes the storage mechanism of HDFS and points out that HDFS is poorly suited to handling huge numbers of small files and suffers from a single point of failure (SPOF).

Chapter 2 first introduces Hadoop Archive and SequenceFile, both of which are implemented in Hadoop, then looks for a new solution for handling small files and compares it with these earlier solutions. A Multi-NameNode Cluster can eliminate the memory bottleneck of a single NameNode. Each NameNode in the cluster is independent of the others, so the failure of one does not affect the rest. The thesis runs a comparative experiment on Hadoop Archive and the Multi-NameNode Cluster and analyzes the experimental results.

As for the SPOF problem, the thesis focuses on the Quorum Journal Manager scheme (QJM for short) in Hadoop 2.x, because the SPOF problem remains unsolved in Hadoop 1.x. The basic principle of QJM is that two metadata servers work in one cluster: one starts in active mode and the other in standby mode. Transaction logs are written to both the active NameNode and the log servers. The standby NameNode reads the logs from the log servers so that both NameNodes hold the same metadata. Once the active NameNode fails, the standby NameNode can quickly take over. At the end of Chapter 3, a RAID-based solution is given to offer extra protection for log storage.
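
The active/standby scheme described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the thesis's implementation or Hadoop's actual code: the class names `JournalNode` and `NameNode`, the tuple format of an edit, and the methods `log_and_apply`, `catch_up`, and `take_over` are all hypothetical, and the majority-acknowledgement rule is a simplification that omits the recovery steps a real quorum log needs.

```python
class JournalNode:
    """Hypothetical log server: keeps edits keyed by transaction id."""

    def __init__(self):
        self.edits = {}
        self.up = True

    def write(self, txid, op):
        # A log server that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.edits[txid] = op
        return True


class NameNode:
    """Hypothetical metadata server holding the namespace in memory."""

    def __init__(self, journals):
        self.journals = journals
        self.namespace = {}   # path -> replication factor
        self.last_txid = 0    # last transaction applied locally
        self.active = False

    def _apply(self, op):
        kind, path, value = op
        if kind == "create":
            self.namespace[path] = value
        elif kind == "delete":
            self.namespace.pop(path, None)

    def log_and_apply(self, op):
        """Active side: log the edit to the log servers, then apply it."""
        if not self.active:
            raise RuntimeError("only the active NameNode accepts writes")
        txid = self.last_txid + 1
        acks = sum(j.write(txid, op) for j in self.journals)
        # Simplified rule (assumption): require a majority of acknowledgements.
        if acks <= len(self.journals) // 2:
            raise RuntimeError("no quorum of log servers")
        self._apply(op)
        self.last_txid = txid

    def catch_up(self):
        """Standby side: replay edits from the log servers in txid order."""
        while True:
            txid = self.last_txid + 1
            op = next(
                (j.edits[txid] for j in self.journals if j.up and txid in j.edits),
                None,
            )
            if op is None:
                break
            self._apply(op)
            self.last_txid = txid

    def take_over(self):
        """Failover: replay any remaining edits, then become active."""
        self.catch_up()
        self.active = True


journals = [JournalNode() for _ in range(3)]
active = NameNode(journals)
active.active = True
standby = NameNode(journals)

active.log_and_apply(("create", "/logs/a", 3))
journals[0].up = False                          # one log server fails
active.log_and_apply(("create", "/logs/b", 3))  # two of three still acknowledge

active.active = False                           # simulate failure of the active node
standby.take_over()
```

After `take_over`, the standby holds the same namespace as the failed active node because both applied the same ordered sequence of logged edits, which is the property the abstract attributes to QJM.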
Keywords/Search Tags: Distributed File System, Hot Standby System, Single Point of Failure, Huge amounts of small files