
A Hybrid Coding Based Content Placement Scheme For Hadoop System

Posted on: 2018-11-30  Degree: Master  Type: Thesis
Country: China  Candidate: K Wen
GTID: 2428330569985407  Subject: Computer technology
Abstract/Summary:
With the arrival of the big data era, information in many fields is growing explosively, and managing massive data has become a widespread concern. Hadoop is a software framework for storing and processing large amounts of data, and many enterprises and individuals adopted it soon after its release. Hadoop implements a distributed file system, HDFS, which uses a three-replica placement strategy to ensure data reliability and fault tolerance, and it provides MapReduce for processing the data efficiently. Although three-way replication reliably protects the data, it consumes a large amount of storage space, so storage efficiency is low. Erasure coding is currently being used to optimize the HDFS data placement strategy: encoding the data blocks in the file system preserves reliability while substantially reducing redundant storage, but recovering lost data then incurs heavy network traffic.

The hybrid-coding-based Hadoop content placement method introduces the concepts of RAID 1 and RAID 5 into the Hadoop system. Data blocks are stored as two RAID 1 copies, which improves storage efficiency compared with three replicas, and the data blocks are also XOR-encoded into parity blocks in RAID 5 fashion to preserve system reliability. In this way the method balances storage redundancy against network traffic overhead during data recovery. After analyzing the feasibility of this RAID 1+5 based placement method, we implemented it on top of the HDFS-RAID architecture. Experiments show that the RAID 1+5 based method effectively reduces storage space and improves the fault tolerance rate, and that the system delivers good data write, read, and recovery performance.
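To make the storage-versus-recovery trade-off concrete, the Python sketch below models a single stripe of a RAID 1+5 style layout. It is an illustration under stated assumptions, not the thesis implementation: the stripe width `k`, the choice of exactly one XOR parity block per stripe, and the helper names `xor_blocks`, `storage_overhead_replication`, `storage_overhead_hybrid` and `recover_block` are all hypothetical. It shows XOR parity encoding, recovery of a lost block either from its mirror copy (one block read) or from parity (k blocks read), and the resulting storage overhead compared with three-way replication.

```python
"""Hypothetical model of one stripe in a RAID 1+5 style placement.

Assumptions (not taken from the thesis): blocks are equal-length byte
strings, every data block is stored twice (RAID 1 mirroring), and one XOR
parity block is computed over the k data blocks of each stripe (RAID 5 style).
"""
from functools import reduce
from typing import List, Optional, Tuple


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks, i.e. a RAID 5 style parity."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    # zip(*blocks) yields one tuple of byte values per column.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def storage_overhead_replication(copies: int = 3) -> float:
    """Bytes stored per byte of user data under plain n-way replication."""
    return float(copies)


def storage_overhead_hybrid(k: int) -> float:
    """Two mirrored copies of k data blocks plus one parity block per stripe."""
    return (2 * k + 1) / k


def recover_block(stripe: List[Optional[bytes]], lost: int, parity: bytes,
                  mirror: Optional[bytes]) -> Tuple[bytes, int]:
    """Rebuild stripe[lost]; return (block, number of blocks read)."""
    if mirror is not None:
        # RAID 1 path: copy the surviving replica, reading a single block.
        return mirror, 1
    # RAID 5 path: the lost block equals parity XOR all surviving data blocks.
    survivors = [b for i, b in enumerate(stripe) if i != lost and b is not None]
    return xor_blocks(survivors + [parity]), len(survivors) + 1


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]  # a stripe with k = 3
    parity = xor_blocks(data)
    damaged = [data[0], None, data[2]]  # block 1 and its mirror are both lost
    block, reads = recover_block(damaged, 1, parity, mirror=None)
    print(block == data[1], reads)  # rebuilt from parity, 3 blocks read
    print(storage_overhead_replication(), storage_overhead_hybrid(3))
```

Under these assumptions the hybrid layout stores 2 + 1/k bytes per byte of data, which is below the 3 bytes of three-way replication for any k > 1. The common single-copy failure is repaired with one block read from the mirror, and only when both copies of a block are lost does recovery fall back to reading the k - 1 surviving data blocks plus the parity block.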
Keywords/Search Tags: Distributed file system, Erasure code, Storage efficiency, Reliability