
Fault Tolerant Storage Technology Research Based On Network Coding In HDFS/Hadoop

Posted on: 2014-02-23    Degree: Master    Type: Thesis
Country: China    Candidate: Y Xue
GTID: 2308330482452246    Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of Internet technology and the explosive growth of data volumes, the storage and management of massive data has become a major industry and an important supporting technology for "big data" processing. Distributed storage systems scale well and support large-capacity storage, and they are widely used because their capacity, performance, and management can adapt quickly to changes in the system. However, as demand grows for high-capacity storage and for efficient storage of private data, the replica-based fault-tolerance mechanism used by existing distributed storage systems consumes too much extra storage space and data-transfer bandwidth, and it increases the overhead of repairing failures. To improve the reliability of data storage while reducing the data redundancy rate, industry has carried out research on fault-tolerant data coding mechanisms grounded in information theory, and fault-tolerant distributed storage based on network coding has also attracted active interest.

To address fault tolerance in distributed data storage, this thesis builds on the open-source cloud storage platform HDFS and studies the redundancy cost and additional transmission bandwidth incurred by replica-based fault tolerance. On this basis, it extends the HDFS architecture with mainstream information-theoretic fault-tolerant coding techniques and designs and implements NC-HDFS, a distributed fault-tolerant storage system based on network coding. NC-HDFS effectively reduces storage redundancy while improving the system's fault tolerance, meeting the reliability requirements of distributed data storage.
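As a rough illustration of the storage-versus-repair-traffic trade-off described above, the sketch below compares 3-way replication (the HDFS default), an MDS erasure code such as Reed-Solomon, and a minimum-storage regenerating (MSR) code, the network-coding construction from regenerating-code theory. It is a back-of-the-envelope model, not NC-HDFS code: the names `Scheme`, `replication`, `mds_erasure`, and `msr_regenerating`, the (n, k) = (9, 6) parameters, and the choice of d = 8 helper nodes are assumptions for illustration. The MSR repair bandwidth d*M / (k*(d - k + 1)) is the standard cut-set-bound result, not a figure reported in this thesis.

```python
from dataclasses import dataclass


@dataclass
class Scheme:
    name: str
    stored_mb: float           # total disk space used for one file
    node_mb: float             # data held by a single storage node
    repair_download_mb: float  # traffic needed to rebuild one failed node


def replication(file_mb: float, copies: int = 3) -> Scheme:
    # Each replica node holds a full copy; a lost copy is re-read from a survivor.
    return Scheme(f"{copies}-way replication", copies * file_mb, file_mb, file_mb)


def mds_erasure(file_mb: float, n: int, k: int) -> Scheme:
    # k data blocks of file_mb/k plus n-k parity blocks (e.g. Reed-Solomon).
    # Conventional repair reads k surviving blocks, i.e. the whole file.
    block = file_mb / k
    return Scheme(f"MDS ({n},{k}) erasure code", n * block, block, k * block)


def msr_regenerating(file_mb: float, n: int, k: int, d: int) -> Scheme:
    # Same per-node storage as MDS, but each of d helpers sends only a fraction
    # of its block: gamma = d * M / (k * (d - k + 1)).
    if not k <= d <= n - 1:
        raise ValueError("need k <= d <= n - 1")
    block = file_mb / k
    gamma = d * file_mb / (k * (d - k + 1))
    return Scheme(f"MSR regenerating ({n},{k},d={d})", n * block, block, gamma)


if __name__ == "__main__":
    for s in (replication(1200), mds_erasure(1200, 9, 6), msr_regenerating(1200, 9, 6, 8)):
        print(f"{s.name:34} stored={s.stored_mb:7.1f} MB  per-node={s.node_mb:7.1f} MB  "
              f"repair={s.repair_download_mb:7.1f} MB")
```

For a 1200 MB file this model gives 3600, 1800, and 1800 MB stored, and 1200, 1200, and about 533 MB downloaded to rebuild one failed node. The erasure code halves storage relative to replication but still reads the whole file to restore a single 200 MB block, which is the repair-bandwidth problem that network coding is meant to reduce.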
This study includes:
1) Based on the open-source HDFS cloud storage system, this thesis designs a distributed fault-tolerant data storage platform with network coding. The platform provides a common framework for integrating a variety of heterogeneous fault-tolerant coding algorithms and supports adaptive optimization of the coding strategy according to the number of storage nodes in the system and the size of the files.
2) Because replica-based fault tolerance incurs extra storage redundancy and transmission bandwidth, this thesis designs and implements a low-redundancy fault-tolerant storage mechanism based on erasure codes, and measures system performance in terms of file storage space, the amount of data downloaded to repair a lost node, and the time cost of file reading, writing, and repair.
3) Because erasure-code-based fault tolerance consumes extra bandwidth during data repair, this thesis designs and implements a fault-tolerant storage mechanism based on network coding that effectively reduces bandwidth consumption during repair, and measures system performance with the same metrics: file storage space, the amount of data downloaded to repair a lost node, and the time cost of file reading, writing, and repair.
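The network-coding repair idea in item 3 can be sketched with random linear network coding (RLNC) over the prime field GF(257), using functional repair at the MSR point with n = 4 nodes, k = 2, and d = 3 helpers. Each node stores alpha = 2 coded packets, and a replacement node downloads one packet (beta = 1) from each helper: 3 packets in total instead of the 4 that conventional erasure-code repair would read. The thesis does not say which construction NC-HDFS uses; the field, the parameters, and the helper names `combine`, `encode`, `repair`, and `decode` are hypothetical, and because the coefficients are random, decoding succeeds only with high probability.

```python
import random

P = 257  # prime field size; every byte value 0..255 is a field element


def combine(packets, rng):
    """Random linear combination over GF(P) of (coeffs, payload) packets."""
    mix = [rng.randrange(P) for _ in packets]
    coeffs = [sum(m * p[0][j] for m, p in zip(mix, packets)) % P
              for j in range(len(packets[0][0]))]
    payload = [sum(m * p[1][i] for m, p in zip(mix, packets)) % P
               for i in range(len(packets[0][1]))]
    return coeffs, payload


def encode(sources, n, alpha, rng):
    """Give each of n nodes alpha random combinations of the source packets."""
    unit = [([int(i == j) for j in range(len(sources))], list(s))
            for i, s in enumerate(sources)]
    return [[combine(unit, rng) for _ in range(alpha)] for _ in range(n)]


def repair(helper_nodes, alpha, rng):
    """Functional repair: each helper sends ONE combination of its packets
    (beta = 1); the newcomer stores alpha combinations of what it received."""
    received = [combine(node, rng) for node in helper_nodes]
    return [combine(received, rng) for _ in range(alpha)], len(received)


def decode(packets, m):
    """Gauss-Jordan elimination mod P on [coeffs | payload]; returns m sources."""
    rows = [list(c) + list(p) for c, p in packets]
    for col in range(m):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            raise ValueError("coefficients are singular; contact another node")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # modular inverse via Fermat
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[m:] for row in rows[:m]]


if __name__ == "__main__":
    rng = random.Random(2014)
    data = b"hello, hdfs!"              # 12 bytes -> 4 source packets of 3 bytes
    n, k, d = 4, 2, 3
    alpha = d - k + 1                    # packets per node at the MSR point
    m = k * alpha                        # number of source packets
    size = len(data) // m
    sources = [list(data[i * size:(i + 1) * size]) for i in range(m)]
    nodes = encode(sources, n, alpha, rng)
    nodes[0], sent = repair(nodes[1:], alpha, rng)  # node 0 failed and is replaced
    print("repair download:", sent, "packets vs", m, "for MDS repair")
    try:
        rec = decode(nodes[0] + nodes[1], m)  # any k = 2 nodes should suffice
        print(bytes(sum(rec, [])))
    except ValueError as err:
        print("unlucky coefficients:", err)
```

The replacement node does not hold the same bytes as the failed one (functional rather than exact repair); it only preserves the property that any k nodes are enough to rebuild the file. If the random coefficients happen to be singular, `decode` raises `ValueError` and a reader would contact another node.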
Keywords/Search Tags: distributed storage systems, network coding, erasure codes, fault-tolerant data storage technologies