
Research On Data Redundancy Technologies Of Distributed File System Based On HDFS

Posted on: 2012-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: H Wu
Full Text: PDF
GTID: 2178330338450166
Subject: Communication and Information System
Abstract/Summary:
Storage systems play an increasingly vital role as information technology develops. With the explosive growth of data, local storage can no longer meet the demand for mass storage. In addition, personal mobile computing and enterprise computing place higher requirements on the underlying storage system. As a result, distributed storage systems are increasingly used for their higher storage capacity, reliability, security, and mobility.

Data redundancy technologies are studied in this paper. Most traditional distributed systems implement data redundancy with redundant array of independent disks (RAID) and replication, so reaching a given level of reliability requires a large amount of storage. Encoding techniques were later proposed to store data in encoded form. However, while this approach improves data reliability, it also causes a large loss of performance when reading and writing data.

A scheme that combines replication and network coding for data storage is put forward. Compared with previous schemes, it better balances data reliability and read performance. Based on the structure and mechanisms of HDFS (Hadoop Distributed File System), the paper describes the complete processes for file block encoding and decoding, the placement policy for encoded blocks, and file reading and writing. It then explains how to perform load balancing and how to handle machines that frequently leave and join a cluster made up of a large number of cheap, unreliable machines. With this redundancy scheme, data reliability can be improved at the same degree of redundancy while the negative impact of coding on read performance is kept as small as possible.

The paper first introduces the research status of distributed file systems. It then proposes the scheme combining replication and network coding and presents a specific design within the HDFS architecture. Finally, theoretical analysis and simulations are given to evaluate the reliability the design provides and the results it is expected to achieve.
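
The trade-off between redundancy and reliability described above can be illustrated with a small calculation. The sketch below is not the model used in the thesis; it assumes that nodes fail independently, that each node is available with probability `p`, and that the code is an ideal (n, k) code in which any k of the n blocks are enough to recover the data. The `hybrid_availability` function is a hypothetical layout (one plain replica plus coded blocks) chosen only to show how replication and coding might be combined; all function names are illustrative.

```python
from math import comb


def replication_availability(p: float, r: int) -> float:
    """Probability that at least one of r replicas is on an available node."""
    return 1.0 - (1.0 - p) ** r


def coding_availability(p: float, n: int, k: int) -> float:
    """Probability that at least k of n coded blocks are available.

    Assumes an ideal (n, k) code: any k blocks suffice to decode.
    """
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def hybrid_availability(p: float, n: int, k: int) -> float:
    """Hypothetical hybrid layout: one plain replica plus (n, k) coded blocks.

    Data is lost only if the replica is down AND fewer than k coded blocks
    are available. This layout is an assumption for illustration only.
    """
    return 1.0 - (1.0 - p) * (1.0 - coding_availability(p, n, k))


if __name__ == "__main__":
    p = 0.9  # assumed per-node availability
    # All three layouts below store 3x the original data size.
    print("3 replicas:        ", replication_availability(p, 3))
    print("(9, 3) coding:     ", coding_availability(p, 9, 3))
    print("replica + (6, 3):  ", hybrid_availability(p, 6, 3))
```

Under these assumptions, all three layouts use three times the original storage, yet the coded layouts reach higher availability than three plain replicas. In the hypothetical hybrid layout, a read can be served from the plain replica without decoding whenever that replica is available, which is one way to limit the read cost of coding.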
Keywords/Search Tags: Distributed File System, HDFS, Data Reliability, Data Redundancy Technologies