
Design and Implementation of the HD_EC File System Based on Error-Correcting Codes

Posted on: 2016-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: P Liu
GTID: 2208330464963530
Subject: Computer Science and Technology
Abstract/Summary:
With the globalization of information, the Internet has developed rapidly and is now widely used in scientific research, data mining, and information retrieval. As a result, more and more data must be stored and managed, and the volume of data is exploding. Local file systems can no longer meet users' demands for storage capacity and computing power. Distributed file systems address the problem of data storage and management effectively and are gradually replacing local file systems as the basic storage for enterprise data. The growth of digital information drives the development of distributed file systems, and it also places higher requirements on the security of the data held in the storage system. In an era in which data is information and data is value, protecting an enterprise's “data assets” from damage or loss becomes increasingly important. Popular data protection techniques include backup, logging, snapshots, and erasure coding; erasure coding is widely used because it scales well and saves storage space.

In this thesis, I mainly study HDFS, which is currently one of the more popular distributed file systems. I analyze the disadvantages of the triple-replication scheme used by HDFS and replace it with erasure coding. This not only protects the data but also saves storage space: for a total data volume of N, it saves 1.3N of space compared with triple replication.

The method is implemented on top of HDFS and ECFS. ECFS is built on erasure coding, so I integrated HDFS with ECFS and moved the storage of system data from HDFS to ECFS. When the HDFS client issues a write command, the namespace is modified and the client applies to the HDFS NameNode for the DataNodes to write to; the data is then stored on the ECFS OSDs. Before the data is stored, the checksum blocks are computed with the erasure coding algorithm, and the data blocks and checksum blocks are then written to the ECFS OSDs. Because HDFS is implemented in Java and ECFS in C, I used the Java Native Interface (JNI) to bridge the two.

The thesis also covers system testing, which examined the system's functionality, storage space usage, and read and write efficiency. The experimental results show that, in terms of functionality, the system implements file operations and writes data to ECFS correctly; in terms of storage space, it protects the data while saving space; and in terms of read and write efficiency, storing data in ECFS slows operations down to some degree, which is an important topic for future research.
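
The step of computing checksum blocks before the write can be illustrated with a minimal Java sketch. It assumes the simplest possible erasure code, a single XOR parity block over k equal-length data blocks, which tolerates the loss of one block. The abstract does not state which code or which parameters ECFS uses, so the class name `XorParitySketch`, its methods, and the values of k and m are hypothetical and stand in for the real algorithm only as an illustration.

```java
import java.util.Arrays;

/** Illustrative single-parity erasure code; an assumption, not the actual ECFS algorithm. */
final class XorParitySketch {

    /** Computes one checksum block as the bytewise XOR of k equal-length data blocks. */
    static byte[] encodeParity(byte[][] dataBlocks) {
        if (dataBlocks.length == 0) {
            throw new IllegalArgumentException("need at least one data block");
        }
        int blockSize = dataBlocks[0].length;
        byte[] parity = new byte[blockSize];
        for (byte[] block : dataBlocks) {
            if (block.length != blockSize) {
                throw new IllegalArgumentException("blocks must have equal length");
            }
            for (int i = 0; i < blockSize; i++) {
                parity[i] ^= block[i];
            }
        }
        return parity;
    }

    /** Rebuilds the single missing data block (marked null) from the survivors and the parity. */
    static byte[] recoverMissing(byte[][] dataBlocks, byte[] parity) {
        byte[] rebuilt = Arrays.copyOf(parity, parity.length);
        int missing = 0;
        for (byte[] block : dataBlocks) {
            if (block == null) {
                missing++;
                continue;
            }
            if (block.length != rebuilt.length) {
                throw new IllegalArgumentException("blocks must have equal length");
            }
            for (int i = 0; i < rebuilt.length; i++) {
                rebuilt[i] ^= block[i];
            }
        }
        if (missing != 1) {
            throw new IllegalArgumentException("exactly one block must be missing");
        }
        return rebuilt;
    }

    /** Bytes stored per byte of user data for k data blocks and m checksum blocks. */
    static double storageFactor(int k, int m) {
        return (double) (k + m) / k;
    }
}
```

In a write path like the one described, `encodeParity` would run on the data blocks before both the data and the checksum blocks are sent to storage, and `recoverMissing` would run when one block is unavailable. `storageFactor` shows where the space saving comes from: triple replication stores 3 bytes per byte of user data, while a code with k data blocks and m checksum blocks stores (k + m) / k. Read this way, a saving of 1.3N corresponds to storing about 1.7N in total, though the abstract does not give the parameters behind that figure.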
Keywords/Search Tags: HDFS, ECFS, Erasure coding, data transmission, JNI, HD_EC file system