The Cloud File System Based On Erasure Code And HDFS

Posted on: 2013-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: B C Chen
GTID: 2248330371983566
Subject: Software engineering

Abstract/Summary:
How to guarantee data reliability in cloud file systems is a hot topic in cloud computing. Copy backup (replication) is an important approach to guaranteeing data reliability in distributed file systems, but it has drawbacks: it takes up a large amount of disk space and its storage efficiency is low. Erasure Code is another technique for guaranteeing data reliability in distributed file systems. It provides optimized data redundancy and thereby prevents data loss; applied properly, it improves space utilization while still delivering satisfactory data protection.

Erasure Code is well known for its high coding efficiency and its savings in storage space. This paper introduces Erasure Code into cloud file systems so that it can fully or partially replace the copy backup strategy and improve the performance of cloud file systems. The system designed in this paper uses HDFS as its platform and layers Erasure Code on top of the HDFS copy backup strategy, so that the strengths of both strategies are combined to guarantee data reliability.
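The trade-off between disk space and reliability described above can be made concrete with a simple model. The sketch below is illustrative only and rests on assumptions the thesis does not state: every block or replica is placed on a different node, nodes fail independently with probability p, and a (k, m) erasure code can rebuild a file from any k of its k + m coded blocks (an MDS code such as Reed-Solomon). The names k, m, r, p, `storage_overhead` and `file_loss_probability` are hypothetical.

```python
from math import comb


def storage_overhead(k: int, m: int, r: int) -> float:
    """Raw bytes stored per byte of user data: r replicas of k + m coded blocks."""
    return r * (k + m) / k


def file_loss_probability(k: int, m: int, r: int, p: float) -> float:
    """Probability that a file is unrecoverable when it is split into k data
    blocks, extended with m parity blocks, and every coded block is stored
    with r replicas.

    Assumes all replicas sit on distinct nodes, nodes fail independently with
    probability p, and any k of the k + m coded blocks suffice to decode.
    """
    q = p ** r  # a coded block is lost only if all r of its replicas are lost
    n = k + m
    # The file is lost once more than m of its n coded blocks are lost.
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(m + 1, n + 1))


if __name__ == "__main__":
    p = 0.01  # assumed per-node failure probability
    schemes = [
        ("copy backup only, r=3", 4, 0, 3),
        ("erasure code only (4+2)", 4, 2, 1),
        ("erasure code (4+2), r=2", 4, 2, 2),
    ]
    for label, k, m, r in schemes:
        print(f"{label:26s} overhead={storage_overhead(k, m, r):.2f}x "
              f"loss={file_loss_probability(k, m, r, p):.2e}")
```

Setting m = 0 reduces the model to plain copy backup and r = 1 to plain Erasure Code, so the same two functions cover both strategies and their combination.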
Different levels of reliability can also be achieved by adjusting parameters, to meet the requirements of varied scenarios.

The overall design is as follows. Before a file is uploaded, its blocks are encoded with Erasure Code; the encoded blocks are then merged and uploaded to the HDFS file cluster, while the HDFS replication parameter is lowered to save storage space. While the files are stored, the fault-tolerance mechanism of HDFS guarantees data reliability. When a file is downloaded, its encoded data blocks are fetched from the HDFS cluster and decoded to recover the source file; if some data blocks of the file are corrupted, the client simply downloads the other data blocks from the HDFS cluster and recovers the source file from them.

To put this design into practice, this paper designs and implements a cloud file system based on Erasure Code and HDFS. The system is divided into two parts: the client and the HDFS cluster. The client is the core of the design and implements the crucial functions, such as file partitioning, encoding and decoding of file data blocks, merging of encoded data blocks, file upload and download, and error control. The HDFS cluster consists of computers with Hadoop installed and is responsible for file storage.

The features of the system designed in this paper are as follows.

Reliability: Erasure Code provides optimized data redundancy, thereby preventing data loss.
In addition, the replication parameter of the HDFS file cluster can be adjusted for different applications, so that Erasure Code and copy backup together provide a dual guarantee of data reliability.

Flexibility: the main functional modules of the cloud file system are implemented in the client, and the file cluster is built following the common HDFS cluster setup. This both lowers the complexity of building the HDFS cluster and broadens the system's applicability.

Extensibility: Hadoop, the underlying cloud infrastructure software, offers highly extensible storage; storage nodes can be added at any time to scale up the HDFS cluster.

Economy: HDFS is designed for inexpensive hardware and has good compatibility, so computers of any type and grade can join the file cluster by running it. Users' idle and low-cost computing resources can also be put to use, reducing investment in equipment.

System testing indicates that the encoding and decoding speed of the system meets the performance requirements of cloud file systems, and that merging encoded file blocks before uploading them further improves system performance.

Reducing the computational complexity of Erasure Code, and thereby further improving system performance, is the target of our future research.
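The client-side workflow described above (partition, encode, merge, upload, download, decode, and repair a corrupted block) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: a single XOR parity block stands in for the unspecified Erasure Code (it tolerates the loss of any one block), and the merged-object layout, the per-block SHA-256 checksums used for error control, and all function names are hypothetical. The HDFS transfer itself is omitted; in a real client the merged object would be written to HDFS with a lowered replication factor (the `dfs.replication` setting) instead of staying in memory.

```python
import hashlib
import struct

# Hypothetical merged-object header: original length, block count, block size.
HEADER = struct.Struct(">QII")
DIGEST_LEN = 32  # bytes in a SHA-256 digest


def split_blocks(data: bytes, k: int) -> list[bytes]:
    """File partitioning: k equal-sized blocks, zero-padding the tail."""
    size = max(1, -(-len(data) // k))  # ceiling division, at least 1 byte
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(data: bytes, k: int) -> list[bytes]:
    """Encoding: k data blocks followed by one XOR parity block."""
    blocks = split_blocks(data, k)
    return blocks + [xor_blocks(blocks)]


def merge(blocks: list[bytes], original_len: int) -> bytes:
    """Merging: pack all coded blocks, plus a SHA-256 digest per block for
    error control, into one object so the file needs a single upload."""
    header = HEADER.pack(original_len, len(blocks), len(blocks[0]))
    digests = b"".join(hashlib.sha256(b).digest() for b in blocks)
    return header + digests + b"".join(blocks)


def unmerge(obj: bytes) -> tuple[int, list[bytes], list[bytes]]:
    """Inverse of merge: recover the length, the blocks and their digests."""
    original_len, n, size = HEADER.unpack_from(obj, 0)
    off = HEADER.size
    digests = [obj[off + DIGEST_LEN * i:off + DIGEST_LEN * (i + 1)] for i in range(n)]
    off += DIGEST_LEN * n
    blocks = [obj[off + size * i:off + size * (i + 1)] for i in range(n)]
    return original_len, blocks, digests


def decode(obj: bytes) -> bytes:
    """Decoding: rebuild the source file, repairing at most one bad block."""
    original_len, blocks, digests = unmerge(obj)
    bad = [i for i, (b, d) in enumerate(zip(blocks, digests))
           if hashlib.sha256(b).digest() != d]
    if len(bad) > 1:
        raise ValueError("more corrupted blocks than one parity block can repair")
    if bad:
        i = bad[0]
        # The XOR of all k + 1 blocks is zero, so the others XOR to the lost one.
        blocks[i] = xor_blocks(blocks[:i] + blocks[i + 1:])
    return b"".join(blocks[:-1])[:original_len]


if __name__ == "__main__":
    source = b"The Cloud File System Based On Erasure Code And HDFS"
    coded = encode(source, k=4)
    merged = bytearray(merge(coded, len(source)))
    # Simulate corruption of the first data block after download.
    merged[HEADER.size + DIGEST_LEN * len(coded)] ^= 0xFF
    assert decode(bytes(merged)) == source
    print("recovered:", decode(bytes(merged)).decode())
```

A code with more parity blocks, such as Reed-Solomon, would let the client tolerate several corrupted blocks at the cost of heavier computation, which is the computational complexity the abstract names as the target of future research.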
Keywords: Cloud Computing, Cloud File System, Erasure Code