The Design And Implementation Of A De-duplication File System Based On Cloud Storage

Posted on: 2014-03-10  Degree: Master  Type: Thesis
Country: China  Candidate: J J Shi  Full Text: PDF
GTID: 2268330422464748  Subject: Computer technology
Abstract/Summary:
As demand for online storage services grows, cloud storage companies have begun to explore billing models: the better the service, the more the user pays. Free cloud storage space can no longer meet users' needs, and the cost of cloud storage is beginning to affect users' lives. To address this problem, a de-duplication file system based on cloud storage is proposed.

The system is a client-side file system that synchronizes incrementally with cloud storage. It applies de-duplication so that only local data free of redundancy is uploaded to cloud storage automatically. The system consists of six modules. The user interface module receives system requests from the FUSE kernel module and calls the other modules to complete the response. The cloud synchronization module uses the cloud storage open interface and cooperates with the other modules to perform local synchronization. The file management module obtains the file list from the cloud synchronization module and creates index nodes to organize the files. The file operation module handles read and write requests. The de-duplication module removes redundant data at the source side using content-based chunking: a fixed-size sliding window is used to calculate a fingerprint for each overlapping segment of the file, and when the fingerprint modulo a chosen integer equals a predetermined value, that position is marked as a breakpoint. The portions of the file between breakpoints are classified as chunks, and a chunk whose fingerprint equals a previously recorded fingerprint is classified as duplicate data. After these steps, the system uploads the file and its metadata to cloud storage. When the file system is about to be destroyed, the garbage collection module deletes useless metadata and redundant files.

Real-world data sets such as the Linux kernel and virtual machine disk images were used to evaluate the system's de-duplication ratio. According to the results, on large-scale document data the highest de-duplication ratio reaches 67%. Calculated against the Ali cloud platform charging standard, using the system can in theory save 4391 yuan per year for one terabyte of user data.
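
The content-based chunking and duplicate detection described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not name the fingerprint function or its parameters, so the polynomial rolling hash, the constants `WINDOW`, `DIVISOR` and `TARGET`, the use of SHA-1 as the chunk fingerprint, and the function names `chunk_boundaries` and `deduplicate` are all assumptions made for illustration. The sketch also assumes that "duplicate" means a chunk whose fingerprint already appears in an index of previously seen chunks.

```python
import hashlib

# All constants below are illustrative assumptions, not values from the thesis.
WINDOW = 48            # size of the fixed sliding window, in bytes
DIVISOR = 4096         # the "chosen integer"; gives chunks of roughly 4 KiB on average
TARGET = 0             # the "predetermined value" the remainder must equal
BASE = 257             # base of the polynomial rolling hash
MOD = (1 << 61) - 1    # large prime modulus for the rolling hash


def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks in data."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    h = 0
    start = 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            # Drop the byte that slides out of the window.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD       # add the byte that slides in
        # Declare a breakpoint when the window fingerprint hits the target
        # remainder and the current chunk is at least one window long.
        if i + 1 - start >= WINDOW and h % DIVISOR == TARGET:
            yield start, i + 1
            start = i + 1
    if start < len(data):
        yield start, len(data)         # trailing chunk


def deduplicate(data: bytes, index: dict):
    """Return (recipe, new_chunks) for data.

    recipe is the ordered list of chunk fingerprints needed to rebuild the
    file; new_chunks holds only chunks whose fingerprint was not yet in index.
    """
    recipe, new_chunks = [], []
    for s, e in chunk_boundaries(data):
        chunk = data[s:e]
        fp = hashlib.sha1(chunk).hexdigest()
        if fp not in index:            # unseen fingerprint: must be uploaded
            index[fp] = len(chunk)
            new_chunks.append(chunk)
        recipe.append(fp)
    return recipe, new_chunks
```

Because breakpoints depend on the bytes inside the window rather than on absolute offsets, inserting a few bytes near the start of a file typically changes only the chunks around the insertion, and the remaining chunks keep their fingerprints. In a system like the one described, only `new_chunks` and the recipe metadata would need to be uploaded.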
Keywords/Search Tags: De-duplication, Cloud storage, File system