Research And Implementation Of File Incremental Synchronization Based On Data Block

Posted on: 2021-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: K Yu
Full Text: PDF
GTID: 2518306464983649
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, organizations inevitably accrue large amounts of data, not only from traditional product-level sources but also from digital outlets such as mobile devices, social media networks, and the Internet of Things. Information security has therefore become increasingly important for enterprises and industries, and ensuring adequate disaster recovery backup in the event of an accident has become an increasingly important research direction. Because of the volume and velocity of big data, source data must be synchronized to the backup server quickly and effectively. Traditional synchronization methods occupy large amounts of storage space, consume high network bandwidth, and synchronize inefficiently when dealing with big data. This thesis proposes performing incremental identification with a data block algorithm and a Bloom filter, and designs an incremental synchronization backup tool on that basis.

First, this thesis introduces the current state of research on data synchronization backup to clarify the requirements and goals, and analyzes the related technologies, including data block algorithms, Bloom filters, and the inotify mechanism. The discussion of block algorithms compares fixed-length and variable-length chunking, focusing on the Rsync and RAM algorithms and analyzing their characteristics and shortcomings. It also introduces the standard Bloom filter and several improved Bloom filters based on it.

Second, this thesis proposes an improved algorithm, RAMM, and a non-partitioned single-hash Bloom filter (NPSHBF). These address, respectively, the overly long blocks produced by the RAM algorithm and the standard Bloom filter's need for multiple hash functions with demanding requirements. The rationality and effectiveness of the improved algorithms are verified through experiments and analysis.

Third, an incremental synchronization backup tool is designed and implemented in a hierarchical and modular manner. It consists of four main modules: a network transmission module, a data monitoring module, a data synchronization module, and a control module. The monitoring module uses the inotify mechanism to monitor files so that synchronization happens in real time, while the synchronization module uses the RAMM block algorithm and the non-partitioned single-hash Bloom filter to perform incremental identification and synchronization.

Finally, a series of tests was conducted on the incremental synchronization backup tool. The results show that, compared with full synchronization, the improved RAMM block algorithm and the non-partitioned single-hash Bloom filter complete synchronization backup efficiently while reducing network bandwidth and memory consumption. The tool also performs well when applied to the Ceph distributed file storage system built on the OpenStack cloud computing platform.
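The general idea of block-based incremental identification with a Bloom filter can be illustrated with a minimal Python sketch. It is not the thesis's RAMM or NPSHBF algorithm, whose details are not given in this abstract. It assumes fixed-length blocks, SHA-256 block fingerprints, and a standard Bloom filter whose bit positions are derived from one digest by double hashing. The names `BLOCK_SIZE`, `BloomFilter`, `split_blocks`, and `changed_blocks` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block length for illustration, not taken from the thesis


class BloomFilter:
    """Standard Bloom filter; positions come from one SHA-256 digest via double hashing."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def split_blocks(data, size=BLOCK_SIZE):
    """Cut data into fixed-length blocks; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old, new, size=BLOCK_SIZE):
    """Return (index, block) pairs of `new` whose fingerprint is absent from the filter.

    A Bloom filter can report false positives, so a changed block may
    occasionally be treated as already present; a real tool would need a
    confirmation step before skipping a block.
    """
    seen = BloomFilter()
    for block in split_blocks(old, size):
        seen.add(hashlib.sha256(block).digest())
    return [
        (i, block)
        for i, block in enumerate(split_blocks(new, size))
        if hashlib.sha256(block).digest() not in seen
    ]
```

Only the blocks returned by `changed_blocks` would need to be sent to the backup server. With fixed-length blocks, an insertion near the start of a file shifts every later block boundary, which is the kind of weakness that variable-length chunking is meant to address.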
Keywords/Search Tags: Data Chunking, NPSHBF, RAMM, Incremental Synchronization, Data Backup