Research On Key Technologies Of Data Incremental Synchronization For The Cloud Storage Services

Posted on: 2022-04-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C J Zhang
GTID: 1488306569458514
Subject: Computer Science and Technology
Abstract/Summary:
In cloud storage, incremental data synchronization has gradually become the dominant form of data synchronization because it consumes fewer resources. Although incremental synchronization greatly reduces network bandwidth use, it places computing and disk I/O pressure on the system during synchronization. It also faces challenges caused by massive numbers of synchronization requests, such as high concurrency, high load, serialization, and high latency. Despite much research on incremental data synchronization and on backup server performance optimization, these challenges remain poorly solved. This thesis therefore proposes several optimization algorithms. The main work of this thesis includes:

1. To address the low efficiency of incremental data acquisition, two content-defined chunking algorithms are proposed for incremental data synchronization. This thesis discusses, in theory, the relationship between the chunking algorithm and incremental data acquisition, and proposes two better-suited chunking algorithms: the Minimum Incremental Interval algorithm and the Parity Check Interval algorithm. Both algorithms gain stronger resistance to byte shifting by sacrificing some stability in chunk length, and resistance to byte shifting directly affects the accuracy of incremental data acquisition. The experimental results show that the two algorithms reduce the non-differential data in the incremental acquisition results to 20% to 57% of that produced by the other algorithms while matching the compared algorithms in other respects, which improves the efficiency of incremental data acquisition.

2. To remove redundant steps from the communication process of incremental data synchronization, an optimization algorithm based on shadow data is proposed. In cloud storage, data changes occur only on the client; the server only stores data and never modifies it. Not every communication step of existing incremental synchronization algorithms is therefore needed. The optimization algorithm reduces the server's computing and I/O load, and the communication traffic of incremental synchronization, by sacrificing a small amount of client storage space. Specifically, this thesis designs a new data structure, shadow data, which is stored on the client and replaces the chunk checksums of the backup data. The server can thus skip computing checksums over the backup data, fetching those digests, and transmitting them, which effectively reduces its computing load and number of I/O operations. The algorithm is compared with the previous algorithm on real data sets. The experimental results show that it reduces the server's CPU load by about 80%, which verifies the practicality of the shadow-data-based optimization algorithm.

3. To address the high concurrency, heavy computation, and high latency that massive synchronization requests impose on the server, this thesis proposes a synchronization request processing algorithm based on a distributed computing framework. In cloud storage, massive requests confront the backup server with high concurrency, heavy computation and disk I/O, and high service latency. The algorithm first stores synchronization requests in message middleware, which decouples receiving requests from processing them and makes parallel synchronization possible. The processing of each synchronization request is then designed as a computing task submitted to the distributed computing framework. The experimental results show that the proposed algorithm supports higher concurrency than Rsync and other services, reduces synchronization processing time by 18% to 82%, and reduces the amount of transmitted data by 5.3% to 90.9%. In each range, the two endpoints are the worst and the best of all test results, respectively, and these improvements come at the expense of part of the client's storage space.
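As an illustration only, the client-side idea behind the second contribution can be sketched in Python. The sketch assumes that shadow data is simply the set of chunk digests from the last synchronized version, a layout the abstract does not specify. The chunker is a generic hash-based content-defined chunker, not the Minimum Incremental Interval or Parity Check Interval algorithm, and every name and constant here (`split_chunks`, `client_delta`, `server_apply`, `WINDOW`, `MIN_CHUNK`, `MASK`) is hypothetical.

```python
import hashlib
import random

WINDOW = 16      # bytes hashed to decide a boundary (illustrative value)
MIN_CHUNK = 16   # kept >= WINDOW so the hashed window stays inside the current chunk
MASK = 0x3F      # cut when the low 6 bits are zero, roughly 1 in 64 positions


def digest(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def split_chunks(data: bytes) -> list:
    """Generic content-defined chunking (assumption: NOT the thesis's algorithms)."""
    chunks, start = [], 0
    for end in range(1, len(data) + 1):
        if end - start < MIN_CHUNK:
            continue
        window = data[end - WINDOW:end]
        fingerprint = int.from_bytes(
            hashlib.blake2b(window, digest_size=4).digest(), "big")
        if fingerprint & MASK == 0:      # boundary depends only on local content
            chunks.append(data[start:end])
            start = end
    if start < len(data):                # trailing bytes form the last chunk
        chunks.append(data[start:])
    return chunks


def client_delta(new_data: bytes, shadow: set):
    """Client: diff new chunks against the local shadow, with no server round trip."""
    ops = []
    for chunk in split_chunks(new_data):
        d = digest(chunk)
        # Send the payload only when the digest is absent from the shadow.
        ops.append((d, None if d in shadow else chunk))
    new_shadow = {d for d, _ in ops}     # hypothetical shadow layout: digest set
    return ops, new_shadow


def server_apply(store: dict, ops) -> bytes:
    """Server: keep new chunks under the client-supplied digest and reassemble."""
    for d, payload in ops:
        if payload is not None:
            store[d] = payload           # no checksum is computed on the server
    return b"".join(store[d] for d, _ in ops)


if __name__ == "__main__":
    rng = random.Random(0)
    old = bytes(rng.randrange(256) for _ in range(4096))
    new = old[:2048] + b"!" + old[2048:]         # one byte inserted mid-file

    store = {}
    ops, shadow = client_delta(old, set())       # first sync sends every chunk
    server_apply(store, ops)
    ops, shadow = client_delta(new, shadow)      # later sync sends unknown chunks only
    rebuilt = server_apply(store, ops)
    sent = sum(len(p) for _, p in ops if p is not None)
    print(f"rebuilt ok: {rebuilt == new}, sent {sent} of {len(new)} bytes")
```

Because each boundary depends only on the bytes just before it, an insertion typically disturbs only the chunks near the edit, which is the resistance to byte shifting the abstract refers to. In this sketch the server never hashes stored data; it trusts the digests supplied by the client, which mirrors the trade of client storage for server load described above.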
Keywords/Search Tags: Distributed Computing, Incremental Synchronization, Data Partitioning Algorithm, Communication Process, Message Middleware
Related items