With information technology applied to varying degrees across all industries, data volumes have grown exponentially, placing considerable pressure on data storage and backup. As a redundant-data elimination technology, cloud storage data deduplication has attracted wide recognition and attention from researchers in the field. Deduplication not only reduces the local storage burden and information management costs, but also improves network bandwidth utilization. Within cloud storage deduplication, the duplicate-data detection rate and recall rate of the chunking algorithm directly determine the overall deduplication effect. This paper therefore proposes two optimized chunk detection algorithms as a reference for the development of cloud storage deduplication technology.

To address the limitation that the sliding block algorithm cannot handle data blocks that fail to match in a modified file, this paper proposes an optimized sliding block algorithm spanning the client and the server. On the client side, the optimized algorithm retains the core steps of the sliding block algorithm for the initial detection pass, simplifies the original two-round weak-and-strong hash verification into a single strong-hash operation, adds a preprocessing mechanism for blocks that fail to match, and delays the cut-off position of the sliding window relative to the original algorithm so that the window can detect further duplicate data. On the server side, the optimized algorithm introduces a deduplication compensation mechanism that applies fixed-length chunking via a subwindow to the client's preprocessing results, achieving a second round of duplicate detection. Test results show that, compared with current mainstream detection algorithms, the optimized sliding block algorithm significantly improves the recall rate on both locally modified and globally modified files.

To address the problem that the fixed-length chunking algorithm misses redundant data because of its high sensitivity to file modifications, this paper proposes an optimized fixed-length chunking algorithm spanning the client and the server. On the client side, the optimized algorithm retains the basic steps of fixed-length chunking: the file is divided into fixed-length blocks, the hash value of each block is computed, and a first round of filtering against a hash table yields the blocks that fail to match. By drawing on the computing and storage resources of the server, the optimized algorithm then introduces a duplicate data tracking mechanism: a backtracking window identifies the boundaries of each fixed-length block that failed to match on the client, shifting byte by byte until the window no longer overlaps that block. Test results show that, compared with current mainstream detection algorithms, the optimized fixed-length chunking algorithm further improves the already high duplicate-data detection rate of fixed-length chunking without adding extra overhead on the local side, and at the same time improves its recall rate, thereby reducing its sensitivity to file modification operations.
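The client-side detection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `sliding_block_detect`, the use of SHA-1 as the single strong hash, and the 4 KiB window size are all assumptions. Each window position is checked with one strong hash (the paper's simplification of the original weak-plus-strong scheme); on a match the window jumps past the duplicate block, and otherwise it slides one byte while the unmatched byte is collected into a buffer for the preprocessing mechanism.

```python
import hashlib

def sliding_block_detect(data: bytes, known_hashes: set, block_size: int = 4096):
    """Sketch of single-pass sliding detection: one strong hash per window
    position (hypothetical names; SHA-1 and 4 KiB window are assumptions)."""
    duplicates = []        # (offset, digest) of windows matching known blocks
    pending = bytearray()  # bytes from match-failed regions, kept for preprocessing
    i = 0
    while i + block_size <= len(data):
        digest = hashlib.sha1(data[i:i + block_size]).hexdigest()
        if digest in known_hashes:
            duplicates.append((i, digest))
            i += block_size          # jump past the detected duplicate block
        else:
            pending.append(data[i])  # unmatched byte goes to the preprocessing buffer
            i += 1                   # slide the window forward by one byte
    pending.extend(data[i:])         # tail shorter than one window
    return duplicates, bytes(pending)
```

Sliding byte by byte is what lets this scheme find duplicate blocks whose alignment was shifted by an insertion or deletion earlier in the file.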
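The client side of the optimized fixed-length chunking algorithm (fixed-length splitting, hashing, and a first filtering pass through the hash table) can be sketched like this. The names `fixed_chunk_filter` and `server_index`, and the choice of SHA-256 and a 4 KiB block, are illustrative assumptions, not the paper's specification.

```python
import hashlib

def fixed_chunk_filter(data: bytes, server_index: set, block_size: int = 4096):
    """Sketch of client-side fixed-length chunking: split the file into
    fixed-length blocks, hash each, and filter through the hash table.
    Returns the duplicates plus the match-failed blocks that the server-side
    tracking mechanism would examine."""
    duplicates, failed = [], []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest in server_index:
            duplicates.append((offset, digest))  # already stored: keep a reference only
        else:
            failed.append((offset, block))       # candidate for duplicate data tracking
    return duplicates, failed
```

Because the block boundaries are fixed, this pass is cheap (one hash per block rather than per byte), which is why the extra work of recovering misaligned duplicates can be deferred to the server.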
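One plausible reading of the server-side duplicate data tracking mechanism is sketched below: a backtracking window is shifted byte by byte over every position that still overlaps a match-failed fixed-length block, and each shifted window is re-checked against the block index, stopping once the window no longer overlaps that block. This is an interpretation of the abstract's description, with hypothetical names (`track_duplicates`) and hash choice (SHA-1).

```python
import hashlib

def track_duplicates(data: bytes, failed_offset: int, server_index: set,
                     block_size: int = 4096):
    """Sketch of duplicate data tracking: re-check byte-shifted windows
    around a match-failed fixed-length block (interpretation, not the
    paper's exact procedure)."""
    found = []
    # leftmost / rightmost window starts that still overlap the failed block
    lo = max(0, failed_offset - block_size + 1)
    hi = min(len(data) - block_size, failed_offset + block_size - 1)
    for start in range(lo, hi + 1):
        digest = hashlib.sha1(data[start:start + block_size]).hexdigest()
        if digest in server_index:
            found.append((start, digest))  # misaligned duplicate recovered
    return found
```

Shifting the window like this recovers duplicates that fixed-length chunking misses when an insertion or deletion displaces block boundaries, which is the sensitivity problem the optimized algorithm targets.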