The Design And Implementation Of Data Deduplication With Garbage Data Removal Policy

Posted on: 2015-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Feng
GTID: 2308330473451819
Subject: Computer software and theory
Abstract/Summary:
As next-generation wireless mobile networks are deployed in China, higher requirements are being placed on the underlying network storage systems that support a wide range of mobile telecommunication services. The growing number of users and the growing volume of data traffic have stretched traditional storage solutions, and using storage space effectively has become a serious problem.

This thesis first introduces and analyzes a number of mainstream distributed storage systems, both domestic and international, and then analyzes the problems that distributed systems face together with their solutions. On that basis we designed and implemented CStore, a distributed file system for very large numbers of users and files that is built on block-level data deduplication. The thesis then designs and implements a garbage data collection system on top of CStore.

CStore is a typical distributed storage system. Its architecture separates the metadata stream from the data stream: metadata and file data are stored in different clusters, and each client manages and optimizes its own access to the data. Resources are located through a two-stage hash mapping scheme, the bucket is the unit of load balancing, and a replication strategy improves reliability, so the system scales well.

Data deduplication is one of the main features of CStore. It uses an online, block-level deduplication policy that requires the entire file to be divided into a number of data blocks when it is uploaded. This strategy saves a large amount of storage space and improves the user experience. However, it also introduces a data deletion problem: because a block may be referenced by more than one file, deleting a file does not by itself show which of its blocks can be discarded.
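The block-level deduplication and the deletion problem described above can be illustrated with a minimal sketch. It is not the CStore implementation: the fixed block size, the SHA-256 fingerprints, the `DedupStore` class, the `locate` function, and the bucket-to-node table are all assumptions made for illustration, and `locate` is only one plausible reading of a two-stage hash mapping (fingerprint to bucket, then bucket to node).

```python
import hashlib

BLOCK_SIZE = 4    # hypothetical, tiny on purpose so the example is readable
NUM_BUCKETS = 8   # hypothetical number of buckets
# Hypothetical second-stage table: bucket -> storage node.
BUCKET_TO_NODE = {b: f"node-{b % 3}" for b in range(NUM_BUCKETS)}


def locate(fingerprint: str) -> str:
    """Two-stage lookup: hash the fingerprint to a bucket, then map bucket to node."""
    bucket = int(fingerprint, 16) % NUM_BUCKETS
    return BUCKET_TO_NODE[bucket]


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Divide a file into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """In-memory stand-in for separate metadata and data clusters."""

    def __init__(self):
        self.blocks = {}  # data side: fingerprint -> block bytes
        self.files = {}   # metadata side: file name -> list of fingerprints

    def upload(self, name: str, data: bytes) -> None:
        fingerprints = []
        for block in split_blocks(data):
            fp = hashlib.sha256(block).hexdigest()
            # A block that is already stored is not written a second time.
            self.blocks.setdefault(fp, block)
            fingerprints.append(fp)
        self.files[name] = fingerprints

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def delete(self, name: str) -> None:
        # Only the metadata is removed. The blocks stay, because other
        # files may still reference them; unreferenced ones become garbage.
        del self.files[name]


store = DedupStore()
store.upload("a", b"AAAABBBB")
store.upload("b", b"AAAACCCC")   # shares the block b"AAAA" with file "a"
store.delete("a")                # b"BBBB" is now unreferenced but still stored
```

After the last line the store still holds three blocks although only two are referenced, which is the kind of invalid data a separate cleanup pass has to find.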
Based on the deduplication architecture of CStore, this thesis designs and implements a system that deletes invalid (garbage) data either online or offline. The system determines which data is invalid efficiently and accurately, which saves further storage resources.

The garbage data recovery system cleans up invalid data blocks in CStore, that is, data blocks that are no longer referenced by any file. It is based on the Bloom Filter algorithm, performs the corresponding operations on the metadata server cluster and on the data server cluster, and is monitored globally by a central control node. For fault tolerance, nodes maintain heartbeat connections with the central node, so node failures are detected effectively and the affected tasks are redistributed. The system also provides a visual interface through which administrators can control and manage it.

Finally, the thesis tests the functionality and performance of the system, demonstrating its reliability and high efficiency.
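A Bloom filter suits this cleanup because it never reports a member as absent, so a referenced block cannot be mistaken for garbage; a false positive only means that some garbage survives until a later pass. The sketch below shows the idea under stated assumptions: the `BloomFilter` class, its parameters, and the `build_reference_filter` and `sweep` helpers are hypothetical and are not taken from the thesis. Here the filter is built from file metadata and then checked against the stored blocks, standing in for the metadata cluster and the data cluster respectively.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; sizes are illustrative, not tuned."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


def build_reference_filter(file_index: dict) -> BloomFilter:
    """Metadata side (assumed): record every fingerprint some file references."""
    live = BloomFilter()
    for fingerprints in file_index.values():
        for fp in fingerprints:
            live.add(fp)
    return live


def sweep(stored_blocks: dict, live: BloomFilter) -> list:
    """Data side (assumed): delete blocks that are definitely unreferenced."""
    garbage = [fp for fp in stored_blocks if fp not in live]
    for fp in garbage:
        del stored_blocks[fp]
    return garbage


# Hypothetical state: one file references fp1 and fp3; fp2 is left over.
file_index = {"b": ["fp1", "fp3"]}
stored = {"fp1": b"AAAA", "fp2": b"BBBB", "fp3": b"CCCC"}
removed = sweep(stored, build_reference_filter(file_index))
```

The design trades a small amount of leftover garbage for a compact summary that can be shipped between clusters instead of the full reference list.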
Keywords: distributed storage systems, data deduplication, invalid data recovery, Bloom Filter