Research On Key Technologies Of Application-Aware Data Deduplication

Posted on: 2015-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: F Li
GTID: 2348330509460754
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of data, we have entered the era of big data, and managing and analyzing this data efficiently has become a serious problem. As a result, data deduplication technology has been widely applied in many fields, including data storage systems, data backup systems, disaster recovery, and medical treatment, and it is starting to be used in the communication field as well. In this thesis, we analyze and study the key technologies of deduplication systems and design new algorithms to improve their performance. The main contributions are as follows:

(1) An application-aware duplicate data prediction mechanism. It forecasts the duplicate rate of the data before deduplication is performed, so that users can judge how effective the system will be and make the best use of the storage system. Compared with existing deduplication prediction techniques, the application-aware approach further reduces the size of the prediction index and further improves the performance of the prediction algorithm.

(2) An adaptive index update algorithm. Conventional algorithms that predict the duplicate-data similarity rate from a data index can estimate the similarity rate only inside the database or only between the database and the storage system. The adaptive index update algorithm continually updates the index table based on information about the data chunks being accessed. It takes into account the similarity rate both inside the database and between the database and the storage system, which increases the accuracy of the duplicate data estimate.

(3) A cache replacement algorithm based on hot data, proposed to relieve the disk bottleneck of index lookups in the storage system. We found that data chunks that repeat many times account for a large share of the database, a property we call the repetitive characteristic of data. By exploiting this characteristic, the hot-data cache replacement algorithm achieves a higher cache hit rate and alleviates the disk bottleneck problem.

Together, these results on the key technologies of application-aware deduplication provide an effective means of optimizing data storage and management in cloud storage environments.
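The abstract does not give the details of the hot-data cache replacement algorithm, so the following is only a minimal sketch of the general idea, assuming a frequency-based (LFU-style) eviction policy as a stand-in: chunk fingerprints that are hit often stay in the cache, and the least-hit fingerprint is evicted when the cache is full. The class name `HotFingerprintCache`, its methods, the use of SHA-1 fingerprints, and the sample chunk stream are all hypothetical and are not taken from the thesis.

```python
import hashlib


class HotFingerprintCache:
    """Toy fingerprint cache: evicts the cached entry with the fewest hits.

    This is an LFU-style stand-in, not the algorithm from the thesis.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts = {}   # fingerprint -> times seen while cached
        self.hits = 0
        self.misses = 0

    def lookup(self, chunk):
        """Return True if the chunk's fingerprint was already in the cache."""
        fingerprint = hashlib.sha1(chunk).hexdigest()
        if fingerprint in self.counts:
            self.counts[fingerprint] += 1
            self.hits += 1
            return True
        # Miss: a real system would now consult the on-disk index.
        self.misses += 1
        if len(self.counts) >= self.capacity:
            # Evict the "coldest" fingerprint (fewest hits so far).
            coldest = min(self.counts, key=self.counts.get)
            del self.counts[coldest]
        self.counts[fingerprint] = 1
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical chunk stream in which chunk b"A" is "hot".
    stream = [b"A", b"A", b"A", b"B", b"C", b"A", b"D", b"A"]
    cache = HotFingerprintCache(capacity=2)
    for chunk in stream:
        cache.lookup(chunk)
    print(cache.hit_rate())
```

In this sketch the frequently repeated chunk stays cached while one-off chunks displace each other, which is the behavior the repetitive characteristic of data is meant to exploit. A production design would also need aging of old counts so that formerly hot entries can eventually be evicted.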
Keywords/Search Tags: Deduplication, Estimating of Duplicate Data, Application-aware, Index Tuning of Cache