
Research on Key Technologies of Deduplication in Cloud Storage

Posted on: 2017-08-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q L He
Full Text: PDF
GTID: 1318330566455707
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development and wide adoption of networking, smart mobile devices, mobile communications, and cloud computing, the amount of data generated by individuals and enterprises is growing quickly, and data-center holdings have reached the PB and even EB scale. Faced with the need to manage storage at this scale, cloud storage has become inevitable. As a new storage paradigm, cloud storage uses virtualization and other data-management technologies to provide a low-cost, highly scalable storage service. Studies have shown that the various kinds of application data held in cloud storage systems are on average about 60% redundant, and the amount of duplicate data keeps growing over time. Deduplication is a lossless compression technique for storage systems that suppresses duplicate data effectively and quickly, saving storage and bandwidth, shortening backup time, and reducing the operating and management cost of storage systems; it has therefore attracted wide attention in both academia and industry.

Block-level deduplication is the mainstream approach, but its performance in real systems is often unsatisfactory, for two main reasons. First, block-index lookups cause large write latency: to detect duplicate blocks, the system must maintain a fingerprint index of all stored blocks, and in a mass storage system this index is too large to keep in memory, so frequent index lookups go to disk and seriously delay writes (a serial sketch of this fingerprint lookup is given below, after contribution (1)). Second, deduplicated storage causes data fragmentation: because many files share blocks, the content of a single file ends up in multiple non-contiguous locations on disk. As the amount of data grows, so does the number of shared blocks, and reading a file then requires following references to scattered fragments, which degrades read speed.

On the basis of a hierarchical "group / peer-to-peer / cloud" storage model, this dissertation focuses on the disk bottleneck of block-index lookups and on the fragmentation problem, with the goal of improving deduplication performance. Addressing the shortcomings of existing methods, it studies several aspects of the system: improving the detection of identical data blocks, raising the throughput of the deduplication system, and improving read performance after deduplication. The main research contributions are the following.

(1) The dissertation proposes a parallel deduplication algorithm for cluster environments, which processes data blocks and computes block fingerprints in parallel, exploiting idle compute capacity in the cluster to remove the write-performance bottleneck of the deduplication system (the fingerprint step is also sketched in parallel form below). Experimental results show that the parallel block-processing technique raises the system's redundancy-elimination rate to 98.3%, a significant performance improvement.
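To make the index-lookup bottleneck concrete, the following is a minimal, illustrative sketch of fingerprint-based block deduplication. It is not the dissertation's implementation: the fixed block size, SHA-1 fingerprints, and in-memory dict index are simplifying assumptions (in a real mass-storage system the index does not fit in memory, which is exactly the write-latency problem discussed above).

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed block size; real systems often chunk by content

def dedup_store(data: bytes, index: dict) -> list:
    """Split data into blocks and store each unique block only once.

    Returns the fingerprint list (the file "recipe") needed to rebuild data.
    """
    recipe = []
    for off in range(0, len(data), CHUNK_SIZE):
        block = data[off:off + CHUNK_SIZE]
        fp = hashlib.sha1(block).hexdigest()   # block fingerprint
        if fp not in index:  # the index lookup: on disk, this is the bottleneck
            index[fp] = block                  # new block: store it once
        recipe.append(fp)                      # duplicate: only a reference
    return recipe

def restore(recipe: list, index: dict) -> bytes:
    """Rebuild the original data from its recipe."""
    return b"".join(index[fp] for fp in recipe)

index = {}
r1 = dedup_store(b"A" * 8192, index)           # two identical blocks
assert len(index) == 1 and restore(r1, index) == b"A" * 8192
```

Every incoming block triggers a lookup in the index, so once the index spills to disk each lookup can cost a seek; that is the write delay the dissertation targets.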
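Contribution (1) parallelizes the compute-heavy fingerprint step across cluster nodes. As a hedged illustration only, the toy below uses a local process pool to stand in for cluster nodes; the dissertation's actual cluster algorithm, data distribution, and scheduling are not reproduced here.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor

CHUNK_SIZE = 4096

def fingerprint(block: bytes) -> str:
    return hashlib.sha1(block).hexdigest()

def parallel_dedup(data: bytes, index: dict, workers: int = 4) -> list:
    blocks = [data[o:o + CHUNK_SIZE] for o in range(0, len(data), CHUNK_SIZE)]
    # Fan the CPU-bound hashing out to worker processes (our stand-in for
    # cluster nodes); the coordinator then applies the serial index update.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        fps = list(pool.map(fingerprint, blocks, chunksize=64))
    for fp, block in zip(fps, blocks):
        index.setdefault(fp, block)            # store each unique block once
    return fps

if __name__ == "__main__":
    index = {}
    recipe = parallel_dedup(b"AB" * 100_000, index)
    print(len(recipe), "blocks,", len(index), "unique")
```

The design point the contribution exploits is that hashing is embarrassingly parallel, so otherwise idle compute capacity can absorb it while only the comparatively cheap index update remains serial.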
(2) Given that virtual desktop storage is characterized by a large amount of redundant data, deduplication is used to reduce the storage space requirements of the virtual desktop infrastructure. The dissertation designs a two-stage architecture that builds the deduplication system on existing resources by introducing appropriate deduplication techniques into a distributed system; working in inline (online) mode, its I/O performance basically meets the design requirements.

(3) To address the read-performance degradation caused by data fragmentation, the dissertation proposes a new read-performance improvement strategy for deduplication based on hybrid storage. Exploiting the high random-read performance and low power consumption of SSDs in a mixed SSD/HDD environment, the method serves random read requests from SSD instead of HDD, which significantly improves system read performance (a sketch of this routing idea follows this summary). Trace-replay experiments on a prototype system and read-performance evaluation on virtual machine disk images show that a hybrid deduplication storage system applying this strategy outperforms a traditional deduplication system in both read performance and energy consumption.

(4) The dissertation proposes a fully distributed cloud storage system model based on deduplication. The model forms a user-centered "group / peer-to-peer / central storage" hierarchy. It uses the Chord algorithm to manage a resizable set of service-management nodes and distributes user requests across the block servers, building a fully distributed system with no central node (also sketched below), which achieves better load balancing and high deduplication efficiency and thereby improves the performance and quality of service of the cloud storage system. Experimental results show that the model offers higher performance and availability, providing customers with a comparatively high quality of cloud storage service.
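For contribution (3), the following is a minimal sketch of the hybrid-storage idea under stated assumptions: blocks referenced by multiple files (the shared blocks whose reads are fragmented and random) are promoted to SSD, while other blocks stay on HDD. The class name, the promotion threshold, and the dict-backed device stores are illustrative assumptions, not the dissertation's design.

```python
class HybridStore:
    """Route shared ("hot", randomly read) blocks to SSD, the rest to HDD."""

    def __init__(self, ssd: dict, hdd: dict, hot_threshold: int = 2):
        self.ssd, self.hdd = ssd, hdd       # dict-like stand-ins for devices
        self.refs = {}                      # fingerprint -> reference count
        self.hot_threshold = hot_threshold  # assumed promotion policy

    def put(self, fp: str, block: bytes) -> None:
        self.refs[fp] = self.refs.get(fp, 0) + 1
        if self.refs[fp] >= self.hot_threshold:
            # A block shared by several files is read with random I/O when
            # those files are restored, so promote it from HDD to SSD.
            self.ssd[fp] = self.hdd.pop(fp, block)
        elif fp not in self.ssd:
            self.hdd.setdefault(fp, block)

    def get(self, fp: str) -> bytes:
        # Serve reads from the SSD tier when possible; HDD seeks are what
        # makes fragmented reads slow.
        if fp in self.ssd:
            return self.ssd[fp]
        return self.hdd[fp]
```

The design choice mirrors the abstract's argument: HDD seek latency dominates fragmented reads, while SSDs handle random reads cheaply and at lower power, so moving only the shared blocks captures most of the benefit at modest SSD cost.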
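For contribution (4), the sketch below shows only the core placement idea behind a Chord-style ring: node identifiers and block fingerprints are hashed onto the same identifier ring, and each fingerprint is served by its clockwise successor node, so lookup needs no central node. Finger tables, node join and leave handling, and replication, all part of a real Chord deployment, are omitted, and the node names are hypothetical.

```python
import bisect
import hashlib

RING_BITS = 32  # assumed identifier-space size

def ring_id(key: str) -> int:
    """Hash a node name or block fingerprint onto the identifier ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** RING_BITS)

class ChordRing:
    def __init__(self, nodes):
        # The ring is just the sorted list of (node id, node name) pairs.
        self.ring = sorted((ring_id(n), n) for n in nodes)
        self.ids = [i for i, _ in self.ring]

    def successor(self, fingerprint: str) -> str:
        """Return the node responsible for a block fingerprint: the first
        node clockwise from the fingerprint's position on the ring."""
        pos = bisect.bisect_right(self.ids, ring_id(fingerprint))
        return self.ring[pos % len(self.ring)][1]

ring = ChordRing(["node-a", "node-b", "node-c"])   # hypothetical block servers
print(ring.successor("fingerprint-of-some-block"))
```

Because every node can compute the successor of any fingerprint, request routing is spread over all participants, which is the load-balancing property the model relies on.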
Keywords/Search Tags: Cloud storage, Data deduplication, Massive data backup, Throughput, Read performance