
An Optimization Algorithm Of Cloud Storage Based On Data Dependency Relationship

Posted on: 2017-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: K H R Liu
Full Text: PDF
GTID: 2308330485988191
Subject: Computer application technology
Abstract/Summary:
Cloud storage, a relatively new storage model, offers low price, high reliability, elasticity, and pay-as-you-go billing. More and more companies and users rely on it to store massive amounts of data, so reducing the resulting storage cost is a challenging problem.

Data in cloud storage often have dependency relationships. For instance, a video website usually transcodes each source file into several video files with different resolutions to serve different users. In this case, a dependency relationship exists between the source file and the transcoded files.

At present, the main approach to using data dependency relationships to reduce storage cost is as follows. A specific algorithm decides whether each piece of data is stored. When a request arrives for data that is not stored, the storage system first regenerates the data and then serves the request. The total cost of the storage system therefore has two parts: the storage cost of stored data and the compute cost of regenerating data. Data that is not stored is generally accessed infrequently, so its compute cost is lower than the storage cost it would otherwise incur, and the total cost of the storage system is reduced.

However, existing solutions usually assume a constant multi-replica storage policy and do not consider variable storage policies. They also have a deficiency: when the time needed to regenerate data exceeds the response time that users can tolerate, the data is effectively unavailable.

To address these problems, this thesis proposes a data dependency relationship based optimization algorithm for cloud storage. The proposed algorithm further reduces the total cost of the storage system while guaranteeing the promised data availability.
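The store-or-regenerate trade-off described above can be illustrated with a minimal Python sketch. It is not the thesis's cost model: the `DataItem` fields, the per-period pricing, and the function names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    # All fields are hypothetical illustration parameters.
    size_gb: float            # size of the derived data
    accesses_per_period: int  # expected reads in one billing period
    regen_cost: float         # compute cost of one regeneration


def period_cost(item: DataItem, stored: bool, storage_price_per_gb: float) -> float:
    """Cost of one item for one period: storage if kept, compute if regenerated."""
    if stored:
        return item.size_gb * storage_price_per_gb
    # Not stored: every access triggers one regeneration from the source data.
    return item.accesses_per_period * item.regen_cost


def should_store(item: DataItem, storage_price_per_gb: float) -> bool:
    """Keep the item only when storing costs no more than regenerating it."""
    return (period_cost(item, True, storage_price_per_gb)
            <= period_cost(item, False, storage_price_per_gb))


# A rarely read transcoded file versus a popular one (made-up numbers).
cold = DataItem(size_gb=10.0, accesses_per_period=1, regen_cost=0.05)
hot = DataItem(size_gb=10.0, accesses_per_period=100, regen_cost=0.05)
print(should_store(cold, 0.02), should_store(hot, 0.02))
```

Under these made-up prices the rarely accessed file is cheaper to regenerate on demand, while the popular file is cheaper to keep stored. The sketch ignores regeneration time, which is exactly the availability gap the abstract points out.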
The main contributions of the thesis are as follows:

(1) Propose a data dependency based storage model with reduced redundancy (D2SMR2). Compared with existing research, the proposed model adopts a variable storage policy that reduces the redundancy of data replicas, develops a new computational model for the total cost of a single piece of data, and adds a constraint on data availability.

(2) Propose a formula for data availability and a formula for data generating time in D2SMR2. Compared with existing research, D2SMR2 further considers the influence of two factors: the response time that users can tolerate and the nodes where the data is stored. In addition, data generating time in D2SMR2 is a random variable that depends on data faults at its dependency nodes.

(3) Propose a data storage policy decision algorithm to reduce the total cost in D2SMR2. The algorithm directly determines the storage policy for new data. At the end of each time period, it updates each data's storage policy according to the access records of that period.

(4) Develop a simulation system for cloud storage that accounts for data dependency, and conduct simulations to evaluate the efficiency of the proposed algorithm. Compared with traditional simulation systems, the developed system can record data dependency relationships and simulate node failure and fault recovery, queuing of user requests, data generation, and so on. Based on this system, the thesis conducts comparative simulations to evaluate the proposed model and algorithm, using a real trace dataset and randomly generated data dependency relationships.
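As a rough illustration of choosing a variable replica count under an availability constraint, the following Python sketch picks the cheapest policy that meets a target. It is not the D2SMR2 algorithm or its formulas. It assumes independent node failures with probability `node_fail_prob`, and it treats `regen_ok_prob` (the chance that regeneration finishes within the tolerated response time) as a given input. All names and numbers are hypothetical.

```python
from typing import Optional


def availability(replicas: int, node_fail_prob: float, regen_ok_prob: float) -> float:
    """Data is available if any replica is up, or else if regeneration is fast enough."""
    all_replicas_down = node_fail_prob ** replicas  # equals 1.0 when replicas == 0
    return 1.0 - all_replicas_down * (1.0 - regen_ok_prob)


def period_cost(replicas: int, size_gb: float, price_per_gb: float,
                accesses: int, regen_cost: float, node_fail_prob: float) -> float:
    """Storage cost of all replicas plus expected compute cost of regenerations."""
    storage = replicas * size_gb * price_per_gb
    # Only accesses that find no live replica need a regeneration.
    expected_regens = accesses * node_fail_prob ** replicas
    return storage + expected_regens * regen_cost


def choose_replicas(max_replicas: int, target: float, size_gb: float,
                    price_per_gb: float, accesses: int, regen_cost: float,
                    node_fail_prob: float, regen_ok_prob: float) -> Optional[int]:
    """Return the cheapest replica count meeting the target, or None if none does."""
    feasible = [
        k for k in range(max_replicas + 1)
        if availability(k, node_fail_prob, regen_ok_prob) >= target
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda k: period_cost(
        k, size_gb, price_per_gb, accesses, regen_cost, node_fail_prob))


# Slow regeneration forces one stored replica; fast regeneration allows zero.
common = dict(max_replicas=3, size_gb=10.0, price_per_gb=0.02,
              accesses=1, regen_cost=0.05, node_fail_prob=0.01)
print(choose_replicas(target=0.99, regen_ok_prob=0.5, **common))
print(choose_replicas(target=0.99, regen_ok_prob=0.999, **common))
```

To mimic the periodic update in contribution (3), `choose_replicas` could be called again at the end of each period with `accesses` set to the count observed in that period's access records.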
Keywords/Search Tags: Data provenance, Data regeneration, Cloud storage system, Data cost model, Service level agreement