
Research On Optimization Of The Memory Cache Policy Based On Hadoop In Hybrid Cloud Environment

Posted on: 2018-10-19  Degree: Master  Type: Thesis
Country: China  Candidate: S M Du
GTID: 2428330596954777  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of cloud computing, the hybrid cloud, one of the most significant cloud computing models, has been widely adopted because of its security, scalability and computing flexibility. More and more enterprises now use Hadoop to process massive data in hybrid clouds. However, because Hadoop has been used in hybrid cloud environments for only a short time, its cache system still has several shortcomings. First, it does not distinguish cache files by heat (access popularity): some low-heat files are added to the cache while high-heat files cannot be cached, which lowers the cache hit rate. Second, files already in the cache can be added to it again, which produces redundant cache copies and wastes cache space. Third, there are few reasonable cache replacement strategies, so files cannot be added to the cache when its capacity is insufficient. Therefore, to achieve a high cache hit rate, the main problems to be solved in the hybrid cloud cache system are how to admit files into the cache and how to replace cached files efficiently. In addition, a good cache prefetching strategy can reduce file read time and make task execution more efficient. This paper studies cache replacement and cache prefetching strategies based on Hadoop in the hybrid cloud environment, which provides theoretical and technical support for caching systems in such environments and is of practical significance.

The main contributions of this paper are as follows:

(1) An optimized cache replacement algorithm based on priority and Least Recently Used (LRU) is proposed for the hybrid cloud environment. In the hybrid cloud cache system, file heat is not considered and an appropriate cache replacement strategy is lacking, which leads to a low cache hit rate. To use cache space rationally and improve the cache hit rate, this paper studies the existing LRU algorithms and, taking into account file heat, user access features and priority weights, proposes an optimized cache replacement algorithm based on priority and LRU. The algorithm first calculates the heat of the file to be cached. A file with low heat is not added to the cache. If the heat is sufficiently high, the required number of cache copies is calculated from the file's access characteristics, and the cache copy with the highest access frequency is added to the cache system of the corresponding node. Second, when a file is being added and the cache capacity is insufficient, the LRU algorithm selects a candidate file from each cache priority queue, and the weight with which each candidate is expected to be accessed again is calculated. Finally, the candidate with the minimum weight is evicted, and the new file is added to the queue that corresponds to its priority weight.

(2) A cache prefetching algorithm based on a Bayesian network is designed for the hybrid cloud environment. The hybrid cloud cache system lacks an appropriate cache prefetching strategy, which causes low bandwidth utilization and long read times. After analyzing the shortcomings of existing cache prefetching algorithms, such as poor prefetching dynamics, a low hit rate and low bandwidth utilization, this paper proposes a cache prefetching algorithm based on a Bayesian network. The algorithm first uses the Bayesian network to predict the task that will be executed and identifies all data files of that task; the files to prefetch are then selected according to the benefit and cost of each data file. Second, the load of each node is calculated from its current idle bandwidth, request response time and public cloud cost, and a node with lower load is selected. Finally, the selected files are prefetched into the cache of that node according to the current idle network bandwidth and the file sizes.

(3) The two optimization algorithms above are verified and analyzed experimentally. The experiments confirm the feasibility of the proposed cache replacement algorithm and compare its performance with that of the LFU, LRU and AD-LRU algorithms; the results show that the proposed algorithm has certain advantages in cache hit rate, delay saving rate and cost saving rate. The experiments also confirm the feasibility of the proposed cache prefetching algorithm and compare it with a prefetching algorithm based on access frequency and with the IPC algorithm; the results show that the proposed algorithm has certain advantages over both in prefetching hit rate and time saving rate.
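The replacement algorithm in contribution (1) is described only at a high level, so the following Python sketch illustrates its overall shape under stated assumptions rather than reproducing the thesis's method. It shows heat-based admission, refreshing an already-cached file instead of adding a redundant copy, and eviction that takes the LRU candidate from each priority queue and removes the one with the smallest re-access weight. The heat threshold, the per-queue priority weights, the re-access weight formula (priority weight multiplied by heat), and all class and parameter names are hypothetical; the cache-copy count calculation is omitted because the abstract gives no formula for it.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size: int      # size in arbitrary units; only relative sizes matter here
    heat: float    # hypothetical popularity score supplied by the caller
    priority: int  # key of the priority queue the file belongs to


class PriorityLRUCache:
    """Sketch of heat-based admission plus priority-queue LRU eviction.

    Assumptions (not from the thesis): a fixed heat threshold, fixed
    per-queue priority weights, and re-access weight = weight * heat.
    """

    def __init__(self, capacity, heat_threshold, priority_weights):
        self.capacity = capacity
        self.heat_threshold = heat_threshold
        self.priority_weights = priority_weights  # e.g. {0: 1.0, 1: 2.0}
        self.queues = {p: OrderedDict() for p in priority_weights}
        self.used = 0

    def _reaccess_weight(self, f):
        # Hypothetical estimate of how likely f is to be accessed again.
        return self.priority_weights[f.priority] * f.heat

    def _find(self, name):
        for queue in self.queues.values():
            if name in queue:
                return queue
        return None

    def access(self, f):
        """Return True on a cache hit, False on a miss."""
        queue = self._find(f.name)
        if queue is not None:
            # Already cached: refresh heat and recency, never add a duplicate copy.
            queue[f.name].heat = f.heat
            queue.move_to_end(f.name)
            return True
        if f.heat < self.heat_threshold or f.size > self.capacity:
            return False  # low-heat (or oversized) files are not admitted
        while self.used + f.size > self.capacity:
            # Least recently used file (oldest entry) of each non-empty queue.
            candidates = [next(iter(q.values())) for q in self.queues.values() if q]
            victim = min(candidates, key=self._reaccess_weight)
            del self.queues[victim.priority][victim.name]
            self.used -= victim.size
        self.queues[f.priority][f.name] = f
        self.used += f.size
        return False
```

In this sketch, recency picks the candidate inside each queue, while priority and heat decide between queues; eviction repeats until the new file fits, which is an assumption the abstract does not state.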
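Contribution (2) predicts the next task with a Bayesian network, but the abstract does not give the network's structure or variables. As a minimal sketch, assuming a naive-Bayes structure in which the next task is the single parent of each observed feature (here a hypothetical pair of previous task and time-of-day slot), with conditional probability tables estimated by Laplace-smoothed counting, the prediction step could look like this. The class name, the features and the smoothing parameter `alpha` are illustrative assumptions.

```python
from collections import Counter, defaultdict


class NaiveBayesTaskPredictor:
    """Bayesian network with structure NextTask -> each observed feature."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing constant
        self.task_counts = Counter()                # prior counts of next tasks
        self.feature_counts = defaultdict(Counter)  # (feature index, task) -> value counts
        self.feature_values = defaultdict(set)      # feature index -> observed values

    def fit(self, history):
        """history: iterable of (features_tuple, next_task) pairs."""
        for features, task in history:
            self.task_counts[task] += 1
            for i, value in enumerate(features):
                self.feature_counts[(i, task)][value] += 1
                self.feature_values[i].add(value)

    def posterior(self, features):
        """Normalized P(next task | observed features); call fit() first."""
        total = sum(self.task_counts.values())
        n_tasks = len(self.task_counts)
        scores = {}
        for task, count in self.task_counts.items():
            p = (count + self.alpha) / (total + self.alpha * n_tasks)
            for i, value in enumerate(features):
                k = len(self.feature_values[i])
                seen = self.feature_counts[(i, task)][value]
                p *= (seen + self.alpha) / (count + self.alpha * k)
            scores[task] = p
        z = sum(scores.values())
        return {task: s / z for task, s in scores.items()}

    def predict(self, features):
        post = self.posterior(features)
        return max(post, key=post.get)
```

After `fit` on a job history, `predict` returns the task with the highest posterior probability; in the algorithm described above, that task's input data files would then become the prefetch candidates.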
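For the remaining prefetching steps, the abstract states that files are chosen by benefit and cost, that the target node is chosen by a load measure built from idle bandwidth, request response time and public cloud cost, and that the amount prefetched depends on idle bandwidth and file size, but it does not give the formulas. The sketch below assumes a weighted-sum load score (lower is better) and a greedy selection by net benefit per unit size, limited to the data volume the target node's idle bandwidth can move within a hypothetical prefetch window. All names, units and weights here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    idle_bandwidth: float     # MB/s currently free (assumed unit)
    response_time: float      # recent average request response time, s
    public_cloud_cost: float  # cost factor; 0 for private-cloud nodes


@dataclass
class DataFile:
    name: str
    size: float     # MB
    benefit: float  # e.g. expected read time saved (hypothetical measure)
    cost: float     # e.g. transfer cost (hypothetical measure)


def node_load(n, w_bw=1.0, w_rt=1.0, w_cost=1.0):
    # Hypothetical load score: slower, costlier, bandwidth-starved nodes score higher.
    return (w_rt * n.response_time
            + w_cost * n.public_cloud_cost
            + w_bw / max(n.idle_bandwidth, 1e-9))


def plan_prefetch(files, nodes, window_s):
    """Pick the least-loaded node and the files to prefetch into its cache."""
    target = min(nodes, key=node_load)
    budget = target.idle_bandwidth * window_s  # MB transferable in the window
    worthwhile = [f for f in files if f.benefit > f.cost]
    worthwhile.sort(key=lambda f: (f.benefit - f.cost) / f.size, reverse=True)
    chosen = []
    for f in worthwhile:
        if f.size <= budget:
            chosen.append(f.name)
            budget -= f.size
    return target.name, chosen
```

Ordering by net benefit per megabyte is a common greedy heuristic for this knapsack-like choice; it is simple and fast but not guaranteed to find the optimal set of files.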
Keywords/Search Tags: Hybrid Cloud, Hadoop, Cache Replacement, Cache Prefetching, Bayesian Network