Research On Dynamic Data Allocation Strategy Based On HDFS In Heterogeneous Platform

Posted on: 2019-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: S L Wen
GTID: 2348330545990035
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the information society, the amount of data on the Internet has increased rapidly. Because traditional storage modes can hardly break through their storage limits, distributed storage systems are becoming more and more popular, and HDFS (Hadoop Distributed File System) is a widely used distributed file storage system. With the development of advanced storage devices, heterogeneous devices, such as SSDs with high read/write performance and common disks, are widely used in HDFS as the current mainstream storage media. HDFS effectively solves the large-data storage problem faced by large-scale processing, and it stores cold and hot data separately by providing different data storage policies and corresponding storage interfaces, so that developers can use these interfaces to classify and store data.

However, how to accurately allocate cold data and hot data in HDFS is the most critical problem at present. Our research and analysis found that, when distributing data in HDFS, the traditional allocation algorithm first assigns a storage policy to all data and then dynamically adjusts that policy according to how frequently the data is accessed, thereby separating cold and hot data. Allocating data in this way causes two problems. On the one hand, some cold data may be stored on the SSDs, so the SSD hit rate in the system is low and the SSDs cannot play their full role, which wastes SSD hardware resources and affects the performance of HDFS. On the other hand, some hot data may be stored on the common disks, which lowers the read and write efficiency of that data, reduces the system's throughput and access efficiency, and also degrades HDFS performance.

To solve these problems in the traditional allocation algorithms, this dissertation studies a file-level data allocation strategy based on HDFS. The main work is as follows:

(1) A hot value is used to reflect the access popularity of a file. We propose a method of predicting the initial hot value of a file based on Trace file analysis. The main idea is as follows: first, we count and analyze the historical Trace of past or current HDFS applications, considering factors such as file type, file size, and the user who uploads the file; then we build a model to calculate the initial hot value of different types of files. In this way, when a new file is uploaded for the first time, we preset an initial hot value for it according to its type, which achieves the initial allocation of the file's hot value.

(2) Traditional methods dynamically adjust the allocation strategy based on a file's access frequency, using only the number of accesses in a past period to predict the file's future access frequency. They ignore the fact that performance depends mainly on the number of accesses in the future, which is constrained by key factors such as file type and file size. To address this problem, we propose a method that uses a BP neural network to adjust the file's hot value. When a file is downloaded, we first consider factors such as file type, file size, and the file's number of accesses; next, we build a model to calculate the real-time hot value of the file and use BP to adjust it. According to the adjusted hot value, we can then predict which files will be accessed frequently and which infrequently, which achieves the real-time allocation of the file's hot value.

(3) After a file's hot value has been allocated, we store files on heterogeneous devices with different read/write performance according to that value. That is, frequently accessed files are stored on the SSDs and infrequently accessed files on the mechanical hard disks.

(4) Based on HDFS, we compare the data allocation strategy of this paper with the traditional data allocation strategy. The experiments show that the proposed strategy can improve the performance of HDFS.
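
The following is a minimal sketch of the idea behind steps (1) and (3), not the model used in the thesis. It assumes, purely for illustration, that the initial hot value of a file type is the mean historical access count of that type in the trace, and that a single hypothetical threshold separates frequently from infrequently accessed files. The function names, the trace layout, and the threshold are all assumptions. The policy names `ALL_SSD` and `HOT` are built-in HDFS storage policies, and the generated command uses the standard `hdfs storagepolicies -setStoragePolicy` CLI; the abstract does not say which policies the thesis actually uses.

```python
from collections import defaultdict


def initial_hot_values(trace):
    """Estimate an initial hot value per file type.

    `trace` is a hypothetical list of (file_type, access_count) records.
    Assumption: the hot value is the mean access count of that type.
    """
    totals = defaultdict(lambda: [0, 0])  # file_type -> [sum, count]
    for file_type, accesses in trace:
        totals[file_type][0] += accesses
        totals[file_type][1] += 1
    return {t: s / n for t, (s, n) in totals.items()}


def choose_policy(hot_value, threshold):
    """Map a hot value to an HDFS storage policy name.

    ALL_SSD keeps all replicas on SSD. HOT, despite its name, is the
    HDFS policy that keeps all replicas on DISK (ordinary hard disks).
    The threshold is an illustrative assumption.
    """
    return "ALL_SSD" if hot_value >= threshold else "HOT"


def policy_command(path, policy):
    """Build (but do not run) the HDFS CLI call that sets the policy."""
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]
```

For example, `policy_command("/data/a.mp4", choose_policy(30.0, 10.0))` yields the argument list for assigning `ALL_SSD` to that path. Existing blocks are only relocated to match a new policy once the HDFS mover tool (`hdfs mover`) is run.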
Keywords/Search Tags: HDFS, heterogeneous devices, hot value, BP neural network, data allocation strategy