Design And Implementation Of Disk Cache System Based On HDFS Optical Jukebox

Posted on: 2020-07-02 | Degree: Master | Type: Thesis
Country: China | Candidate: Z X Wang
GTID: 2428330590494013 | Subject: Engineering
Abstract/Summary:
With the arrival of the big data era, the volume of data generated worldwide each day has reached the petabyte (PB) scale, and the total amount of stored data keeps growing. However, only a small fraction of this data is accessed frequently; most of it goes untouched for long periods. Storing all of it on disk arrays therefore incurs high storage and data-management costs. As optical disc technology has matured, it has been adopted in a variety of cold-data backup systems thanks to its low storage cost, large capacity, high security, and low energy consumption. An optical jukebox built on the Hadoop Distributed File System (HDFS), referred to here as the HDFS optical jukebox, is now available on the market. Compared with a traditional optical jukebox, it offers much greater system capacity and data transfer speed, but a large performance gap still separates it from disk storage. This thesis studies how to narrow that gap.

First, the thesis examines the architecture of the HDFS optical jukebox system and existing optimization schemes for storing small files in HDFS. To address the low correlation among small files that are merged together in the HDFS optical jukebox, it proposes a label classification algorithm based on file names and, in the virtual storage module, designs a small-file merging strategy driven by each file's label information. Next, it reviews domestic and international research on cache replacement algorithms and prefetching techniques. Combining file label information with the scheduling objects in the system, it proposes LB-LRU (Label-Based Least Recently Used), a cache replacement algorithm that raises the hit rate of the disk cache system, and adds a file prefetching strategy to the cache module to raise the hit rate further. Finally, it compares a conventional HDFS deployment with one augmented by the disk cache system in terms of file read performance, file write performance, and NameNode memory consumption, and evaluates the label classification algorithm and the LB-LRU algorithm within the disk cache system. The experimental results show that the disk cache system effectively improves file read and write performance and reduces the memory consumption of the HDFS optical jukebox.
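The abstract does not specify the label rule or the merge policy, so the following is only a minimal Python sketch of label-driven small-file merging under stated assumptions, not the thesis's actual algorithm. The helper `label_of` (hypothetical) takes the leading run of letters in a file name as its label, and `plan_merges` (hypothetical) groups files by label and then packs each group into merge units of at most `max_bytes` using a simple next-fit rule.

```python
import re
from collections import defaultdict


def label_of(filename: str) -> str:
    """Hypothetical label rule: the leading run of letters, lowercased.

    e.g. "Sensor01_2019.log" -> "sensor"; names without leading letters get "misc".
    """
    match = re.match(r"[A-Za-z]+", filename)
    return match.group(0).lower() if match else "misc"


def plan_merges(files: dict[str, int], max_bytes: int) -> list[list[str]]:
    """Hypothetical merge planner: group small files by label, then pack each
    label's files (in name order) into merge units of at most max_bytes using
    next-fit. files maps file name -> size in bytes.
    """
    by_label = defaultdict(list)
    for name in sorted(files):
        by_label[label_of(name)].append(name)

    units = []
    for label in sorted(by_label):
        current, used = [], 0
        for name in by_label[label]:
            size = files[name]
            # Close the current unit when the next file would overflow it.
            if current and used + size > max_bytes:
                units.append(current)
                current, used = [], 0
            current.append(name)
            used += size
        if current:
            units.append(current)
    return units
```

Because files with different labels never share a merge unit, one read of a merged file tends to bring related small files in together, which is the correlation property the merge strategy targets. A single file larger than `max_bytes` simply gets a unit of its own.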
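The abstract names LB-LRU but does not describe its eviction rule, so the sketch below is one hypothetical reading in Python, not the thesis's algorithm. Cached files are grouped by label (using an assumed `label_of` rule), eviction removes the least recently used file from the least recently used label group, and on a miss the cache prefetches up to `prefetch` other files with the same label from `backing_store`, a plain dict standing in for the optical jukebox. All class, method, and parameter names are illustrative.

```python
import re
from collections import OrderedDict


def label_of(filename: str) -> str:
    """Hypothetical label rule: leading run of letters, lowercased; 'misc' if none."""
    match = re.match(r"[A-Za-z]+", filename)
    return match.group(0).lower() if match else "misc"


class LabelLRUCache:
    """Hypothetical label-aware LRU cache with same-label prefetching.

    Eviction removes the least recently used file of the least recently used
    label group, so labels that are still active keep their files cached.
    """

    def __init__(self, capacity: int, backing_store: dict, prefetch: int = 2):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity      # maximum number of cached files
        self.store = backing_store    # stand-in for the jukebox: name -> data
        self.prefetch = prefetch      # same-label files to prefetch on a miss
        self.groups = OrderedDict()   # label -> OrderedDict(name -> data), oldest first
        self.hits = 0
        self.misses = 0

    def __len__(self) -> int:
        return sum(len(group) for group in self.groups.values())

    def _touch(self, label: str, name: str) -> None:
        # Mark the label group and the file inside it as most recently used.
        self.groups.move_to_end(label)
        self.groups[label].move_to_end(name)

    def _evict_one(self) -> None:
        # Least recently used label first, then that label's least recently used file.
        label, group = next(iter(self.groups.items()))
        group.popitem(last=False)
        if not group:
            del self.groups[label]

    def _insert(self, name: str) -> None:
        label = label_of(name)
        if name in self.groups.get(label, {}):
            self._touch(label, name)
            return
        while len(self) >= self.capacity:
            self._evict_one()
        self.groups.setdefault(label, OrderedDict())[name] = self.store[name]
        self._touch(label, name)

    def get(self, name: str):
        label = label_of(name)
        group = self.groups.get(label)
        if group is not None and name in group:
            self.hits += 1
            self._touch(label, name)
            return group[name]
        self.misses += 1
        # Prefetch up to `prefetch` uncached same-label files (sorted by name),
        # then load the requested file last so it ends up most recently used.
        cached = self.groups.get(label, {})
        siblings = [n for n in sorted(self.store)
                    if n != name and label_of(n) == label and n not in cached]
        for sibling in siblings[: self.prefetch]:
            self._insert(sibling)
        self._insert(name)
        return self.store[name]
```

For example, with a backing store containing `sensor01.log`, `sensor02.log`, and `sensor03.log`, `capacity=3`, and `prefetch=1`, reading `sensor01.log` also loads `sensor02.log` (the first same-label sibling in sorted order), so a later read of `sensor02.log` is a hit. Loading the requested file after its prefetched siblings keeps it from being the first file evicted within its own label group.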
Keywords/Search Tags: Small Files, Label, File Prefetch, Cache Replacement Algorithms, Correlation, Optical Jukebox