Research and Design of a High-Performance Distributed File System for Small Files

Posted on: 2019-04-09, Degree: Master, Type: Thesis
Country: China, Candidate: J C Hu, Full Text: PDF
GTID: 2428330566987224, Subject: Engineering
Abstract/Summary:
With the spread of the Internet, social networks have become an inseparable part of people's daily lives. This has led to a sharp increase in both the volume of data and the variety of data types. In recent years, the number of short live-video sites and social sites has continued to rise. The individual files produced by these websites are relatively small, usually ranging from tens of kilobytes to several megabytes. However, traditional distributed file systems such as GFS and HDFS are optimized for large files. When faced with a massive number of small files, their performance drops sharply, and they may even fail to provide service. How to design and implement a high-throughput, highly available small-file system is currently a hot topic.

FastDFS is an open-source, high-performance distributed file system. Its major functions include file storage, file synchronization, and file access, and it addresses the problems of high capacity and load balancing. Compared with other distributed file systems, its advantage is that it is lightweight. FastDFS is intended to meet the requirements of highly concurrent access and easy expansion. To optimize the storage of small files, FastDFS aggregates a large number of small files into one large file, which reduces the amount of metadata and thereby improves file access performance.

This thesis starts from the application scenario of audio file storage and briefly analyzes the characteristics of small audio files. It then proposes an efficient distributed file system for massive small files, called EastDFS (Efficient Access of Small Data in Distributed File Systems). The main advantages of EastDFS are that it speeds up file reading and meets the needs of write-once-read-many workloads. The basic idea of EastDFS is to modify the file aggregation algorithm so that files of the same type are aggregated into one large file. The modified aggregation algorithm introduces some additional metadata, which must be persisted to ensure the high availability of the system. In addition, the original system is also used as a data center to construct a distributed storage system that supports massive storage and prevents a single machine from becoming the system bottleneck.

Changing the file aggregation mode introduces the problem of random writes. To speed up writing, EastDFS introduces a cache layer: a file is written directly into the cache, and a background thread periodically flushes the data to disk. The cache algorithm is based on LRFU. Drawing on the particular characteristics of small audio files, this thesis integrates the contribution rate of each actual hit into the prediction of a block's arrival time, which makes the algorithm more reasonable. For file downloads, requests that were originally issued individually are aggregated so that multiple pieces of data are downloaded at one time. It is this continuous reading that greatly improves the performance of EastDFS.
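
The aggregation idea described above can be illustrated with a minimal Python sketch. It is not the FastDFS or EastDFS implementation, and the abstract does not specify any on-disk format: the names `Container`, `Extent`, `append`, `read`, and `read_many` are hypothetical, and an in-memory buffer stands in for the large aggregated file. The sketch only shows why packing small files needs an extra index (the metadata that has to be persisted) and how several requests can be served by one contiguous read.

```python
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Extent:
    offset: int  # byte offset of the small file inside the container
    length: int  # size of the small file in bytes


class Container:
    """Hypothetical append-only container packing small files into one blob."""

    def __init__(self) -> None:
        self._blob = io.BytesIO()             # stands in for one large file on disk
        self._index: Dict[str, Extent] = {}   # extra metadata; would need persisting

    def append(self, name: str, data: bytes) -> Extent:
        # Always write at the end, so files appended together stay adjacent.
        offset = self._blob.seek(0, io.SEEK_END)
        self._blob.write(data)
        extent = Extent(offset, len(data))
        self._index[name] = extent
        return extent

    def read(self, name: str) -> bytes:
        extent = self._index[name]
        self._blob.seek(extent.offset)
        return self._blob.read(extent.length)

    def read_many(self, names: List[str]) -> Dict[str, bytes]:
        """Serve several requests with a single read over the covering span."""
        if not names:
            return {}
        extents = {n: self._index[n] for n in names}
        start = min(e.offset for e in extents.values())
        end = max(e.offset + e.length for e in extents.values())
        self._blob.seek(start)
        span = self._blob.read(end - start)  # one sequential read
        return {
            n: span[e.offset - start : e.offset - start + e.length]
            for n, e in extents.items()
        }
```

In this sketch `read_many` reads the whole span between the first and last requested file, including any unrequested files in between. That trade-off only pays off when the requested files were aggregated next to each other, which is the motivation the abstract gives for grouping files of the same type.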
Keywords/Search Tags: Massive small files, LRFU, File aggregation, Distributed File System