
Research And Design Of Multi-Tenant Small File Storage System Based On HDFS

Posted on: 2017-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: G F He
Full Text: PDF
GTID: 2308330482981839
Subject: Computer technology
Abstract/Summary:
The rapid development of computer technology, especially Internet technology, lets people produce and share information more conveniently. With mobile apps such as WeChat, people can produce large numbers of pictures and short audio and video clips anytime, anywhere. Storing and processing this massive volume of information, now an essential part of Internet services, poses great challenges. Hadoop has become the de facto standard for big data processing, and it includes a storage system named HDFS (Hadoop Distributed File System). However, HDFS is designed for processing large files, where both latency and throughput are very high, and its metadata and data index access pattern is poorly suited to workloads with many small files. HDFS must load all of its metadata into memory at run time, so cluster capacity is tightly limited by the amount of server memory. A small file uses the same metadata bookkeeping records as a large one but occupies far less storage space, which lowers the utilization of both metadata and cluster space. When accessing a file, a client usually needs four network round trips to retrieve the data. This network delay weighs more heavily on a small file than on a large one, and it reduces access efficiency. Although HDFS has user-level quotas on files and storage, it has no multi-tenancy mechanism and cannot control resource usage at a fine-grained level.
To solve these problems, this paper modifies the metadata storage model to introduce a separate multi-level cache into HDFS metadata storage management, so that memory overuse is avoided entirely. The proposed cache strategies reduce the cache miss ratio. We also propose a new direct file access pattern, which bypasses the metadata node and fetches content directly from the data node. The model further adds a multi-tenancy mechanism to HDFS that supports fine-grained resource isolation and control for each tenant. This multi-tenancy strategy greatly improves the stability of HDFS.
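The idea of keeping only part of the metadata in memory can be illustrated with a minimal sketch. It is not the design from the thesis: the class name `TwoLevelMetadataCache`, the LRU eviction policy, the file paths, and the plain dictionary standing in for the slower on-disk metadata tier are all assumptions made for illustration. It only shows how a bounded in-memory tier in front of a larger store keeps memory use fixed while counting hits and misses.

```python
from collections import OrderedDict


class TwoLevelMetadataCache:
    """Hypothetical two-tier lookup: a bounded LRU tier in memory, full metadata in a backing store."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # assumption: a dict stands in for on-disk metadata
        self.capacity = capacity            # maximum number of entries held in memory
        self.hot = OrderedDict()            # in-memory tier, least recently used entry first
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self.hot:
            # Hit: mark the entry as most recently used.
            self.hot.move_to_end(path)
            self.hits += 1
            return self.hot[path]
        # Miss: read from the slower tier (raises KeyError for an unknown path).
        self.misses += 1
        meta = self.backing_store[path]
        self.hot[path] = meta
        if len(self.hot) > self.capacity:
            # Evict the least recently used entry so memory stays bounded.
            self.hot.popitem(last=False)
        return meta


# Hypothetical metadata for 1000 small files belonging to one tenant.
store = {f"/tenant-a/img{i}.jpg": {"block": i, "length": 4096} for i in range(1000)}
cache = TwoLevelMetadataCache(store, capacity=2)

cache.get("/tenant-a/img1.jpg")  # miss, loaded from the backing store
cache.get("/tenant-a/img2.jpg")  # miss
cache.get("/tenant-a/img1.jpg")  # hit, served from memory
cache.get("/tenant-a/img3.jpg")  # miss, evicts img2 (least recently used)
```

In this sketch the in-memory tier never holds more than `capacity` entries regardless of how many files exist, at the cost of a slower lookup on each miss; the cache strategies mentioned in the abstract would aim to keep that miss ratio low.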
Keywords/Search Tags: HDFS, Multi-Tenancy, Small File Storage