
Research And Implementation Of Small File Storage Model Based On HDFS

Posted on: 2015-02-01    Degree: Master    Type: Thesis
Country: China    Candidate: N Li    Full Text: PDF
GTID: 2308330482955990    Subject: Computer application technology
Abstract/Summary:
Big Data is currently one of the hot topics in computer science research. By analyzing consumer behavior, departmental sales figures, and other indicators with big data techniques, enterprises can obtain business intelligence that plays an important role in increasing their market competitiveness. Given big data's positive impact on enterprise development, more and more companies are adopting big data and seeking storage solutions for it. HDFS is an open-source distributed file system; thanks to its low cost, good reliability, and other advantages, it has become the preferred solution for mass data storage in companies and research institutes. HDFS is designed for streaming data access and for storing large files. When storing the massive numbers of small files produced in big data applications, it exhibits insufficient storage capacity and low read/write efficiency. Making HDFS store and access small files efficiently is therefore an important research direction.

This thesis analyzes the causes of the small file problem from the perspectives of disk access, network communication, metadata, and other factors, and then presents a small file storage model consisting of a master node and worker nodes to mitigate the problem. The main work of this thesis is as follows:

(1) Small file storage model architecture. The model is an application layered on top of HDFS. It handles read and write requests from clients on behalf of HDFS and mitigates the small file problem through dedicated optimization strategies.

(2) Merging storage. Multiple small files are saved into a single HDFS file. This method effectively reduces the amount of metadata HDFS must maintain and reduces disk seek time when HDFS reads small file data.

(3) Index management. Index management includes index creation, serialization, and deserialization. Each worker node keeps the index information of its associated small files in memory.
A worker node locates the offset of a small file's data within an HDFS file by querying its index table.

(4) Cache management. A cache is built over memory and disk. It stores recently written and frequently accessed small files. Several commonly used cache replacement algorithms are built in, which effectively reduces the number of interactions between worker nodes and HDFS and improves the efficiency of client access to small files.

(5) A multi-node test platform was deployed to evaluate the read/write performance of the small file storage model. Test results show that the model is feasible and delivers good small file read and write performance.
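The merging storage and index management strategies in (2) and (3) can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the class names, the JSON index format, and the use of a local file to stand in for a merged HDFS file are all assumptions made for clarity. The key idea shown is that each small file becomes an (offset, length) entry pointing into one large file, so only one piece of file-level metadata exists per merged file.

```python
import json
from dataclasses import dataclass

@dataclass
class IndexEntry:
    hdfs_file: str   # path of the merged file (an HDFS path in the real system)
    offset: int      # byte offset of the small file's data inside the merged file
    length: int      # byte length of the small file's data

class MergedStore:
    """Appends small files into one large file and keeps an in-memory index."""

    def __init__(self, merged_path: str):
        self.merged_path = merged_path
        self.index = {}      # small file name -> IndexEntry
        self._offset = 0     # next write position in the merged file

    def put(self, name: str, data: bytes) -> None:
        # Append the small file's bytes to the merged file and record its location.
        with open(self.merged_path, "ab") as f:
            f.write(data)
        self.index[name] = IndexEntry(self.merged_path, self._offset, len(data))
        self._offset += len(data)

    def get(self, name: str) -> bytes:
        # Query the index table, then seek directly to the small file's data.
        entry = self.index[name]
        with open(entry.hdfs_file, "rb") as f:
            f.seek(entry.offset)
            return f.read(entry.length)

    def serialize_index(self) -> str:
        # Serialize the index so it can be persisted and later reloaded.
        return json.dumps({k: vars(v) for k, v in self.index.items()})

    @staticmethod
    def deserialize_index(s: str) -> dict:
        return {k: IndexEntry(**v) for k, v in json.loads(s).items()}
```

Reading a small file thus costs one index lookup plus one seek-and-read inside the merged file, rather than a per-file metadata operation on the NameNode.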
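The cache layer in (4) can be illustrated with a minimal in-memory LRU policy, one of the commonly used replacement algorithms the abstract refers to. The class below is a sketch under assumed names and structure; the thesis's cache also spans disk and supports multiple replacement algorithms, which this illustration omits.

```python
from collections import OrderedDict

class SmallFileCache:
    """In-memory LRU cache of small file contents on a worker node.

    On a hit, the worker serves the data directly and avoids a round
    trip to HDFS; when capacity is exceeded, the least recently used
    entry is evicted (the data is assumed to already be persisted in
    HDFS by the write path).
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()  # name -> bytes, oldest first

    def get(self, name: str):
        if name not in self._entries:
            return None                     # miss: caller fetches from HDFS
        self._entries.move_to_end(name)     # mark as most recently used
        return self._entries[name]

    def put(self, name: str, data: bytes) -> None:
        if name in self._entries:
            self._entries.move_to_end(name)
        self._entries[name] = data
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
```

Because recently written and frequently read small files are answered from this cache, the number of worker-to-HDFS interactions drops, which is the effect the abstract reports.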
Keywords/Search Tags:HDFS, small files, distributed, merging storage, metadata, cache