
Research And Implementation Of Small File Storage Model Based On HDFS

Posted on: 2015-02-01    Degree: Master    Type: Thesis
Country: China    Candidate: N Li    Full Text: PDF
GTID: 2308330482955990    Subject: Computer application technology
Abstract/Summary:
Big Data is currently one of the hot topics in computer science research. By analyzing consumer behavior, departmental sales figures, and other indicators with big data techniques, enterprises can obtain business intelligence that plays an important role in increasing their market competitiveness. Given big data's positive impact on enterprise development, more and more companies are adopting big data and seeking storage solutions for it. HDFS is an open-source distributed file system; thanks to its low cost, good reliability, and other advantages, it has become the preferred solution for mass data storage in companies and research institutes. HDFS is designed for streaming data access and for storing large files. When storing the massive numbers of small files produced in big data applications, it exhibits insufficient storage capacity and low read/write efficiency. Making HDFS store and access small files efficiently is therefore an important research direction.

This thesis analyzes the causes of the small file problem from the perspectives of disk access, network communication, metadata, and other factors, and then presents a small file storage model consisting of a master node and worker nodes to mitigate the problem. The main work of this thesis is as follows:

(1) Small file storage model architecture. The model is an application layered on top of HDFS. It handles read and write requests from clients on behalf of HDFS and mitigates the small file problem through dedicated optimization strategies.

(2) Merging storage. Multiple small files are saved into a single HDFS file. This method effectively reduces the amount of metadata HDFS must maintain and reduces disk seek time when HDFS reads small file data.

(3) Index management. Index management includes index creation, serialization, and deserialization. Each worker node keeps the index information of its associated small files in memory.
A worker node locates the offset of a small file's data within an HDFS file by querying its index table.

(4) Cache management. A cache is built over memory and disk. It stores recently written and frequently accessed small files. Several commonly used cache replacement algorithms are built in, which effectively reduces the number of interactions between worker nodes and HDFS and improves the efficiency of client access to small files.

(5) A multi-node test platform was deployed to evaluate the read/write performance of the small file storage model. Test results show that the model is feasible and delivers good small file read and write performance.
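The merging storage and index management strategies in (2) and (3) can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the class names, the JSON index format, and the use of a local file to stand in for a merged HDFS file are all assumptions made for clarity. The key idea shown is that each small file becomes an (offset, length) entry pointing into one large file, so only one piece of file-level metadata exists per merged file.

```python
import json
from dataclasses import dataclass

@dataclass
class IndexEntry:
    hdfs_file: str   # path of the merged file (an HDFS path in the real system)
    offset: int      # byte offset of the small file's data inside the merged file
    length: int      # byte length of the small file's data

class MergedStore:
    """Appends small files into one large file and keeps an in-memory index."""

    def __init__(self, merged_path: str):
        self.merged_path = merged_path
        self.index = {}      # small file name -> IndexEntry
        self._offset = 0     # next write position in the merged file

    def put(self, name: str, data: bytes) -> None:
        # Append the small file's bytes to the merged file and record its location.
        with open(self.merged_path, "ab") as f:
            f.write(data)
        self.index[name] = IndexEntry(self.merged_path, self._offset, len(data))
        self._offset += len(data)

    def get(self, name: str) -> bytes:
        # Query the index table, then seek directly to the small file's data.
        entry = self.index[name]
        with open(entry.hdfs_file, "rb") as f:
            f.seek(entry.offset)
            return f.read(entry.length)

    def serialize_index(self) -> str:
        # Serialize the index so it can be persisted and later reloaded.
        return json.dumps({k: vars(v) for k, v in self.index.items()})

    @staticmethod
    def deserialize_index(s: str) -> dict:
        return {k: IndexEntry(**v) for k, v in json.loads(s).items()}
```

Reading a small file thus costs one index lookup plus one seek-and-read inside the merged file, rather than a per-file metadata operation on the NameNode.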
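The cache layer in (4) can be illustrated with a minimal in-memory LRU policy, one of the commonly used replacement algorithms the abstract refers to. The class below is a sketch under assumed names and structure; the thesis's cache also spans disk and supports multiple replacement algorithms, which this illustration omits.

```python
from collections import OrderedDict

class SmallFileCache:
    """In-memory LRU cache of small file contents on a worker node.

    On a hit, the worker serves the data directly and avoids a round
    trip to HDFS; when capacity is exceeded, the least recently used
    entry is evicted (the data is assumed to already be persisted in
    HDFS by the write path).
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()  # name -> bytes, oldest first

    def get(self, name: str):
        if name not in self._entries:
            return None                     # miss: caller fetches from HDFS
        self._entries.move_to_end(name)     # mark as most recently used
        return self._entries[name]

    def put(self, name: str, data: bytes) -> None:
        if name in self._entries:
            self._entries.move_to_end(name)
        self._entries[name] = data
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
```

Because recently written and frequently read small files are answered from this cache, the number of worker-to-HDFS interactions drops, which is the effect the abstract reports.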
Keywords/Search Tags:HDFS, small files, distributed, merging storage, metadata, cache