
Research On Optimization Model Of Distributed Storage And Cache

Posted on: 2017-01-25; Degree: Master; Type: Thesis
Country: China; Candidate: Q J Li
GTID: 2308330503485309; Subject: Electronic and communication engineering
Abstract/Summary:
Implementing storage virtualization on top of distributed storage can provide good reliability, compatibility, and fault tolerance. However, traditional distributed storage clusters are mainly used for mass data analysis and for scheduling computing and storage resources, and neither research nor practice has paid enough attention to using them for storage virtualization. The reason is that read-write performance suffers when data moves from a local hard disk to a network cluster. To balance system reliability and read-write performance when using distributed storage for storage virtualization, a good caching system is needed.

Based on this, the paper presents a distributed multi-level cache model (DMCM) to validate the feasibility of implementing storage virtualization with distributed storage. Because HDFS (Hadoop Distributed File System) is a widely used distributed storage framework, the model takes HDFS as the backend storage, memory as the first-level cache, and hard disk as the second-level cache, and configures an iSCSI server node on the HDFS NameNode to provide external access, thereby achieving storage virtualization within a LAN. To improve the cache hit ratio, the paper defines a cache replacement rule and integrates it into the design of the index table structure.

The paper develops a block device driver and a backend scheduler program as an implementation of the model. When users upload or download files from an iSCSI client, this triggers read or write requests to the block device, and the data is then scheduled efficiently across the cache levels.

To implement the distributed multi-level cache model, the paper builds an HDFS cluster and an iSCSI server, deploys them in a LAN environment, and tests the read-write performance of the system.
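The read and write paths through the cache levels described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the abstract does not state the replacement rule or the index table layout, so least-recently-used (LRU) eviction, write-through writes, and the names `LRUTier` and `MultiLevelCache` are all assumptions, and a plain dictionary stands in for both the disk cache and the HDFS backend.

```python
from collections import OrderedDict


class LRUTier:
    """One cache level with LRU eviction (an assumed policy, not the thesis rule)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> data, least recently used first

    def get(self, block_id):
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)  # mark as most recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        """Store a block; return the evicted (block_id, data) pair, if any."""
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            return self.blocks.popitem(last=False)
        return None

    def discard(self, block_id):
        self.blocks.pop(block_id, None)


class MultiLevelCache:
    """Memory tier in front of a disk tier in front of a backend store."""

    def __init__(self, backend, mem_blocks, disk_blocks):
        self.backend = backend  # hypothetical stand-in for HDFS: a dict
        self.memory = LRUTier(mem_blocks)
        self.disk = LRUTier(disk_blocks)
        self.hits = {"memory": 0, "disk": 0, "backend": 0}

    def _admit_to_memory(self, block_id, data):
        evicted = self.memory.put(block_id, data)
        if evicted is not None:
            # Demote the memory victim; the disk tier drops its own LRU block if full.
            self.disk.put(*evicted)

    def read(self, block_id):
        data = self.memory.get(block_id)
        if data is not None:
            self.hits["memory"] += 1
            return data
        data = self.disk.get(block_id)
        if data is not None:
            self.hits["disk"] += 1
            self.disk.discard(block_id)  # promote: the block moves up to memory
        else:
            data = self.backend[block_id]
            self.hits["backend"] += 1
        self._admit_to_memory(block_id, data)
        return data

    def write(self, block_id, data):
        self.backend[block_id] = data  # write-through (assumed)
        self.disk.discard(block_id)  # drop any stale second-level copy
        self._admit_to_memory(block_id, data)


backend = {i: bytes([i]) for i in range(4)}
cache = MultiLevelCache(backend, mem_blocks=1, disk_blocks=2)
for block_id in (0, 1, 0, 0):
    cache.read(block_id)
print(cache.hits)
```

Keeping the two tiers exclusive (a block lives in memory or on disk, never both) is one possible design choice; it makes the combined capacity the sum of both levels at the cost of an extra move on every promotion.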
The write performance tests examine the effect of the cache unit size and of the amount of data transferred in a single write. They find that a larger transmission buffer unit achieves a higher rate, and that the write rate stabilizes once a single file exceeds 20 MB. For read performance, the paper measures the cache hit rate under three access modes (random access, local access, and sequential access) and finds that local access gives the best cache hit rate and efficiency.

DMCM aims to balance reliability and read/write performance when implementing storage virtualization based on HDFS, and it can serve as a design reference for implementing storage virtualization with distributed storage.
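To make the three read access modes concrete, the sketch below replays synthetic block traces through a single LRU cache and reports the hit ratio of each. Everything here is an assumption chosen for illustration: the trace generators, the cache capacity, the 90% hot-set probability, the LRU policy, and the absence of prefetching. It does not reproduce the thesis measurements.

```python
import random
from collections import OrderedDict


def hit_ratio(trace, capacity):
    """Fraction of accesses in `trace` served by an LRU cache of `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # refresh recency on a hit
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(trace)


def make_traces(num_blocks=1000, length=5000, hot_blocks=32, seed=0):
    """Build hypothetical traces for the three access modes."""
    rng = random.Random(seed)
    # Random access: every block is equally likely.
    random_trace = [rng.randrange(num_blocks) for _ in range(length)]
    # Local access: 90% of requests fall in a small hot region.
    local_trace = [
        rng.randrange(hot_blocks) if rng.random() < 0.9 else rng.randrange(num_blocks)
        for _ in range(length)
    ]
    # Sequential access: one pass over the device, with no block revisited.
    sequential_trace = list(range(num_blocks))
    return {
        "random": random_trace,
        "local": local_trace,
        "sequential": sequential_trace,
    }


for name, trace in make_traces().items():
    print(f"{name:10s} hit ratio = {hit_ratio(trace, capacity=64):.3f}")
```

In this toy setup a single sequential pass never revisits a block, so it cannot hit without prefetching, while the local trace keeps its hot region resident in the cache. A real system with read-ahead could behave quite differently on sequential workloads.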
Keywords/Search Tags: storage virtualization, distributed storage, cache replacement policy, Hadoop Distributed File System, iSCSI server