
Research And Improvement Of Data Check Strategy In Distributed File System

Posted on: 2014-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: R Jing
Full Text: PDF
GTID: 2248330395997457
Subject: Network and information security

Abstract/Summary:
This thesis, supported by a project of the National Natural Science Foundation of China, investigates cloud computing storage platforms, taking the distributed file system (DFS) as the research platform. Distributed file systems such as HDFS, GPFS and Lustre are now widely used for large-scale data processing. A DFS is responsible for distributed data storage and data management, and provides high-throughput data access. In addition to read and write operations, it also performs data checking: checksums are applied when data is read and written, which guarantees data integrity, since blocks may be corrupted by storage hardware faults, network errors or software defects. For large-scale data processing, however, heavy computation combined with data verification places an extra burden on the distributed file system and slows down read and write rates. A complete verification scheme is therefore needed that preserves data integrity while reducing, as far as possible, the overhead that verification imposes on the system.

Lustre provides two checksum modes to ensure data integrity: an in-memory mode (while data resides in the client cache) and a wire mode (while data is transmitted over the network). GPFS guarantees data integrity through its three-layer architecture of local disks, Network Shared Disks (NSD) and the GPFS file device, and employs three availability judgment mechanisms, File System Descriptor Quorum, Node Quorum and Tiebreaker Quorum, to protect data integrity and system safety. HDFS is the core storage layer beneath Hadoop's MapReduce framework and, like Lustre and GFS, also exists as an independent distributed file system. HDFS combines CheckSum and DataBlockScanner to ensure that the data stored on DataNodes remains intact: each DataNode keeps block metadata in its local file system for CRC checking. To verify a file, the client requests checksum information from the DataNodes; each DataNode returns the MD5 of all of its block checksums, and if the request to one DataNode fails it is retried on another; finally the per-block MD5 values are assembled and the MD5 of the whole content is computed.

The main work of this thesis is as follows:
(1) Describe the research background, introduce the concept of data integrity checking in distributed file systems, and analyse the related concepts and technologies.
(2) Analyse the data integrity protection mechanisms of distributed file systems, detailing the data integrity check models of GPFS, Lustre and HDFS, with a focus on the checksum calculation, hash algorithms and data transmission in the HDFS data transfer process.
(3) Based on this groundwork, build a data integrity check model for distributed file systems, DFS-DICM.
(4) Within this model, optimize the data writing process, cache allocation and a variant of the CRC32 checksum algorithm to improve computational efficiency and enhance system performance.
(5) Use HDFS as the experimental platform for the improvements, and use its own benchmark testing tools to measure, separately, the impact of the optimizations on overall and individual performance, load balancing, the data transfer process and the CRC32 algorithm.
(6) Compare and analyse the experimental results, draw the conclusions of the thesis, and outline further work.
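As a minimal sketch of the kind of chunked checksumming described above, the Java example below computes a CRC32 value per fixed-size chunk of a block and then an MD5 digest over the concatenated checksums, analogous in spirit to the HDFS verification flow summarized in this abstract. The chunk size, class and method names are illustrative assumptions, not the thesis's DFS-DICM implementation or Hadoop's actual API.

import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.zip.CRC32;

// Illustrative sketch only: per-chunk CRC32 plus an MD5 summary over the checksums.
public class ChunkedChecksumSketch {

    // Assumed chunk size for illustration (HDFS commonly checksums data in small fixed-size chunks).
    static final int BYTES_PER_CHECKSUM = 512;

    // Compute one CRC32 value per chunk of the block.
    static long[] chunkChecksums(byte[] block) {
        int chunks = (block.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, block.length - off);
            crc.reset();
            crc.update(block, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Summarize a block by hashing its per-chunk checksums
    // (loosely analogous to the MD5-over-checksums aggregation described above).
    static byte[] blockDigest(long[] sums) throws Exception {
        ByteBuffer buf = ByteBuffer.allocate(sums.length * Long.BYTES);
        for (long s : sums) buf.putLong(s);
        return MessageDigest.getInstance("MD5").digest(buf.array());
    }

    public static void main(String[] args) throws Exception {
        byte[] block = "example block contents".getBytes("UTF-8");
        long[] sums = chunkChecksums(block);
        byte[] digest = blockDigest(sums);
        System.out.printf("chunks=%d, digest bytes=%d%n", sums.length, digest.length);
    }
}

On read, the same per-chunk checksums would be recomputed and compared against the stored values, so that corruption introduced by storage devices, the network or software defects can be detected at chunk granularity.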
Keywords/Search Tags: Cloud Computing, Distributed File System, HDFS, Data Integrity, CRC