Research On Optimization Technology Of Distributed File System Based On Hadoop

Posted on:2014-04-01

Degree:Master

Type:Thesis

Country:China

Candidate:D Z Zhang

Full Text:PDF

GTID:2268330401976285

Subject:Circuits and Systems

Abstract/Summary:

With the development of the mobile Internet, the amount of data on the network has increased dramatically. After analysis and data mining, this data can be very valuable, and the resulting information can be used in commerce, scientific research, production, and other areas. Using traditional supercomputers to process such rapidly growing massive data is costly and consumes too much energy. Cloud computing, as a cheap, efficient, and reliable solution, has attracted a lot of attention. Hadoop is an open-source cloud data processing platform that is widely used to process and analyze huge amounts of data. A cloud platform relies on a distributed file system. Well-known distributed file systems such as Lustre and GPFS (General Parallel File System) were designed around mainframes and are not well suited to today's cloud computing environments built from microcomputers. This paper uses GlusterFS, a distributed file system that runs on microcomputers, as the distributed file system of the cloud platform. The paper first implements the connection between GlusterFS and Hadoop's core module, Common, using GlusterFS's Translator mechanism, on which all GlusterFS extensions are built. Using the Translator library functions, the paper connects to Hadoop Common, obtains the appropriate storage permissions, defines the org.apache.hadoop.fs.glusterfs class, and creates data streams that conform to the GlusterFS data format. The paper uses FUSE (Filesystem in Userspace) to mount GlusterFS for Hadoop, replacing Hadoop's own distributed file system, HDFS (Hadoop Distributed File System). This avoids the defects of HDFS and exploits the advantages of GlusterFS to improve the overall performance of Hadoop cloud computing.
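The thesis implements the binding as a Java class (org.apache.hadoop.fs.glusterfs) inside Hadoop Common. The Python sketch below only illustrates the underlying idea of serving Hadoop-style paths from a FUSE mount, under stated assumptions: the volume is already mounted at a local directory, the class name `GlusterFuseFS` and its methods are hypothetical, and `glusterfs:///...` URIs are simply mapped onto ordinary file operations under the mount point, leaving distribution to the GlusterFS client behind FUSE. It is not Hadoop's `FileSystem` API.

```python
import os
from urllib.parse import urlparse


class GlusterFuseFS:
    """Hypothetical adapter serving Hadoop-style glusterfs:// paths from a FUSE mount.

    Assumes the GlusterFS volume is already FUSE-mounted at `mount_root`
    (e.g. a directory such as /mnt/glusterfs). Every operation becomes an
    ordinary POSIX call on that mount; the GlusterFS client behind FUSE
    handles distribution across servers. The URI host part is ignored,
    i.e. a single mounted volume is assumed.
    """

    SCHEME = "glusterfs"

    def __init__(self, mount_root):
        self.mount_root = os.path.abspath(mount_root)

    def to_local(self, uri):
        """Map glusterfs:///a/b (or a bare /a/b) to <mount_root>/a/b."""
        parsed = urlparse(uri)
        if parsed.scheme not in ("", self.SCHEME):
            raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
        rel = parsed.path.lstrip("/")
        local = os.path.normpath(os.path.join(self.mount_root, rel))
        # Reject paths that would escape the mount point via "..".
        if os.path.commonpath([local, self.mount_root]) != self.mount_root:
            raise ValueError(f"path escapes mount point: {uri}")
        return local

    def create(self, uri, overwrite=False):
        """Open a new file for binary writing, creating parent directories."""
        local = self.to_local(uri)
        os.makedirs(os.path.dirname(local), exist_ok=True)
        return open(local, "wb" if overwrite else "xb")

    def open(self, uri):
        """Open an existing file for binary reading."""
        return open(self.to_local(uri), "rb")

    def list_status(self, uri):
        """Return sorted (name, size) pairs for the regular files in a directory."""
        with os.scandir(self.to_local(uri)) as entries:
            return sorted((e.name, e.stat().st_size) for e in entries if e.is_file())
```

For example, with a mount root of /mnt/glusterfs, `open("glusterfs:///logs/part-0000")` reads /mnt/glusterfs/logs/part-0000; because FUSE exposes the volume as a normal directory, no GlusterFS-specific client calls are needed on this path.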
To optimize the platform, the paper uses an InfiniBand RDMA (Remote Direct Memory Access) transmission network, which keeps Hadoop from being limited by network bandwidth and speed and thus improves Hadoop's performance. Based on the level of network congestion in the system, the paper uses a judgment function to decide whether to compress data to save network bandwidth, further increasing Hadoop's data transfer rate on the current network. Because GlusterFS's current data caching algorithm does not take enough factors into account, the paper proposes a new data caching algorithm, GAC (GlusterFS Automatic Cache Algorithm). The algorithm first determines whether the current data access is sequential; for sequential access it then determines how strongly sequential it is and uses a read-ahead formula to calculate a reasonable read-ahead size. Appropriate read-ahead improves Hadoop's file system performance. Together, these optimizations greatly improve the performance of the Hadoop platform's distributed file system. Tests on the Hadoop cloud platform show that the performance of the optimized distributed file system increases by a factor of 10, and the cloud computing performance of the Hadoop platform increases by a factor of more than 2.
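The abstract does not spell out the compression judgment function. One plausible form, sketched below as an assumption, compares the expected time to send the data raw over the currently available bandwidth with the time to compress, send, and decompress it. Congestion is modeled as a measured link utilization; the names `LinkState`, `should_compress`, and `prepare_payload`, the default throughput figures, the 64 KiB sampling, and the use of Python's `zlib` are all hypothetical choices, not details from the thesis.

```python
import zlib
from dataclasses import dataclass


@dataclass
class LinkState:
    capacity: float     # nominal link bandwidth, bytes per second (assumed known)
    utilization: float  # fraction of capacity already in use, 0.0-1.0 (congestion estimate)

    def available(self):
        # Floor at 1 byte/s so a saturated link does not cause division by zero.
        return max(self.capacity * (1.0 - self.utilization), 1.0)


def should_compress(n_bytes, link, ratio, compress_rate, decompress_rate):
    """Hypothetical judgment function: compress only if the transfer should finish sooner.

    ratio           -- expected compressed size / original size
    compress_rate   -- compression throughput, bytes per second
    decompress_rate -- decompression throughput, bytes per second
    """
    bw = link.available()
    t_raw = n_bytes / bw
    t_compressed = n_bytes / compress_rate + n_bytes * ratio / bw + n_bytes / decompress_rate
    return t_compressed < t_raw


def prepare_payload(data, link, compress_rate=200e6, decompress_rate=500e6, level=1):
    """Return (payload, was_compressed); the ratio is estimated from a 64 KiB sample."""
    sample = data[:65536]
    ratio = len(zlib.compress(sample, level)) / max(len(sample), 1)
    if should_compress(len(data), link, ratio, compress_rate, decompress_rate):
        return zlib.compress(data, level), True
    return data, False
```

This rule captures the trade-off the abstract describes: on an idle high-bandwidth link, such as an InfiniBand RDMA fabric, compression time dominates and data is sent raw, whereas on a congested link compressible data is compressed first. Incompressible data is sent raw in either case, since compressing it cannot shorten the transfer.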
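The abstract names the three steps of GAC (detect sequential access, measure how strongly sequential it is, then size the read-ahead) but does not give the formula. The sketch below fills those steps with hypothetical choices: a request counts as sequential if it starts where the previous one ended, strength is the number of contiguous request pairs in a sliding window divided by the window's capacity, and the read-ahead size doubles from `min_ra` toward `max_ra` as strength rises. The class name, window size, threshold, and size bounds are illustrative assumptions, not values from the thesis.

```python
from collections import deque


class SequentialReadAhead:
    """Hypothetical GAC-style read-ahead sizing (not the thesis's exact formula)."""

    def __init__(self, window=8, threshold=0.5, min_ra=128 * 1024,
                 max_ra=4 * 1024 * 1024, steps=5):
        self.history = deque(maxlen=window)  # recent (offset, size) requests
        self.threshold = threshold           # minimum strength before prefetching
        self.min_ra = min_ra                 # smallest read-ahead, bytes
        self.max_ra = max_ra                 # largest read-ahead, bytes
        self.steps = steps                   # doublings between min_ra and max_ra

    def _strength(self):
        """Contiguous request pairs in the window, divided by the window's capacity."""
        reqs = list(self.history)
        hits = sum(1 for (o1, s1), (o2, _) in zip(reqs, reqs[1:]) if o2 == o1 + s1)
        return hits / max(self.history.maxlen - 1, 1)

    def on_read(self, offset, size):
        """Record a read and return how many bytes to prefetch after it."""
        prev = self.history[-1] if self.history else None
        self.history.append((offset, size))
        # Step 1: is this request sequential with respect to the previous one?
        if prev is None or offset != prev[0] + prev[1]:
            return 0
        # Step 2: how strongly sequential is the recent access pattern?
        strength = self._strength()
        if strength < self.threshold:
            return 0
        # Step 3: scale the read-ahead size with the strength, capped at max_ra.
        return min(self.min_ra * 2 ** int(strength * self.steps), self.max_ra)
```

Dividing by the full window rather than by the number of requests seen so far makes the read-ahead grow gradually: a stream of contiguous 64 KiB reads yields 0, 0, 0, 0, then 512 KiB, 1 MiB, 2 MiB and 4 MiB, while a random access pattern never triggers prefetching.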

Keywords/Search Tags:

GlusterFS, Hadoop, GtoH Interface, GAC Algorithm, Data Compression

Related items

1	The Research And Implementation Of Big Data Cloud System Based On Hadoop
2	Research And Implementation Of Compression For Structured Data On Hadoop Platform
3	GlusterFS Data Distribution Policy And Performance Optimization Research
4	Design And Implementation Of Hierarchical Cloud Storage System Based On GlusterFS
5	Design And Implementation Of Industrial Equipment Maintenance Platform Based On Hadoop
6	Cloud Storage System For Massive Data And Applied Research
7	Massive Data Compression Algorithm In Parallel
8	Network Virtualization And Directory Virtualization Based On GlusterFS
9	Research And Implementation Of Interface Transplant Technology In Hadoop Resource Management Module
10	Arbitrary Waveform Generator Waveform Input Interface Device And Algorithm Design And Implementation