
Improvement And Application Of PVFS In Hadoop

Posted on: 2016-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: N Y Bao
Full Text: PDF
GTID: 2348330461958522
Subject: Computer technology
Abstract/Summary:
As data sizes in scientific computing keep growing, high-performance computing (HPC) is becoming an increasingly important field, and the demand for distributed computing frameworks grows with it. In an HPC context, a framework needs not only high computing speed but also an efficient distributed file system underneath it. How to improve a distributed computing framework through its distributed file system has therefore become an active research question. In this research, we surveyed mainstream distributed file systems, decided to use PVFS to improve Hadoop, and then improved PVFS itself in several respects. The main work of this thesis can be summarized as follows:

First, we analyzed the characteristics of the mainstream distributed file systems and, on that basis, chose PVFS to replace HDFS as the storage module in Hadoop. We summarized the structure of HDFS and its deficiencies as Hadoop's storage module, and then studied the structure and principles of PVFS in depth as a basis for the subsequent work.

Second, we applied PVFS in Hadoop. We studied the requirements that Hadoop places on its storage module and connected PVFS to Hadoop. We defined a set of modules that sit between PVFS and Hadoop, so that the logical stripes in Hadoop and the parallel performance of PVFS are both maintained. In addition, our Hadoop with PVFS can still be configured to use other distributed file systems if the user wishes.

Third, we added data redundancy to PVFS, which previously had none. We changed the data storage structure and the state machines of PVFS, and thereby improved its fault tolerance. In our system, the backup is made after the write operation completes, so the user does not have to wait for it. Furthermore, we used the backups to improve read performance when the servers in PVFS carry different loads.
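
The two redundancy ideas in the abstract (backing up after the write returns, and reading from a less loaded copy) can be illustrated with a small in-memory sketch. This is not the thesis's implementation and does not use any PVFS or Hadoop API: the `Server` and `ReplicatedStore` classes, the choice of the next server as backup target, and the integer `load` field are all assumptions made only for illustration.

```python
import queue
import threading
import zlib


class Server:
    """Hypothetical in-memory stand-in for one storage server."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # key -> bytes held by this server
        self.load = 0     # assumed load metric, set by hand in this sketch


class ReplicatedStore:
    """Toy store: write to a primary, back up in the background."""

    def __init__(self, servers):
        if len(servers) < 2:
            raise ValueError("need at least two servers for a backup copy")
        self.servers = servers
        self.replicas = {}              # key -> servers holding a copy
        self._lock = threading.Lock()
        self._pending = queue.Queue()   # backup jobs not yet applied
        threading.Thread(target=self._backup_loop, daemon=True).start()

    def write(self, key, data):
        # Deterministic placement (assumption): CRC32 of the key picks the
        # primary, and the next server in the list holds the backup.
        idx = zlib.crc32(key.encode()) % len(self.servers)
        primary = self.servers[idx]
        backup = self.servers[(idx + 1) % len(self.servers)]
        primary.blocks[key] = data
        with self._lock:
            self.replicas[key] = [primary]
        # Queue the backup and return at once; the caller does not wait.
        self._pending.put((key, data, backup))

    def _backup_loop(self):
        while True:
            key, data, backup = self._pending.get()
            backup.blocks[key] = data
            with self._lock:
                self.replicas[key].append(backup)
            self._pending.task_done()

    def flush(self):
        """Block until every queued backup has been applied."""
        self._pending.join()

    def read(self, key):
        # Serve the read from whichever copy currently has the lowest load.
        with self._lock:
            holders = list(self.replicas[key])
        target = min(holders, key=lambda s: s.load)
        return target.name, target.blocks[key]
```

In this sketch `write` returns as soon as the primary copy is stored, and `read` only sees the backup once the background thread has applied it, which is why `flush` exists for callers that need both copies in place.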
Keywords/Search Tags: distributed file system, Hadoop, PVFS, parallel file system, fault tolerance