
Research and Optimization of I/O Performance Based on Eventual Consistency in HDFS

Posted on: 2017-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: J J He
Full Text: PDF
GTID: 2428330590968463
Subject: Software engineering
Abstract/Summary:
With the rapid development of information technology, cloud computing and big data have become increasingly popular. Standalone systems can no longer meet users' growing needs, and more and more individuals and companies are moving to distributed platforms. Hadoop, with its high reliability, high scalability, high performance, fault tolerance, and low cost, has become one of the most popular distributed system infrastructures and is widely used in production environments. HDFS, the storage layer of Hadoop, serves Hadoop efficiently and is also used by other distributed systems such as Spark.

HDFS is short for Hadoop Distributed File System. It is a subproject of Hadoop and is a distributed file system designed to run on commodity hardware. HDFS has much in common with other existing distributed file systems, but the differences are also significant. Because of its high fault tolerance, it can be deployed widely on low-cost hardware. HDFS provides high-throughput access to application data, especially for applications with large data sets. However, as HDFS is used in more environments, new requirements keep emerging. For example, users want HDFS to offer low latency and high performance, so that they can read a file that is still being written as soon as possible for further analysis.

To achieve a low-latency, high-performance distributed file system, we first study the workflows of HDFS. Drawing on existing performance-optimization research from China and abroad, we analyze the advantages and disadvantages of each approach and propose a solution to this problem. The solution combines the writing strategy with the theory of eventual consistency to realize a distributed file system with low latency and high performance. First, we relax the strong consistency of the HDFS writing strategy and rebuild it on eventual consistency, which allows users to access data before the write operation has completely finished. Second, based on the new writing strategy, we propose a new reading strategy for accessing data under construction, which noticeably decreases latency and improves performance. We then discuss the new errors and exceptions that arise from the changes to the writing and reading policies, and how our new file system handles them. Finally, we conduct experiments in our own environment. The results show that the new solution based on eventual consistency theory decreases latency significantly and outperforms the original HDFS by 163% in performance.
Keywords/Search Tags:HDFS, performance optimization, high availability, writing policy, reading policy