Research And Application Of Data Storage Method Based On Hadoop

Posted on: 2019-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Huang
GTID: 2428330596463188
Subject: Software engineering
Abstract/Summary:
With the development of science and technology, we are facing an explosion of data and information. How to store and analyze vast amounts of data, and how to speed up access to it, has become a new problem. Cloud computing is the answer. Cloud computing is an Internet-based computing model in which thousands or even millions of computers and servers are connected in remote data centers to provide computing and storage services. The Hadoop framework is one way to implement cloud computing. Hadoop is an open-source distributed infrastructure whose file system is HDFS (the Hadoop Distributed File System). HDFS provides reliable data storage and streaming data access, and supports large data sets. HDFS is designed to run on ordinary low-cost hardware, which means that hardware failures are common. To guarantee reliable storage, HDFS adopts a redundant storage strategy: each data block is stored as multiple copies. This strategy is key to the performance of a distributed file system, so there is considerable room for optimization. This paper studies the architecture and storage policy of HDFS and proposes an optimization. The work covers the following aspects:
(1) Build a Hadoop-based cloud computing platform from several computers in the lab.
(2) Study the architecture and storage policy of HDFS. Because Hadoop is an open-source project, its architecture can be analyzed from the source code.
(3) Propose a new storage strategy. After a thorough study of HDFS's own storage strategy, a new strategy is proposed: the multi-dimensional constraint strategy. When a node is selected, its CPU performance, remaining memory ratio and network bandwidth are also taken into account, in order to improve system performance and thereby optimize the system.
(4) Implement the new storage strategy in the actual source code and run experiments to test whether it is usable.
Distributed data replication is an important part of distributed computing. It allows data to be shared across multiple servers: a local server can access data on remote servers in different physical locations, and every server can hold a copy of the data.
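The node-selection idea in aspect (3) can be illustrated with a small ranking function. The abstract does not say how CPU performance, remaining memory ratio and bandwidth are combined, so the sketch below assumes a weighted sum of normalized metrics plus one hard constraint (a minimum remaining memory ratio). The names `NodeStats`, `WEIGHTS`, `MIN_MEM_FREE`, `node_score` and `choose_targets`, and all numeric values, are hypothetical. The sketch also ignores rack awareness. In HDFS itself, replica placement is decided by a Java `BlockPlacementPolicy` implementation; this Python code only shows the ranking logic, not the thesis's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStats:
    """Hypothetical per-DataNode metrics; field names are illustrative only."""
    name: str
    cpu_score: float       # relative CPU performance, assumed normalized to [0, 1]
    mem_free_ratio: float  # remaining memory / total memory, in [0, 1]
    bandwidth_mbps: float  # available network bandwidth in Mbit/s


# Hypothetical weights: the abstract does not state how the factors are combined.
WEIGHTS = {"cpu": 0.4, "mem": 0.3, "net": 0.3}
MIN_MEM_FREE = 0.10  # hypothetical hard constraint: skip nearly full nodes


def node_score(node: NodeStats, max_bandwidth: float) -> float:
    """Weighted sum of the three metrics, each scaled to [0, 1]."""
    # Normalize bandwidth against the best eligible node.
    net = node.bandwidth_mbps / max_bandwidth if max_bandwidth > 0 else 0.0
    return (WEIGHTS["cpu"] * node.cpu_score
            + WEIGHTS["mem"] * node.mem_free_ratio
            + WEIGHTS["net"] * net)


def choose_targets(nodes: list[NodeStats], replication: int = 3) -> list[str]:
    """Return names of up to `replication` distinct nodes, best score first."""
    # Apply the hard constraint before ranking.
    eligible = [n for n in nodes if n.mem_free_ratio >= MIN_MEM_FREE]
    if not eligible:
        return []
    max_bw = max(n.bandwidth_mbps for n in eligible)
    ranked = sorted(eligible, key=lambda n: node_score(n, max_bw), reverse=True)
    # Each replica goes to a different node, so take the top distinct entries.
    return [n.name for n in ranked[:replication]]


# Usage with made-up cluster figures.
cluster = [
    NodeStats("dn1", 0.9, 0.50, 1000),
    NodeStats("dn2", 0.6, 0.80, 1000),
    NodeStats("dn3", 0.8, 0.05, 1000),  # fails the memory constraint
    NodeStats("dn4", 0.5, 0.40, 100),
    NodeStats("dn5", 0.7, 0.60, 500),
]
print(choose_targets(cluster))  # → ['dn1', 'dn2', 'dn5']
```

A weighted sum keeps the policy easy to tune: changing the entries of `WEIGHTS` shifts the preference between CPU, memory and network without changing the selection code. Whether the thesis uses this form or a different combination rule is not stated in the abstract.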
Keywords/Search Tags: Cloud computing, Redundant storage, Hadoop, Copy storage, HDFS, Multi-dimensional constraint