Research And Application Of Data Storage Method Based On Hadoop

Posted on:2019-09-25

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Huang

Full Text:PDF

GTID:2428330596463188

Subject:Software engineering

Abstract/Summary:

With the development of science and technology, we are facing an explosion of data and information. How to store and analyze vast amounts of data, and how to speed up data access, has become a new problem. Cloud computing is one answer. Cloud computing is an Internet-based computing model in which thousands or millions of computers and servers are connected in remote data centers to provide computing and storage services. The Hadoop framework is one way to realize cloud computing. Hadoop is an open-source distributed infrastructure whose file system is HDFS (the Hadoop Distributed File System). HDFS provides reliable data storage, streaming data access, and support for large data sets. HDFS is designed to run on common low-cost hardware, which means that hardware failures happen often. To guarantee the reliability of data storage, HDFS adopts a redundant storage strategy, which keeps multiple copies of each data block. This strategy is key to the performance of a distributed file system, and there is much room for optimization.

This paper studies the architecture and storage policy of HDFS and proposes an optimization. The work includes the following aspects:
(1) Build a Hadoop-based cloud computing platform with several computers in the lab.
(2) Study the architecture and storage policy of HDFS. Because Hadoop is an open-source project, the architecture can be analyzed on the basis of the source code.
(3) Propose a new storage strategy. After a thorough study of the storage strategy built into HDFS, a new one is proposed: a multi-dimensional constraint strategy. That is, when a node is selected, its CPU performance, remaining memory rate, and network bandwidth are also taken into account, in order to improve the performance of the system.
(4) Implement the new storage strategy in the actual source code and run experiments to test whether the new strategy is usable.

Distributed data replication technology is an important part of distributed computing. It allows data to be shared across multiple servers: a local server can access data on remote servers in different physical locations, and all servers can hold copies of the data.
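
The multi-dimensional constraint idea in aspect (3) can be illustrated with a minimal Python sketch. The abstract does not say how the three metrics are combined, so the weighted sum, the weight values, the normalization of each metric to the range 0 to 1, and all names here (NodeStats, node_score, choose_targets) are assumptions made for illustration only, not the thesis's implementation. The sketch also ignores the rack awareness that the stock HDFS placement policy applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStats:
    """Hypothetical per-node metrics, each assumed normalized to [0, 1]."""
    name: str
    cpu_score: float        # relative CPU performance, higher is better
    mem_free_ratio: float   # fraction of memory still free
    bandwidth_ratio: float  # fraction of network bandwidth available


# Assumed weights for CPU, memory, and bandwidth; the abstract gives none.
WEIGHTS = (0.4, 0.3, 0.3)


def node_score(node, weights=WEIGHTS):
    """Combine the three metrics into one score with a weighted sum."""
    w_cpu, w_mem, w_net = weights
    return (w_cpu * node.cpu_score
            + w_mem * node.mem_free_ratio
            + w_net * node.bandwidth_ratio)


def choose_targets(nodes, replicas=3, weights=WEIGHTS):
    """Return the names of the highest-scoring nodes, one per replica.

    replicas defaults to 3, matching the default HDFS replication factor.
    """
    ranked = sorted(nodes, key=lambda n: node_score(n, weights), reverse=True)
    return [n.name for n in ranked[:replicas]]
```

A weighted sum is only one way to apply the constraints; a threshold filter (for example, skipping any node whose free memory falls below a limit) before ranking would be an equally plausible reading of the abstract.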

Keywords/Search Tags:

Cloud computing, Redundant Storage, Hadoop, Copy storage, HDFS, Multidimensional constraint

Related items

1	HDFS To Copy Data Storage Optimization And The Study Of Mass Data Storage
2	Research On Storage Strategies And Optimization Hadoop Platform
3	Research Of Improving Storage Of Replica And Small Files Merging And Access Optimization On Hadoop Platform
4	Research And Application Of Distributed Storage System Based On Cloud Computing
5	Research Of Data Storage And Management On Huatu Online Library System Based On HDFS
6	The Technical Research Of Optimization Of File Storage In HDFS
7	Application And Research On Data Storage Of Rail Transit Maintenance Support System Based On Hadoop
8	Design And Implementation Of Cloud Security Storage System Based On Hadoop
9	Research Of Cloud Computing Based On Data Storage Technology
10	Research And Design On Hadoop-based Cloud Storage Platform Of New Campus