Research Of Distributed Data Storage System Based On Hadoop

Posted on:2019-06-07

Degree:Master

Type:Thesis

Country:China

Candidate:L M Zou

Full Text:PDF

GTID:2428330542495105

Subject:Engineering

Abstract/Summary:


With the rapid development and widespread adoption of Internet and information technology, the world has moved from the Internet era into the "Internet +" era. Across all industries, portal websites and e-commerce websites generate large amounts of data every day, and data volume is growing explosively. For storing such massive data, the cost of vertical scaling (upgrading a single machine) keeps rising. This is an increasing burden for companies that rely on commercial storage, and it has even become a key factor restricting the development of many enterprises. To address this problem, it is increasingly important to design and implement big data storage systems with high capacity and high concurrency. Big data raises three main issues: storage, analysis, and management. Storage is the prerequisite for any data operation, so solving the data storage problem is the top priority.

This paper designs and implements a distributed data storage system based on Hadoop. Hadoop serves as the distributed framework that joins ordinary commodity machines into a cluster, and the storage space of the whole cluster is used to realize the distributed data storage system. Traditional storage systems are mainly centralized: they store data uniformly on a single machine or server. This approach has several problems; in particular, if that machine fails, data integrity cannot be guaranteed. This paper therefore adopts a distributed strategy for storing data and ensures the security, reliability, and integrity of the data by storing it redundantly.

First, the paper analyzes and introduces distributed theory, the application of distributed storage systems, and their key technologies. Based on these distributed concepts, it designs and implements the Hadoop-based distributed data storage system and deploys a cluster of Linux machines under the Hadoop framework. After the feasibility of the cluster is verified, file data access functions are implemented. The system is then optimized to address problems encountered in practice. Finally, the system's performance is tested. The tests, together with a comparison against traditional storage, show that the proposed system can store large amounts of data while ensuring data integrity and reliability, and that the optimizations greatly improve its performance.
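The redundancy idea in the abstract (keeping each piece of data on several machines so that a single failure does not lose it) can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not the thesis's implementation and not HDFS code: the `ToyCluster` and `DataNode` classes, the 4-byte block size, and the round-robin replica placement are all hypothetical. Real HDFS splits files into much larger blocks (128 MB by default), keeps 3 replicas by default, and uses a rack-aware placement policy.

```python
from dataclasses import dataclass, field

# Hypothetical toy parameters. HDFS itself defaults to 128 MB blocks and
# 3 replicas (dfs.blocksize / dfs.replication); 4 bytes keeps the demo small.
BLOCK_SIZE = 4
REPLICATION = 3


@dataclass
class DataNode:
    """A storage machine in the toy cluster (stand-in for an HDFS DataNode)."""
    name: str
    alive: bool = True
    blocks: dict = field(default_factory=dict)  # block id -> bytes


class ToyCluster:
    """Hypothetical in-memory model; `metadata` plays the NameNode's role."""

    def __init__(self, node_names, replication=REPLICATION, block_size=BLOCK_SIZE):
        if replication > len(node_names):
            raise ValueError("replication factor exceeds number of nodes")
        self.nodes = [DataNode(n) for n in node_names]
        self.replication = replication
        self.block_size = block_size
        self.metadata = {}  # path -> list of (block id, node indices holding it)
        self._next_block = 0
        self._cursor = 0  # rotates the starting node for each new block

    def write(self, path, data):
        """Split data into fixed-size blocks; store each on `replication` distinct live nodes."""
        live = [i for i, n in enumerate(self.nodes) if n.alive]
        if len(live) < self.replication:
            raise RuntimeError("not enough live nodes to place all replicas")
        layout = []
        for offset in range(0, len(data), self.block_size):
            chunk = data[offset:offset + self.block_size]
            block_id = self._next_block
            self._next_block += 1
            # Round-robin placement (a simplification; HDFS uses a rack-aware policy).
            targets = [live[(self._cursor + k) % len(live)] for k in range(self.replication)]
            self._cursor += 1
            for i in targets:
                self.nodes[i].blocks[block_id] = chunk
            layout.append((block_id, targets))
        self.metadata[path] = layout

    def read(self, path):
        """Reassemble a file, reading each block from the first live replica."""
        out = bytearray()
        for block_id, targets in self.metadata[path]:
            for i in targets:
                node = self.nodes[i]
                if node.alive and block_id in node.blocks:
                    out += node.blocks[block_id]
                    break
            else:
                raise OSError(f"block {block_id} lost: no live replica")
        return bytes(out)

    def fail(self, name):
        """Simulate a machine failure."""
        for node in self.nodes:
            if node.name == name:
                node.alive = False


if __name__ == "__main__":
    cluster = ToyCluster(["dn0", "dn1", "dn2", "dn3"])
    payload = b"hello distributed world"
    cluster.write("/data/demo.txt", payload)
    cluster.fail("dn1")
    cluster.fail("dn2")  # two of four machines down
    print(cluster.read("/data/demo.txt"))  # b'hello distributed world'
```

With three replicas spread over four machines, any two failures still leave at least one copy of every block, so the file can be read back intact; losing three machines makes some blocks unreadable. This is why a real HDFS cluster re-replicates under-replicated blocks when a DataNode is declared dead, rather than relying on the initial copies alone.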

Keywords/Search Tags:

Big data, Hadoop, HDFS, Distributed storage


Related items

1	Research Of Distributed Data Storage System Based On Hadoop
2	The Technical Research Of Optimization Of File Storage In HDFS
3	Research On Data Storage Method Based On HDFS And Implementation In Building Big Data Platform Of Industry
4	Application And Research On Data Storage Of Rail Transit Maintenance Support System Based On Hadoop
5	Research And Application Of Data Storage Method Based On Hadoop
6	HDFS To Copy Data Storage Optimization And The Study Of Mass Data Storage
7	Research Of Data Storage And Management On Huatu Online Library System Based On HDFS
8	Implemention Of The Massive Telecom Data Distributed Storage And Query System Based On Hadoop
9	Research On Storage Strategy Of Distributed File System HDFS
10	Research And Application Of Distributed Storage System Based On Cloud Computing