
Research Of Distributed Data Storage System Based On Hadoop

Posted on: 2019-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: L M Zou
Full Text: PDF
GTID: 2428330542495105
Subject: Engineering
Abstract/Summary:
With the rapid development and widespread adoption of Internet and information technology, the world has moved from the Internet era into the Internet+ era. Businesses in every sector, including portal and e-commerce websites, generate large amounts of data every day, and data volume is growing explosively. For the storage of massive data, the cost of vertical scaling keeps rising. This is an increasing burden for companies that rely on commercial storage, and it has even become a key constraint on the development of many enterprises. Designing and implementing high-capacity, high-concurrency big data storage systems is therefore becoming more and more important. The three main problems to address when facing big data are storage, analysis, and management. Storage is the precondition for any operation on data, so solving the storage problem is the top priority.

This thesis designs and implements a distributed data storage system based on Hadoop. With Hadoop as the distributed framework, a cluster of commodity machines is formed, and the distributed data storage system is built on the storage space of the whole cluster. Traditional storage systems are mainly centralized: data is stored together on a single machine or server. This approach has many problems; in particular, if the machine fails, the integrity of the data cannot be guaranteed. This thesis therefore adopts a distributed strategy that stores data redundantly to ensure its security, reliability, and integrity.

First, the thesis analyzes and introduces distributed systems theory, the applications of distributed storage systems, and the key technologies involved. Based on this, the Hadoop-based distributed data storage system is designed and implemented, and Hadoop is deployed on a cluster of Linux machines. After the feasibility of the cluster is verified, the file data access functions are implemented. The system is then optimized in response to problems encountered in practice. Finally, the system's performance is tested. The tests, together with a comparison against the traditional storage mode, show that the proposed distributed data storage system can store large volumes of data while ensuring their integrity and reliability, and that the optimization greatly improves performance.
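
The redundancy strategy described above can be illustrated with a small, self-contained sketch. It is not the thesis's implementation and does not use the Hadoop API: the function names, the round-robin placement rule, the node names, and the block size are all assumptions chosen for illustration (HDFS's real placement policy is rack-aware and more involved). The sketch only shows why keeping several replicas of each block on distinct machines lets a file survive a node failure, whereas a single copy does not.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split a file's bytes into fixed-size blocks (the last one may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Assumption: simple round-robin placement, used here only for illustration.
    """
    if replication < 1 or replication > len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """A file stays readable if every block has at least one replica on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    # Hypothetical 4-node cluster and a tiny 10-byte "file" with 4-byte blocks.
    nodes = ["node1", "node2", "node3", "node4"]
    blocks = split_into_blocks(b"0123456789", block_size=4)

    replicated = place_replicas(len(blocks), nodes, replication=3)
    single_copy = place_replicas(len(blocks), nodes, replication=1)

    print(readable_after_failure(replicated, {"node1"}))
    print(readable_after_failure(single_copy, {"node1"}))
```

With three replicas per block, the loss of one node leaves every block with a live copy; with a single copy, the same failure makes the first block unreadable, which corresponds to the centralized-storage weakness the abstract describes.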
Keywords/Search Tags: Big data, Hadoop, HDFS, Distributed storage