
Research And Implementation Of Cloud Storage Platform Based On Hadoop

Posted on: 2014-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2268330401965791
Subject: Computer application technology
Abstract/Summary:
In recent years, cloud computing has increasingly become a focus of attention both at home and abroad. When the major task of a cloud computing system is data storage rather than computation, the system becomes a cloud storage system, and the rapid development of cloud computing has made cloud storage one of the industry's most popular research fields. Cloud storage is a new kind of service that keeps users' data in the cloud: users can access their data at any time and from anywhere by logging into the cloud storage service over the Internet, without having to worry about losing it.

Hadoop is an open-source distributed computing platform developed by Apache. It has demonstrated excellent performance in distributed computing and data storage and has attracted the attention of well-known IT companies; many companies and research institutions have invested in research on Hadoop, so it is used more and more widely in cloud computing and cloud storage. HDFS, the Hadoop Distributed File System, provides powerful data storage capacity and is well suited to cloud storage systems. However, it has some design flaws and its performance is not ideal in every respect, so it must first be improved before it can be deployed on a large scale.

This dissertation mainly studies the cloud storage model of HDFS. It improves HDFS with respect to two issues, unsatisfactory small-file storage and uneven replica distribution, and builds a cloud storage platform on the improved HDFS. The main work is as follows:

1. HDFS uses a replication mechanism, storing replicas of each data block on different DataNodes in the cluster, to ensure the reliability of data storage. However, the default replica placement strategy is partly random and cannot guarantee that replicas are evenly distributed across the cluster. To solve this problem, this dissertation presents an algorithm that, based on a weighted evaluation matrix, selects the DataNode closest to the optimal solution and farthest from the worst solution. The weights are calculated with the AHP algorithm, taking node load into account while emphasizing space utilization. Selecting the most appropriate DataNode for each replica keeps the space utilization of the DataNodes balanced (a minimal code sketch of this selection idea follows the abstract).

2. HDFS is designed for large files and is not well suited to storing a large number of small files: for the same total amount of data, small files waste NameNode memory and reduce access efficiency. To solve this problem, this dissertation improves the HDFS file storage procedure. Before files are uploaded to the HDFS cluster, they are checked against a size threshold; files judged to be small are merged, and the index information of each small file is stored as key-value pairs in an index file (a simplified sketch of this merge-and-index idea also follows the abstract). This optimization reduces NameNode memory consumption and improves access efficiency when storing a large number of small files.

3. Extensive experiments compare the original HDFS with the optimized scheme. The results show that the optimization proposed in this dissertation is effective and improves the performance of HDFS. A cloud storage platform is then built on the improved HDFS: a Web application is developed, the platform is operated in the B/S (browser/server) model, and the fundamental functions of cloud storage are implemented.
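The replica placement approach in item 1 resembles a TOPSIS-style ranking over a weighted evaluation matrix. The following Java sketch is a minimal, self-contained illustration under assumed criteria (free-space ratio, CPU load, network load) and hard-coded AHP-derived weights that emphasize space utilization; the class names, metrics, and weight values are hypothetical and are not part of HDFS or of the dissertation's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal TOPSIS-style DataNode ranking sketch (hypothetical metrics and weights). */
public class ReplicaPlacementSketch {

    /** Hypothetical per-node metrics; a higher freeSpaceRatio is better, lower loads are better. */
    static class NodeMetrics {
        final String name;
        final double freeSpaceRatio; // benefit criterion
        final double cpuLoad;        // cost criterion
        final double netLoad;        // cost criterion
        NodeMetrics(String name, double freeSpaceRatio, double cpuLoad, double netLoad) {
            this.name = name;
            this.freeSpaceRatio = freeSpaceRatio;
            this.cpuLoad = cpuLoad;
            this.netLoad = netLoad;
        }
    }

    // Assumed AHP-derived weights, emphasizing space utilization as the abstract describes.
    static final double[] WEIGHTS = {0.6, 0.2, 0.2};

    /** Pick the node closest to the ideal solution and farthest from the worst solution. */
    static NodeMetrics pickDataNode(List<NodeMetrics> nodes) {
        int n = nodes.size();
        double[][] m = new double[n][3];
        for (int i = 0; i < n; i++) {
            NodeMetrics nm = nodes.get(i);
            // Convert cost criteria to benefit form so "larger is better" holds for every column.
            m[i][0] = nm.freeSpaceRatio;
            m[i][1] = 1.0 - nm.cpuLoad;
            m[i][2] = 1.0 - nm.netLoad;
        }
        // Vector-normalize each column of the evaluation matrix, then apply the weights.
        for (int j = 0; j < 3; j++) {
            double norm = 0;
            for (int i = 0; i < n; i++) norm += m[i][j] * m[i][j];
            norm = Math.sqrt(norm);
            for (int i = 0; i < n; i++) m[i][j] = (norm == 0 ? 0 : m[i][j] / norm) * WEIGHTS[j];
        }
        // Ideal (best) and anti-ideal (worst) values per column.
        double[] best = new double[3], worst = new double[3];
        for (int j = 0; j < 3; j++) {
            best[j] = Double.NEGATIVE_INFINITY;
            worst[j] = Double.POSITIVE_INFINITY;
            for (int i = 0; i < n; i++) {
                best[j] = Math.max(best[j], m[i][j]);
                worst[j] = Math.min(worst[j], m[i][j]);
            }
        }
        // Score each node by its relative closeness to the ideal solution.
        NodeMetrics chosen = null;
        double bestScore = -1;
        for (int i = 0; i < n; i++) {
            double dBest = 0, dWorst = 0;
            for (int j = 0; j < 3; j++) {
                dBest += Math.pow(m[i][j] - best[j], 2);
                dWorst += Math.pow(m[i][j] - worst[j], 2);
            }
            double denom = Math.sqrt(dBest) + Math.sqrt(dWorst);
            double score = denom == 0 ? 0 : Math.sqrt(dWorst) / denom;
            if (score > bestScore) {
                bestScore = score;
                chosen = nodes.get(i);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<NodeMetrics> nodes = new ArrayList<>();
        nodes.add(new NodeMetrics("dn1", 0.80, 0.30, 0.20));
        nodes.add(new NodeMetrics("dn2", 0.40, 0.10, 0.10));
        nodes.add(new NodeMetrics("dn3", 0.65, 0.70, 0.60));
        System.out.println("Chosen DataNode: " + pickDataNode(nodes).name);
    }
}
```

In the real placement strategy such a score would be computed by the NameNode when choosing targets for a new block, so that nodes with ample free space and light load are preferred over the default pseudo-random choice.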
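The small-file optimization in item 2 merges many small files into one larger file and records each file's location as key-value pairs in an index. The following sketch is a simplified local illustration of that idea in plain Java and does not use the HDFS client API; the size threshold, file names, and "name=offset,length" index format are assumptions for illustration, not the dissertation's actual format.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified local sketch of the merge-and-index idea for small files. */
public class SmallFileMergerSketch {

    // Assumed threshold: files below this size are treated as "small" and merged.
    static final long SMALL_FILE_THRESHOLD = 4L * 1024 * 1024; // 4 MB

    /**
     * Merge small files into one data file and write an index file whose entries are
     * key-value pairs of the form "fileName=offset,length".
     */
    static void mergeSmallFiles(List<Path> inputs, Path mergedData, Path indexFile) throws IOException {
        Map<String, long[]> index = new LinkedHashMap<>();
        long offset = 0;
        try (OutputStream out = Files.newOutputStream(mergedData)) {
            for (Path p : inputs) {
                long size = Files.size(p);
                if (size >= SMALL_FILE_THRESHOLD) {
                    continue; // large files would be uploaded to HDFS unchanged
                }
                byte[] bytes = Files.readAllBytes(p);
                out.write(bytes);
                index.put(p.getFileName().toString(), new long[]{offset, size});
                offset += size;
            }
        }
        // Persist the index as simple key-value lines: name=offset,length
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, long[]> e : index.entrySet()) {
            sb.append(e.getKey()).append('=')
              .append(e.getValue()[0]).append(',')
              .append(e.getValue()[1]).append('\n');
        }
        Files.write(indexFile, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    /** Read one small file back out of the merged file using its index entry. */
    static byte[] readSmallFile(Path mergedData, long off, int len) throws IOException {
        byte[] all = Files.readAllBytes(mergedData); // a real system would seek instead of reading everything
        byte[] result = new byte[len];
        System.arraycopy(all, (int) off, result, 0, len);
        return result;
    }
}
```

In the actual platform the merged file and its index would be uploaded to HDFS, so the NameNode keeps metadata for one large file instead of for many small ones, which is what reduces its memory consumption.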
Keywords/Search Tags: cloud storage, Hadoop, HDFS, replica distribution, small files