
Research And Implementation Of Cloud Storage Platform Based On Hadoop

Posted on: 2014-08-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2268330401965791
Subject: Computer application technology
Abstract/Summary:
In recent years, cloud computing has increasingly become a focus of attention both at home and abroad. When the major task of a cloud computing system is data storage rather than computation, the system becomes a cloud storage system, and the rapid development of cloud computing has made cloud storage one of the industry's most popular research fields. Cloud storage is a new kind of service that keeps users' data in the cloud: users can access their data at any time and from anywhere by logging into the cloud storage service over the Internet, without having to worry about losing it.

Hadoop is an open-source distributed computing platform developed by Apache. It has demonstrated excellent performance in distributed computing and data storage and has attracted the attention of well-known IT companies; many companies and research institutions have invested in research on Hadoop, so it is used more and more widely in cloud computing and cloud storage. HDFS, the Hadoop Distributed File System, provides powerful data storage capacity and is well suited to cloud storage systems. However, it has some design flaws and its performance is not ideal in every respect, so it must first be improved before it can be deployed on a large scale.

This dissertation mainly studies the cloud storage model of HDFS. It improves HDFS with respect to two issues, unsatisfactory small-file storage and uneven replica distribution, and builds a cloud storage platform on the improved HDFS. The main work is as follows:

1. HDFS uses a replication mechanism, storing replicas of each data block on different DataNodes in the cluster, to ensure the reliability of data storage. However, the default replica placement strategy is partly random and cannot guarantee that replicas are evenly distributed across the cluster. To solve this problem, this dissertation presents an algorithm that, based on a weighted evaluation matrix, selects the DataNode closest to the optimal solution and farthest from the worst solution. The weights are calculated with the AHP algorithm, taking node load into account while emphasizing space utilization. Selecting the most appropriate DataNode for each replica keeps the space utilization of the DataNodes balanced (a minimal code sketch of this selection idea follows the abstract).

2. HDFS is designed for large files and is not well suited to storing a large number of small files: for the same total amount of data, small files waste NameNode memory and reduce access efficiency. To solve this problem, this dissertation improves the HDFS file storage procedure. Before files are uploaded to the HDFS cluster, they are checked against a size threshold; files judged to be small are merged, and the index information of each small file is stored as key-value pairs in an index file (a simplified sketch of this merge-and-index idea also follows the abstract). This optimization reduces NameNode memory consumption and improves access efficiency when storing a large number of small files.

3. Extensive experiments compare the original HDFS with the optimized scheme. The results show that the optimization proposed in this dissertation is effective and improves the performance of HDFS. A cloud storage platform is then built on the improved HDFS: a Web application is developed, the platform is operated in the B/S (browser/server) model, and the fundamental functions of cloud storage are implemented.
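The replica placement approach in item 1 resembles a TOPSIS-style ranking over a weighted evaluation matrix. The following Java sketch is a minimal, self-contained illustration under assumed criteria (free-space ratio, CPU load, network load) and hard-coded AHP-derived weights that emphasize space utilization; the class names, metrics, and weight values are hypothetical and are not part of HDFS or of the dissertation's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal TOPSIS-style DataNode ranking sketch (hypothetical metrics and weights). */
public class ReplicaPlacementSketch {

    /** Hypothetical per-node metrics; a higher freeSpaceRatio is better, lower loads are better. */
    static class NodeMetrics {
        final String name;
        final double freeSpaceRatio; // benefit criterion
        final double cpuLoad;        // cost criterion
        final double netLoad;        // cost criterion
        NodeMetrics(String name, double freeSpaceRatio, double cpuLoad, double netLoad) {
            this.name = name;
            this.freeSpaceRatio = freeSpaceRatio;
            this.cpuLoad = cpuLoad;
            this.netLoad = netLoad;
        }
    }

    // Assumed AHP-derived weights, emphasizing space utilization as the abstract describes.
    static final double[] WEIGHTS = {0.6, 0.2, 0.2};

    /** Pick the node closest to the ideal solution and farthest from the worst solution. */
    static NodeMetrics pickDataNode(List<NodeMetrics> nodes) {
        int n = nodes.size();
        double[][] m = new double[n][3];
        for (int i = 0; i < n; i++) {
            NodeMetrics nm = nodes.get(i);
            // Convert cost criteria to benefit form so "larger is better" holds for every column.
            m[i][0] = nm.freeSpaceRatio;
            m[i][1] = 1.0 - nm.cpuLoad;
            m[i][2] = 1.0 - nm.netLoad;
        }
        // Vector-normalize each column of the evaluation matrix, then apply the weights.
        for (int j = 0; j < 3; j++) {
            double norm = 0;
            for (int i = 0; i < n; i++) norm += m[i][j] * m[i][j];
            norm = Math.sqrt(norm);
            for (int i = 0; i < n; i++) m[i][j] = (norm == 0 ? 0 : m[i][j] / norm) * WEIGHTS[j];
        }
        // Ideal (best) and anti-ideal (worst) values per column.
        double[] best = new double[3], worst = new double[3];
        for (int j = 0; j < 3; j++) {
            best[j] = Double.NEGATIVE_INFINITY;
            worst[j] = Double.POSITIVE_INFINITY;
            for (int i = 0; i < n; i++) {
                best[j] = Math.max(best[j], m[i][j]);
                worst[j] = Math.min(worst[j], m[i][j]);
            }
        }
        // Score each node by its relative closeness to the ideal solution.
        NodeMetrics chosen = null;
        double bestScore = -1;
        for (int i = 0; i < n; i++) {
            double dBest = 0, dWorst = 0;
            for (int j = 0; j < 3; j++) {
                dBest += Math.pow(m[i][j] - best[j], 2);
                dWorst += Math.pow(m[i][j] - worst[j], 2);
            }
            double denom = Math.sqrt(dBest) + Math.sqrt(dWorst);
            double score = denom == 0 ? 0 : Math.sqrt(dWorst) / denom;
            if (score > bestScore) {
                bestScore = score;
                chosen = nodes.get(i);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<NodeMetrics> nodes = new ArrayList<>();
        nodes.add(new NodeMetrics("dn1", 0.80, 0.30, 0.20));
        nodes.add(new NodeMetrics("dn2", 0.40, 0.10, 0.10));
        nodes.add(new NodeMetrics("dn3", 0.65, 0.70, 0.60));
        System.out.println("Chosen DataNode: " + pickDataNode(nodes).name);
    }
}
```

In the real placement strategy such a score would be computed by the NameNode when choosing targets for a new block, so that nodes with ample free space and light load are preferred over the default pseudo-random choice.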
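The small-file optimization in item 2 merges many small files into one larger file and records each file's location as key-value pairs in an index. The following sketch is a simplified local illustration of that idea in plain Java and does not use the HDFS client API; the size threshold, file names, and "name=offset,length" index format are assumptions for illustration, not the dissertation's actual format.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified local sketch of the merge-and-index idea for small files. */
public class SmallFileMergerSketch {

    // Assumed threshold: files below this size are treated as "small" and merged.
    static final long SMALL_FILE_THRESHOLD = 4L * 1024 * 1024; // 4 MB

    /**
     * Merge small files into one data file and write an index file whose entries are
     * key-value pairs of the form "fileName=offset,length".
     */
    static void mergeSmallFiles(List<Path> inputs, Path mergedData, Path indexFile) throws IOException {
        Map<String, long[]> index = new LinkedHashMap<>();
        long offset = 0;
        try (OutputStream out = Files.newOutputStream(mergedData)) {
            for (Path p : inputs) {
                long size = Files.size(p);
                if (size >= SMALL_FILE_THRESHOLD) {
                    continue; // large files would be uploaded to HDFS unchanged
                }
                byte[] bytes = Files.readAllBytes(p);
                out.write(bytes);
                index.put(p.getFileName().toString(), new long[]{offset, size});
                offset += size;
            }
        }
        // Persist the index as simple key-value lines: name=offset,length
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, long[]> e : index.entrySet()) {
            sb.append(e.getKey()).append('=')
              .append(e.getValue()[0]).append(',')
              .append(e.getValue()[1]).append('\n');
        }
        Files.write(indexFile, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    /** Read one small file back out of the merged file using its index entry. */
    static byte[] readSmallFile(Path mergedData, long off, int len) throws IOException {
        byte[] all = Files.readAllBytes(mergedData); // a real system would seek instead of reading everything
        byte[] result = new byte[len];
        System.arraycopy(all, (int) off, result, 0, len);
        return result;
    }
}
```

In the actual platform the merged file and its index would be uploaded to HDFS, so the NameNode keeps metadata for one large file instead of for many small ones, which is what reduces its memory consumption.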
Keywords/Search Tags: cloud storage, Hadoop, HDFS, replica distribution, small files