
The Implementation And Optimization Of Cloud Storage System Based On HDFS

Posted on: 2017-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zou
Full Text: PDF
GTID: 2308330485453755
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of the Internet, the amount of data is growing exponentially, and the storage and analysis of massive data have become an active research field. The Hadoop Distributed File System (HDFS) is a scalable distributed file system that runs on inexpensive hardware and provides reliable fault tolerance. It is increasingly popular with enterprises and research institutions, and a growing body of work builds cloud storage systems on HDFS to address large-scale data storage. Our study also builds on HDFS to design the "Hefei City Cloud" storage system.

HDFS nevertheless has shortcomings. A large number of small files puts heavy pressure on Namenode memory, which limits both the number of files that can be stored and read-write efficiency. The Namenode is also a single point of failure, which undermines the high availability of HDFS. We investigated both problems and propose solutions to them. The main work is as follows:

1. For the small-file storage problem, we propose the ABFM optimization. Priorities are assigned based on the correlation between files. Small files are merged according to their priorities and then uploaded, and an index record is generated at the same time. An element of randomness is also introduced. We propose a two-stage caching strategy that caches pre-fetched data in a memory pool to improve access efficiency. The system periodically reviews users' access logs and dynamically adjusts the size of the pre-fetch factor. Comparative experiments against default HDFS and the HAR method show that the ABFM policy improves small-file access efficiency and reduces the memory overhead of the Namenode and Datanode.

2. For the Namenode single point of failure, we studied several solutions, compared their advantages and disadvantages, and chose the QJM-based solution. We give a systematic analysis of its structure, describe the process of building the system in detail, and show the resulting page after a successful deployment.

3. Based on actual business needs, we designed and implemented the "Hefei City Cloud" storage system on top of HDFS. The system has three modules: a B/S module with the ABFM optimization, an NFS module, and an administrator management system. Users access the cloud storage system through the B/S module and the NFS module. System administrators handle secure authentication, quota management, freezing and unfreezing, expansion requests, and so on, which compensates for what HDFS lacks in these areas. In practice, the "Hefei City Cloud" storage system meets the enterprise's data storage and management requirements, runs stably, and is convenient to use.

4. Using black-box testing, we carried out functional and performance tests of the "Hefei City Cloud" storage system and confirmed that it meets the design requirements. We also analyze the performance differences in file uploading and downloading between the B/S module and the NFS module, which verifies the effectiveness of our optimization strategy.

In conclusion, we propose the ABFM optimization scheme, which addresses the problem HDFS has with storing massive numbers of small files. We choose the QJM-based Namenode solution to ensure the high availability of HDFS. Based on HDFS, we design and implement the "Hefei City Cloud" storage system to provide Keda Guozhen (city cloud data center) with data storage and management services.
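
The merge-then-index idea described in item 1 can be illustrated with a minimal in-memory sketch. It is not the thesis's ABFM implementation: the `SmallFile` type, the integer `priority` field, and the `(offset, length)` index layout are all assumptions made for illustration, and no HDFS API is called.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SmallFile:
    name: str
    data: bytes
    priority: int  # hypothetical: a higher value means the file is merged earlier


def merge_small_files(
    files: List[SmallFile],
) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate files in descending priority order.

    Returns the merged blob and an index mapping each file name to its
    (offset, length) inside the blob.
    """
    merged = bytearray()
    index: Dict[str, Tuple[int, int]] = {}
    for f in sorted(files, key=lambda item: item.priority, reverse=True):
        index[f.name] = (len(merged), len(f.data))
        merged.extend(f.data)
    return bytes(merged), index


def read_small_file(
    merged: bytes, index: Dict[str, Tuple[int, int]], name: str
) -> bytes:
    """Recover one original file from the merged blob using the index."""
    offset, length = index[name]
    return merged[offset:offset + length]
```

In a real deployment only the merged blob would be stored as a single HDFS file, so the Namenode tracks one entry rather than one per small file, while the index record resolves individual reads.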
Keywords/Search Tags:cloud storage, HDFS, small file storage, high availability, B/S, NFS