Memory-based Data Storing Technologies On Hadoop Distribution File System

Posted on: 2016-02-29  Degree: Master  Type: Thesis
Country: China  Candidate: X J Qian
GTID: 2308330503977202  Subject: Computer technology
Abstract/Summary:
With the rapid growth of the Internet, the volume of data produced across all sectors easily reaches gigabytes, terabytes, or even petabytes. Hadoop, a cloud-computing data processing system, emerged to meet this need. It provides an extensible and flexible computing environment for a wide range of users and for all kinds of big-data processing applications. The Hadoop platform supports large-scale data storage through its underlying distributed file system. However, this file system supports only a single storage medium, while the intermediate data of ordered workflows require frequent disk reads and writes. The growing I/O cost reduces the efficiency of the entire data processing pipeline.

To address low throughput and long data-access latency, we study memory-based data storage in the Hadoop Distributed File System (HDFS). This thesis analyzes the architecture of HDFS and its data storage process. We allocate a memory storage space of reasonable size on each data node and give that memory storage a higher priority than disk. Because every data node now has this additional storage tier, the block placement policy must change accordingly: we design a cost-based replica placement strategy that takes into account the free memory storage space on each data node. Finally, we propose a memory data exchange method built on a file-heat calculation and update algorithm. It continuously keeps part of the cluster's memory storage free, so that users are always served with high-throughput data reads and writes.

This thesis designs and implements a memory-based HDFS on top of the existing HDFS. A comparison of the memory-based and disk-based HDFS shows that the memory-based version effectively shortens data-access latency and improves system data throughput, giving it a significant performance advantage.
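The abstract names two mechanisms, a cost-based replica placement strategy that accounts for each data node's free memory storage and a file-heat algorithm that decides which data to move out of memory, but it does not give their formulas. The Python sketch below illustrates one way they could fit together. The cost function (projected memory occupancy), the exponential-decay heat model, the free-space threshold `min_free`, and every name (`DataNode`, `placement_cost`, `choose_targets`, `FileHeat`, `files_to_demote`) are hypothetical assumptions for illustration, not the thesis's actual design.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch only; not the thesis's implementation.


@dataclass
class DataNode:
    """Hypothetical data node with a memory storage tier (disk tier not modeled)."""
    name: str
    mem_capacity: int  # memory storage allocated on this node (arbitrary units)
    mem_used: int = 0  # memory storage currently occupied

    @property
    def mem_free(self) -> int:
        return self.mem_capacity - self.mem_used


def placement_cost(node: DataNode, block_size: int) -> float:
    """Assumed cost: infinite if the block does not fit in memory, otherwise
    the fraction of memory that would be occupied after placing the block."""
    if node.mem_free < block_size:
        return math.inf
    return (node.mem_used + block_size) / node.mem_capacity


def choose_targets(nodes, block_size, replicas):
    """Return up to `replicas` nodes with the lowest cost for an in-memory replica.
    Nodes whose memory cannot hold the block are skipped (a real system would
    fall back to disk for the remaining replicas)."""
    ranked = sorted(nodes, key=lambda n: placement_cost(n, block_size))
    fits = [n for n in ranked if placement_cost(n, block_size) != math.inf]
    return fits[:replicas]


class FileHeat:
    """Assumed heat model: each access adds 1, and heat halves every
    `half_life` time units since the file's last update."""

    def __init__(self, half_life: float = 3600.0):
        self.decay = math.log(2) / half_life
        self.heat = {}  # path -> heat value at the last update
        self.last = {}  # path -> time of the last update

    def current(self, path, now):
        """Heat of `path` decayed to time `now` (0.0 for never-accessed files)."""
        elapsed = now - self.last.get(path, now)
        return self.heat.get(path, 0.0) * math.exp(-self.decay * elapsed)

    def access(self, path, now):
        """Record one access: decay the old heat to `now`, then add 1."""
        self.heat[path] = self.current(path, now) + 1.0
        self.last[path] = now

    def coldest(self, paths, now, k):
        """Return the k paths with the lowest current heat."""
        return sorted(paths, key=lambda p: self.current(p, now))[:k]


def files_to_demote(node, file_sizes, heat, now, min_free):
    """Pick the coldest in-memory files to move to disk until the node would
    have at least `min_free` units of free memory. `file_sizes` maps
    path -> size of that file's data held in this node's memory."""
    victims, free = [], node.mem_free
    for path in heat.coldest(list(file_sizes), now, len(file_sizes)):
        if free >= min_free:
            break
        victims.append(path)
        free += file_sizes[path]
    return victims
```

With three nodes at 90%, 20%, and 50% memory occupancy and a block of 128 units, `choose_targets` skips the nearly full node and returns the other two in order of increasing cost. Using projected occupancy as the cost spreads in-memory replicas across nodes instead of filling one node first. This is one plausible reading of "considering the free space of the memory storage", and a real strategy would likely also weigh network distance and rack awareness.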
Keywords/Search Tags: HDFS, memory storage, replica placement strategy, file replacement