
Research Of Data Storage And Management On Huatu Online Library System Based On HDFS

Posted on: 2014-05-31   Degree: Master   Type: Thesis
Country: China   Candidate: C Yang   Full Text: PDF
GTID: 2268330425471032   Subject: Computer Science and Technology
Abstract/Summary:
As a platform for sharing information, an online library system offers users efficiency and convenience. However, as data volume and usage grow, the forms and types of library resources multiply and diversify, and the exponential growth of mass data strains the storage system. Storing and managing this data efficiently has become a pressing problem. The emergence of cloud storage technology makes it possible to store and manage such large volumes of data efficiently. In this thesis, the currently popular cloud computing platform Hadoop is selected as the data storage and management platform for the online library system, and the Hadoop Distributed File System (HDFS) is used to store and manage the system's document files. Because HDFS is designed for general-purpose data storage and management, it cannot be applied to an online library system directly and must be adapted. The documents in an online library system are typically Word, PDF, TXT, and similar files, which are relatively small: more than 90% of them range from 32 KB to 20 MB. HDFS keeps the metadata of every file in the memory of its metadata management node (the NameNode), so storing a vast number of small files leads to excessive NameNode memory consumption; once the NameNode's memory is used up, HDFS cannot store any more files. This thesis therefore proposes an optimized scheme that merges small files into a large file, which effectively reduces NameNode memory consumption. In addition, to address read speed, we put forward a data prefetching mechanism with two levels of cache, which significantly improves the smoothness of file reads for users and relieves the pressure on the NameNode in HDFS.
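The abstract does not describe the merged-file layout, so the following is only a minimal sketch of the general idea of small-file merging, not the thesis's actual scheme. It assumes the small files are concatenated into one blob and located through an index of byte offsets and lengths; the names `IndexEntry`, `merge_small_files`, and `read_small_file` are hypothetical, and in-memory `bytes` stand in for an HDFS file.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Location of one small file inside the merged file (hypothetical layout)."""
    offset: int  # byte position where the small file starts
    length: int  # size of the small file in bytes


def merge_small_files(files: dict[str, bytes]) -> tuple[bytes, dict[str, IndexEntry]]:
    """Concatenate small files into one blob and record where each one sits.

    In an HDFS setting the blob would be written as a single large file, so the
    NameNode would track one file instead of one entry per document.
    """
    merged = io.BytesIO()
    index: dict[str, IndexEntry] = {}
    for name, data in files.items():
        index[name] = IndexEntry(offset=merged.tell(), length=len(data))
        merged.write(data)
    return merged.getvalue(), index


def read_small_file(merged: bytes, index: dict[str, IndexEntry], name: str) -> bytes:
    """Recover one small file from the merged blob by slicing at its index entry."""
    entry = index[name]
    return merged[entry.offset:entry.offset + entry.length]


if __name__ == "__main__":
    # Stand-in documents; real library files would be Word, PDF, or TXT content.
    docs = {"a.txt": b"hello", "b.pdf": b"%PDF-1.4", "c.doc": b""}
    merged, index = merge_small_files(docs)
    for name in docs:
        assert read_small_file(merged, index, name) == docs[name]
    print(f"{len(docs)} small files stored as 1 merged file of {len(merged)} bytes")
```

The index itself still has to live somewhere (for example inside the merged file or in a client-side cache), which is one place where a prefetching cache of the kind the abstract mentions could plausibly fit.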
Keywords/Search Tags: Cloud Storage, Mass Storage, Hadoop, HDFS, File System