A Centralized Storage And Retrieval System For Electronic Records Based On HDFS

Posted on: 2013-01-12, Degree: Master, Type: Thesis
Country: China, Candidate: C Zhang
GTID: 2298330434475614, Subject: Computer technology
Abstract/Summary:
With the advance of government informatization, electronic records have developed rapidly in China. Compared with the management of paper documents, electronic records management is still immature, especially with respect to storage. Because electronic records are easy to transfer and save, they need not be stored in geographically dispersed locations. Centralized storage of electronic records can effectively strengthen control over them, improve office efficiency, reduce human resource costs, and address file loss, leakage, and other problems. How to achieve centralized storage of massive numbers of electronic records, however, directly affects the implementation and efficiency of the entire system. Cloud storage is a model of networked online storage in which data is stored in virtualized pools of storage. As long as the hardware permits, it can provide almost unlimited, inexpensive storage capacity, so cloud storage technology can efficiently solve the problem of storing massive electronic records centrally. The open-source cloud storage file system Hadoop Distributed File System (HDFS), based on the design of the Google File System (GFS), has become a focus of cloud storage technology because of its excellent performance and reliability in handling very large files. Electronic records in e-government, however, are mainly small files, and HDFS performs poorly when storing and accessing massive numbers of small files.

To address HDFS's poor performance with small files, this thesis proposes a strategy that uses a storage cache and a read cache to improve the efficiency of storing and accessing massive numbers of small files. The basic idea is to design and implement a middleware layer for HDFS that reduces the number of HDFS accesses while still meeting storage and access requirements, thereby improving storage and access efficiency. The storage cache strategy sets up multiple caches and, when storing a small file, chooses the optimal one among them, which improves utilization of the storage cache and reduces the number of HDFS accesses. The read cache strategy uses the buddy system to manage the whole fixed-size read cache and sets an efficiency threshold for each segmented cache. The efficiency threshold controls the cache update strategy so as to maximize cache utilization, so that file accesses are served from the read cache as often as possible and the number of HDFS accesses is reduced. We also propose a security strategy that uses multi-level encryption to ensure the confidentiality and privacy of electronic records during storage and access. Finally, we implement a prototype system and test and analyze it to demonstrate the feasibility and usability of these ideas.
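The storage cache idea can be illustrated with a small sketch. The abstract does not say how the "optimal" cache is chosen, so the code below assumes a best-fit rule: among the caches that can still hold the incoming file, pick the one that would be left with the least free space. All names here (`StorageCache`, `SmallFileWriter`, the `flush` callback) are hypothetical, and `flush` merely stands in for a single write of a merged block to HDFS; this is not the thesis's actual middleware.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class StorageCache:
    """One in-memory buffer that accumulates small files (hypothetical)."""
    capacity: int
    data: bytearray = field(default_factory=bytearray)
    index: Dict[str, Tuple[int, int]] = field(default_factory=dict)  # name -> (offset, length)

    def free(self) -> int:
        return self.capacity - len(self.data)

    def add(self, name: str, payload: bytes) -> None:
        self.index[name] = (len(self.data), len(payload))
        self.data.extend(payload)


class SmallFileWriter:
    """Packs small files into several caches; each flush is one backend write."""

    def __init__(self, num_caches: int, capacity: int,
                 flush: Callable[[bytes, Dict[str, Tuple[int, int]]], None]):
        self.capacity = capacity
        self.caches = [StorageCache(capacity) for _ in range(num_caches)]
        self.flush = flush  # assumed stand-in for writing one merged file to HDFS

    def store(self, name: str, payload: bytes) -> None:
        if len(payload) > self.capacity:
            raise ValueError("file is larger than a storage cache")
        candidates = [c for c in self.caches if c.free() >= len(payload)]
        if not candidates:
            # No cache has room: flush the fullest one and reuse it.
            fullest = max(self.caches, key=lambda c: len(c.data))
            self._flush(fullest)
            candidates = [fullest]
        # Best fit (assumption): least free space left after adding the file.
        target = min(candidates, key=lambda c: c.free() - len(payload))
        target.add(name, payload)
        if target.free() == 0:
            self._flush(target)

    def _flush(self, cache: StorageCache) -> None:
        if cache.data:
            # Hand over copies so the cache can be reset safely afterwards.
            self.flush(bytes(cache.data), dict(cache.index))
        cache.data.clear()
        cache.index.clear()
```

With two caches of 10 bytes each, storing files of 6, 6, and 4 bytes results in a single flush containing the first and third files, so three store requests cost one backend write. The returned index of offsets and lengths is what a reader would need to locate an individual small file inside the merged block.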
Keywords/Search Tags: Electronic Records, Distributed File System, HDFS