
Research On Distributed Storage System Of Agricultural Science Data And Its Implementation

Posted on: 2016-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: C G Huang
Full Text: PDF
GTID: 2308330503950592
Subject: Computer Science and Technology
Abstract/Summary:
Agricultural scientific data storage is an important part of agricultural scientific research. Traditional storage systems have many deficiencies in performance, storage capacity, data reliability, and storage cost. To solve the problem of storing PB-level, unstructured agricultural scientific data in many different forms, this paper analyzes agricultural scientific data files in depth, studies distributed storage technology, and puts forward a distributed storage solution based on the open source cloud computing platform Hadoop. The main achievements are as follows:

Based on the characteristics and application requirements of agricultural data, this paper designs a framework model of a distributed storage system for agricultural data. Unstructured data are stored in an improved HDFS architecture, while heterogeneous, structured attribute data are stored in the HBase database system, so that data files remain associated with their attribute data. Caches are set at the client and at the data nodes to improve the efficiency of file access.

To address the problem of storing massive numbers of small files in agricultural scientific data, this paper puts forward a multi-attribute merged storage strategy for massive agricultural small files. The small files are classified according to specific attributes, and files belonging to the same category are merged into one large aggregate file. This effectively reduces the memory that a large number of small files consumes on the central node and improves the efficiency of writing files. An index buffer for small files is set up on the data nodes to improve the read performance of agricultural scientific data.

To solve the problem of hot data that arises because access to agricultural scientific data files is seasonal, this paper proposes a dynamic replica management strategy with two parts.
First, it designs a strategy for dynamically adding and deleting replicas based on file access frequency: the access frequency of each file is counted over a fixed period of time, the heat of the file is calculated from it, and the number of replicas of the file is adjusted dynamically, taking into account the statistical period, file size, and other factors. Second, it puts forward a dynamic replica placement strategy based on node state, which considers several indicators of data node state to improve cluster balance and storage efficiency.

Based on the above research results, this paper describes the design and implementation of AGRFS, a distributed storage system for agricultural data. AGRFS implements the basic functional modules and a user interface. A Hadoop cluster was set up to verify, through experiments, the feasibility of the strategies and the availability of the system. The results show that the proposed small-file storage strategy and dynamic replica management strategy improve the efficiency of file read and write operations and optimize the performance of the system, and that the distributed storage system designed in this paper addresses the storage problem of agricultural scientific data well.
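
The small-file merging idea summarized in the abstract (classify files by an attribute, append each category into one large file, and keep an index so individual files can still be read) might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the class `MergedStore`, its methods, the category labels, and the in-memory byte buffers are all hypothetical stand-ins, and none of this is taken from the AGRFS implementation or from the HDFS API.

```python
from dataclasses import dataclass, field


@dataclass
class MergedStore:
    """Toy in-memory stand-in for merged files (hypothetical, not AGRFS code).

    A real system would write the merged files to HDFS instead of bytearrays.
    """
    # category -> one large merged "file"
    blobs: dict = field(default_factory=dict)
    # small-file name -> (category, offset, length) inside the merged file
    index: dict = field(default_factory=dict)

    def put(self, name: str, category: str, data: bytes) -> None:
        # Append the small file to the merged file of its category
        # and record where it landed so it can be read back later.
        blob = self.blobs.setdefault(category, bytearray())
        offset = len(blob)
        blob.extend(data)
        self.index[name] = (category, offset, len(data))

    def get(self, name: str) -> bytes:
        # Look up the index entry, then slice the merged file.
        category, offset, length = self.index[name]
        return bytes(self.blobs[category][offset:offset + length])


# Assumed example data: three small files in two attribute categories.
store = MergedStore()
store.put("wheat_2015_01.csv", "crop=wheat", b"yield,4.2\n")
store.put("rice_2015_01.csv", "crop=rice", b"yield,6.1\n")
store.put("wheat_2015_02.csv", "crop=wheat", b"yield,4.5\n")
```

In this sketch the three small files end up in only two merged files, so a central metadata node would track two large objects rather than three small ones; the index plays the role that the abstract assigns to the small-file index buffer.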
Keywords/Search Tags: Agricultural science data, Distributed Storage, HDFS, Small File, Dynamic Replica Management