
Research And Implementation Of Disaster Big Data Management Methods Based On Cloud Computing

Posted on: 2016-01-02
Degree: Master
Type: Thesis
Country: China
Candidate: Q Xu
Full Text: PDF
GTID: 2308330479993946
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet of Things, the volume of data produced by machines has far exceeded the volume produced by humans, and how to store and manage these huge amounts of IoT data effectively has become a pressing problem. Safety monitoring with video devices and sensors has become one of the most important IoT applications: the information collected by cameras and sensors can provide scientific data support for the inversion and analysis of an accident after it happens.

For the massive IoT data to realize its potential value, it must be organized and stored effectively. This paper analyzes the advantages of new data storage methods over traditional ones and proposes a disaster big data storage method based on cloud computing that offers fast write and read capability. Building on Hadoop, the most representative cloud computing framework, we propose and theoretically analyze a series of methods for storing and managing small video image files and sensor data. Finally, we carry out a series of experiments comparing our methods with alternatives to verify their performance advantage in managing massive disaster big data.

The main work of this paper is as follows:

Firstly, we analyze the storage characteristics and the write and read paths of HDFS and HBase separately, and then implement a scheme that uses HDFS to store video image files and HBase to persist image file metadata and sensor data, which improves the efficiency of storage and management.

Secondly, HDFS is designed for large files, whereas video image files are small, so using HDFS directly as the file system layer for massive video image files overloads the NameNode and degrades the performance of the Hadoop cluster. We therefore propose a new small file merging and storage strategy (see the sketch after the abstract). Its core idea is to merge small files as they arrive: each small file uploaded to HDFS is immediately appended to a large-file cache, and when the cache is full its contents are written to HDFS as a single large file. Before the write, a small file pre-processing module running in the HDFS client selects a cache for merging and stores the small file's metadata in HBase; on a read, the metadata is first fetched from HBase and the actual bytes are then read from the merged file according to that metadata.

Thirdly, we use HBase as the data persistence layer for small file metadata and sensor data. To support multi-condition queries over time series data, we propose a rowkey-inverted index table based on secondary indexes; it enables multi-condition queries while balancing write performance against read performance. In addition, we build MapReduce applications on top of HBase, giving HBase the ability to perform statistical analysis over massive data (both are sketched after the abstract).

Finally, a series of contrast experiments on representative data shows that the HDFS small file merging algorithm and the HBase-based secondary index method proposed in this paper outperform other methods in read and write performance, which verifies the correctness of the two data storage and management methods.
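Sketch 1: small file merging on the write path and metadata-driven reads. This is a minimal illustration of the strategy described above, not the thesis implementation; it assumes the standard HDFS client API and the HBase 1.x client API, and the table name file_meta, column family m, merged-file path scheme, and cache threshold are hypothetical choices made only for this example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class SmallFileMerger {
    private static final long CACHE_LIMIT = 64L * 1024 * 1024;   // flush threshold, roughly one HDFS block
    private static final byte[] CF = Bytes.toBytes("m");         // hypothetical column family

    private final FileSystem fs;
    private final Table metaTable;                                // HBase table holding small file metadata
    private final ByteArrayOutputStream cache = new ByteArrayOutputStream();
    private Path currentBlock;                                    // merged file currently being filled

    public SmallFileMerger(Configuration hadoopConf) throws IOException {
        fs = FileSystem.get(hadoopConf);
        Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
        metaTable = conn.getTable(TableName.valueOf("file_meta"));
        currentBlock = newBlockPath();
    }

    // Append one small file to the in-memory cache and record its location in HBase.
    // The merged file becomes readable on HDFS once the cache is flushed.
    public synchronized void write(String fileName, byte[] data) throws IOException {
        long offset = cache.size();
        cache.write(data);

        Put put = new Put(Bytes.toBytes(fileName));               // rowkey = small file name
        put.addColumn(CF, Bytes.toBytes("block"), Bytes.toBytes(currentBlock.toString()));
        put.addColumn(CF, Bytes.toBytes("offset"), Bytes.toBytes(offset));
        put.addColumn(CF, Bytes.toBytes("length"), Bytes.toBytes((long) data.length));
        metaTable.put(put);

        if (cache.size() >= CACHE_LIMIT) {
            flush();                                              // cache full: write merged file to HDFS
        }
    }

    // Read one small file back: look up its metadata in HBase, then seek into the merged file.
    public byte[] read(String fileName) throws IOException {
        Result r = metaTable.get(new Get(Bytes.toBytes(fileName)));
        Path block = new Path(Bytes.toString(r.getValue(CF, Bytes.toBytes("block"))));
        long offset = Bytes.toLong(r.getValue(CF, Bytes.toBytes("offset")));
        int length = (int) Bytes.toLong(r.getValue(CF, Bytes.toBytes("length")));

        byte[] buf = new byte[length];
        try (FSDataInputStream in = fs.open(block)) {
            in.readFully(offset, buf, 0, length);                 // positioned read of just this file's bytes
        }
        return buf;
    }

    private void flush() throws IOException {
        try (FSDataOutputStream out = fs.create(currentBlock)) {
            out.write(cache.toByteArray());
        }
        cache.reset();
        currentBlock = newBlockPath();
    }

    private Path newBlockPath() {
        return new Path("/merged/block-" + System.currentTimeMillis() + ".bin");
    }
}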
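Sketch 2: an inverted secondary index table for multi-condition queries over sensor time series. This only illustrates the general idea of pairing a data-table rowkey with an index-table rowkey; the actual rowkey layout in the thesis may differ. The table names sensor_data and sensor_idx_region, the column family d, the indexed attribute region, and the fixed-width key fields are assumptions made for the example.

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SensorIndexExample {
    private static final byte[] CF = Bytes.toBytes("d");          // hypothetical column family

    // Write one reading into the data table and mirror it into an inverted index table.
    public static void writeReading(Connection conn, String sensorId, long ts,
                                    String region, double value) throws IOException {
        // Data table rowkey: sensorId + reversed timestamp, so a sensor's newest readings sort first.
        byte[] dataKey = Bytes.add(Bytes.toBytes(sensorId), Bytes.toBytes(Long.MAX_VALUE - ts));

        try (Table data = conn.getTable(TableName.valueOf("sensor_data"));
             Table index = conn.getTable(TableName.valueOf("sensor_idx_region"))) {

            Put p = new Put(dataKey);
            p.addColumn(CF, Bytes.toBytes("region"), Bytes.toBytes(region));
            p.addColumn(CF, Bytes.toBytes("value"), Bytes.toBytes(value));
            data.put(p);

            // Index table rowkey: indexed value + timestamp + sensorId (the "inverted" layout),
            // with the data-table rowkey stored as the cell value.
            byte[] idxKey = Bytes.add(Bytes.toBytes(region), Bytes.toBytes(ts), Bytes.toBytes(sensorId));
            Put ip = new Put(idxKey);
            ip.addColumn(CF, Bytes.toBytes("ref"), dataKey);
            index.put(ip);
        }
    }

    // Multi-condition query: region equals `region` AND timestamp in [from, to).
    // Key fields are assumed fixed-width so the scan range stays within one region value.
    public static List<Result> query(Connection conn, String region, long from, long to) throws IOException {
        List<Result> out = new ArrayList<>();
        try (Table data = conn.getTable(TableName.valueOf("sensor_data"));
             Table index = conn.getTable(TableName.valueOf("sensor_idx_region"))) {

            // Scan only the slice of the index table that satisfies both conditions.
            Scan scan = new Scan(Bytes.add(Bytes.toBytes(region), Bytes.toBytes(from)),
                                 Bytes.add(Bytes.toBytes(region), Bytes.toBytes(to)));
            try (ResultScanner rs = index.getScanner(scan)) {
                for (Result r : rs) {
                    byte[] ref = r.getValue(CF, Bytes.toBytes("ref"));
                    out.add(data.get(new Get(ref)));              // fetch the real reading via the stored rowkey
                }
            }
        }
        return out;
    }
}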
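Sketch 3: a MapReduce job that reads directly from HBase for statistical analysis. It uses the standard TableMapReduceUtil/TableMapper API; the statistic computed here (a per-sensor row count over the hypothetical sensor_data table, with the sensor id assumed to occupy the first 8 bytes of the rowkey) is only an example of the kind of aggregation meant in the abstract.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SensorCountJob {

    // Emits (sensorId, 1) for every row scanned from the sensor_data table.
    public static class CountMapper extends TableMapper<Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx)
                throws IOException, InterruptedException {
            // Hypothetical rowkey layout: the leading bytes hold the sensor id.
            int len = Math.min(8, rowKey.getLength());
            String sensorId = Bytes.toString(rowKey.get(), rowKey.getOffset(), len);
            ctx.write(new Text(sensorId), ONE);
        }
    }

    // Sums the counts per sensor.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) sum += v.get();
            ctx.write(key, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "sensor-reading-count");
        job.setJarByClass(SensorCountJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);            // larger scanner cache for MapReduce throughput
        scan.setCacheBlocks(false);      // do not pollute the region server block cache

        TableMapReduceUtil.initTableMapperJob("sensor_data", scan,
                CountMapper.class, Text.class, LongWritable.class, job);
        job.setReducerClass(SumReducer.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path("/out/sensor_count"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}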
Keywords/Search Tags: Big Data Storage, HDFS, HBase, Small File Storage, Merge Files, Secondary Indexes, Cloud Computing, Hadoop