Design And Implementation Of HBase-based Traffic Stream Data Real-time Storage System

Posted on:2017-03-18

Degree:Master

Type:Thesis

Country:China

Candidate:T Lu

Full Text:PDF

GTID:2308330485492449

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of big data processing technology, modern urban intelligent transportation systems (ITS) have gained many new development opportunities. Large cities have already established dedicated vehicle data acquisition networks, and the collected data has gradually accumulated into large-scale, highly valuable traffic flow data. Traffic data is a typical kind of streaming data: besides its variety, it is characterized by high velocity and large volume. For streaming traffic data, storage systems based on traditional relational databases have proved problematic, with high write latency and poor horizontal scalability, among other issues. Compared with traditional relational databases, NoSQL databases such as HBase offer fast writes and easily extended storage capacity because of their simple data model, which makes them well suited to storing streaming traffic data.

However, some problems remain in the practical application of NoSQL databases. We summarize them as follows: (1) when row keys increase or decrease monotonically, writes concentrate on a hot spot, which greatly reduces write performance; in addition, HBase parameter settings also have a great impact on write performance; (2) an HBase cluster supports dynamic extension, but existing work relies on manual extension, and support for automatic extension is insufficient; and (3) no query interface for standard SQL statements is provided.

To address these problems, this work designs and implements a real-time traffic streaming data storage system based on HBase, called DeCloud-RealBase for short. It involves the following three main parts: (1) To improve real-time write capability for traffic flow data, we design a strategy combining multiple buffers, multiple threads, pre-partitioning and row key structure optimization. We also implement the transfer of existing non-real-time historical data in the database to the HBase cluster. (2) For cluster extension, the traditional manual extension of Hadoop and HBase clusters is abandoned; dynamic scaling of the HBase cluster is realized through shell scripts, which supports rapid cluster extension. This speeds up extension and improves efficiency, and it also avoids the configuration errors caused by complex cluster deployment. (3) The open source SQL parser GSQLParser is first used to parse standard SQL statements, which are then converted into HBase query operations. Combined with HBase coprocessors, this implements standard SQL queries on HBase.

Finally, a series of experiments is carried out on this system. The experimental analysis shows that, in most cases, the system has good extension, storage and query performance and can meet the needs of practical work.
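
The abstract does not give the actual row key layout used in DeCloud-RealBase, so the following is only a minimal sketch of one common way to combine row key optimization with pre-partitioning. It assumes a hypothetical key of the form salt byte, vehicle ID, separator, big-endian timestamp, and a hypothetical bucket count of 16. The salt is derived from the vehicle ID, so monotonically increasing timestamps no longer direct all writes to a single region, while records of one vehicle stay together and remain sorted by time.

```python
import hashlib
import struct
from typing import List

# Assumption: the table is pre-split into 16 regions (not stated in the abstract).
NUM_BUCKETS = 16


def salted_row_key(vehicle_id: str, epoch_ms: int, buckets: int = NUM_BUCKETS) -> bytes:
    """Build a row key: <salt byte><vehicle id>\\x00<8-byte big-endian timestamp>.

    The salt is a hash of the vehicle ID modulo the bucket count, so writes
    from many vehicles spread across buckets instead of piling onto one region.
    """
    vid = vehicle_id.encode("utf-8")
    salt = hashlib.md5(vid).digest()[0] % buckets
    # Big-endian packing keeps byte order consistent with numeric time order.
    return bytes([salt]) + vid + b"\x00" + struct.pack(">Q", epoch_ms)


def split_points(buckets: int = NUM_BUCKETS) -> List[bytes]:
    """Region boundaries for pre-partitioning: one split key per salt value after the first."""
    return [bytes([b]) for b in range(1, buckets)]
```

The split keys returned by `split_points` would be passed to whatever table-creation call the deployment uses; the bucket count is a tuning choice that trades write parallelism against the number of regions a time-range scan over all vehicles must touch.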

Keywords/Search Tags:

stream data, HBase, real-time storage, data migration, dynamic extension

Related items

1	HBase-based Storage And Query System For Traffic Checkpoints Data
2	Study On The Distribution And Migration Mechanism Of Data In Cloud Storage
3	Research On Real-time Transmission And Management Technology Of UAV Freight Big Data
4	Real-time Storage Technology For Data Stream
5	Data Stream Security Storage And Real-Time Computing Inspired By Bio-Intelligence
6	Research Of Data Migration And Storage Based On Hadoop
7	The Research Of A Data Storage And Transfer Of HDFS Based On FTP Service
8	Research On Efficient Distributed Storage And Query Algorithm For Real-time Data Stream
9	Quality Monitoring And Analysis Of Steel Products Based On Real-time Data Stream
10	The Design And Implementation Of Real-time Processing System For Device Log Stream Data Based On Storm