| With the rapid development of big data processing technology, modern urban intelligent transportation systems (ITS) have gained many new development opportunities. Large cities have already established dedicated vehicle data acquisition networks, and the data they collect has grown into large-scale traffic flow data of great value. Traffic data is a typical kind of streaming data: besides its variety, it is characterized by high velocity and large volume. For streaming traffic data, storage systems based on traditional relational databases have proved problematic, with high write latency and poor horizontal scalability, among other issues. Compared with traditional relational databases, NoSQL databases such as HBase offer fast writes and easily extended storage capacity because of their simple data model, which makes them well suited to storing streaming traffic data. However, some problems remain in the practical application of NoSQL databases. We summarize them as follows: (1) when row keys increase or decrease monotonically, writes concentrate on a hot spot, which greatly reduces write performance; in addition, HBase parameter settings have a great impact on write performance; (2) HBase clusters support dynamic extension, but existing work relies on manual extension, and support for automatic extension is insufficient; and (3) no query interface for standard SQL statements is provided. To address these problems, this work designs and implements a real-time traffic streaming data storage system based on HBase, DeCloud-RealBase for short. 
It involves the following three main parts: (1) To improve real-time write performance for traffic flow data, we design a strategy that combines multiple buffers, multiple threads, pre-partitioning, and row key structure optimization. We also implement the transfer of existing non-real-time historical data in the database to the HBase cluster. (2) For cluster extension, the traditional manual extension of Hadoop and HBase clusters is abandoned, and dynamic scaling of the HBase cluster is realized through shell scripts, which supports rapid cluster extension. On the one hand, this speeds up cluster extension and improves efficiency; on the other hand, it avoids the configuration errors caused by complex cluster deployment. (3) The open source SQL parser GSQLParser is first used to parse standard SQL statements, which are then converted into HBase queries. Combined with HBase coprocessors, this implements standard SQL queries in HBase. Finally, a series of experiments is carried out on the system. The experimental analysis shows that, in most cases, the system has good extension, storage, and query performance and can meet the needs of practical work. |
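
The hot-spot problem and the pre-partition and row key optimization named in part (1) can be illustrated with a minimal sketch. The abstract does not give the actual row key layout used by DeCloud-RealBase, so everything below is an assumption made for illustration: the salt-prefix scheme (a common HBase technique), the field order, the bucket count `NUM_REGIONS`, and the helper names `salt_for`, `make_row_key`, and `split_points` are all hypothetical.

```python
import hashlib
from typing import List

# Assumed number of pre-split regions; not taken from the source.
NUM_REGIONS = 8


def salt_for(vehicle_id: str, buckets: int = NUM_REGIONS) -> int:
    """Map a vehicle ID to a stable bucket in [0, buckets)."""
    digest = hashlib.md5(vehicle_id.encode("utf-8")).digest()
    return digest[0] % buckets


def make_row_key(vehicle_id: str, timestamp_ms: int,
                 buckets: int = NUM_REGIONS) -> bytes:
    """Build a salted row key: <2-digit salt>|<vehicle id>|<13-digit timestamp>.

    The salt prefix spreads records that arrive in timestamp order across
    regions instead of sending them all to the region holding the latest keys.
    """
    salt = salt_for(vehicle_id, buckets)
    return f"{salt:02d}|{vehicle_id}|{timestamp_ms:013d}".encode("utf-8")


def split_points(buckets: int = NUM_REGIONS) -> List[bytes]:
    """Region boundaries for pre-partitioning, one region per salt value."""
    return [f"{i:02d}".encode("utf-8") for i in range(1, buckets)]


if __name__ == "__main__":
    print(make_row_key("A12345", 1700000000000))
    print(split_points())
```

With such a scheme, the values returned by `split_points` would be supplied as split keys when the table is created, so that each salt value maps to its own region from the start. The trade-off is that a time-range scan over all vehicles must issue one scan per salt bucket.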