| Massive data streams from sensors in the Internet of Things (IoT) and from smart devices with Global Positioning System (GPS) are now flooding into database systems for further processing and analysis. The capability of real-time retrieval over both fresh and historical data is the key enabler for real-world applications in smart manufacturing and smart cities that use these data streams. However, state-of-the-art solutions, e.g., HBase, do not deliver satisfactory performance because of the high overhead of index updates. Time series databases, e.g., Druid, also fall short: they do not support efficient range queries over non-temporal attributes because they lack secondary range indexes. In this paper, we present a simple and effective distributed solution that achieves millions of tuple insertions per second and ad-hoc temporal range query processing in milliseconds. We propose a new data partitioning scheme that takes advantage of the workload characteristics and avoids expensive global data merging. Furthermore, to resolve the throughput bottleneck, we adopt a template-based index method that skips unnecessary index structure adjustments when the distribution of incoming tuples is relatively stable. Our solution fully exploits the limited computation power and network bandwidth by running traditional B+ tree indices over a shared HDFS architecture. Insertion operations only involve reads over the intermediate nodes of the tree, which facilitates highly concurrent updates and queries with only minor contention on leaf pages. To parallelize data insertion and query processing, we propose an efficient dispatching mechanism and effective load balancing strategies that fully utilize computational resources in a workload-aware manner. We evaluate our prototype system through extensive experiments. First, we evaluate the indexing performance and data chunk size. Next, we evaluate the adaptivity of our
system. Finally, we compare the overall performance with state-of-the-art open-source systems. On both synthetic and real workloads, our system consistently outperforms state-of-the-art open-source systems by at least an order of magnitude. The main reason is the bilayer index architecture. In addition, the template-based B+ tree significantly reduces index maintenance overhead, and the query dispatch algorithm and load balancing strategies make effective use of the computational resources. |
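
The template-based index idea described above can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the class name `TemplateIndex`, the flat list of separator keys standing in for the intermediate B+ tree nodes, and the in-memory leaf layout are all assumptions made for illustration. The point it tries to show is that an insertion only reads the fixed routing structure and writes to exactly one leaf, so no splits or rebalancing occur while the key distribution stays close to the template.

```python
from bisect import bisect_left, bisect_right


class TemplateIndex:
    """Hypothetical sketch: fixed separators (the "template") route keys to leaves."""

    def __init__(self, separators):
        # The template is read-only after construction (assumption: it is
        # rebuilt elsewhere if the key distribution drifts).
        self.separators = sorted(separators)
        # One leaf per key interval; each leaf keeps sorted keys and aligned values.
        self.leaf_keys = [[] for _ in range(len(self.separators) + 1)]
        self.leaf_values = [[] for _ in range(len(self.separators) + 1)]

    def _route(self, key):
        # Read-only lookup over the template; a key equal to a separator goes right.
        return bisect_right(self.separators, key)

    def insert(self, key, value):
        leaf = self._route(key)
        pos = bisect_right(self.leaf_keys[leaf], key)
        # Only the target leaf is modified; the template is never touched.
        self.leaf_keys[leaf].insert(pos, key)
        self.leaf_values[leaf].insert(pos, value)

    def range_query(self, low, high):
        """Return values whose keys fall in the closed interval [low, high]."""
        result = []
        for leaf in range(self._route(low), self._route(high) + 1):
            keys = self.leaf_keys[leaf]
            start = bisect_left(keys, low)
            end = bisect_right(keys, high)
            result.extend(self.leaf_values[leaf][start:end])
        return result


index = TemplateIndex([10, 20, 30])
for k, v in [(5, "a"), (10, "b"), (25, "c"), (40, "d"), (15, "e")]:
    index.insert(k, v)
matches = index.range_query(10, 25)
```

In a concurrent setting, this separation would mean that writers need to synchronize only on the individual leaf they touch, which is the property the abstract attributes to the template-based B+ tree; the sketch itself does no locking.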