
TQIndex: An Effective Index Structure for Processing Temporal Queries on Very Large Scale of Data

Posted on: 2016-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y He
GTID: 2308330479482158
Subject: Software engineering
Abstract/Summary:
In recent years, with the spread of cloud computing, the Internet of Things, and the mobile Internet, the volume of data of all kinds has grown explosively. Time is undoubtedly one of the most important attributes of such data, since it is closely tied to every real-world entity. How to efficiently retrieve the records that satisfy given time conditions from such huge data sets has become a practical and valuable research problem. Such retrieval seems easy to implement, but once the data volume becomes huge, traditional query methods become so inefficient that users cannot get results within an acceptable time. How to exploit the time attribute to build an efficient query strategy is therefore both a meaningful application problem and a challenging research issue that remains to be solved.
For such an application, common solutions include a linear traversal of the whole data set to retrieve the records that meet the given time criteria, and building an index on the time attribute in a traditional database. Both methods work, but once the amount of data becomes very large, both take a great deal of time to return the correct result. In recent years, a number of distributed computing frameworks for big data have emerged, such as MapReduce and Spark. Compared with the solutions above, these frameworks are a better fit for time queries on big data, but they are still not enough.
In addition, because these distributed computing frameworks lack a mechanism for caching intermediate results, they perform a great deal of repeated computation when queries arrive frequently and repeatedly.
Based on the challenges mentioned above, this work defines two main problem models for time queries based on the project: the TimePointQuery problem and the TimeRangeQuery problem. The TimeRangeQuery problem is further divided into two categories: the TimeRangeInclusionQuery problem and the TimeRangeIntersection problem.
To solve these problems, this paper designs and implements an efficient time index structure called TQIndex. TQIndex consists of two core modules: the hierarchical indexing module and the timeline indexing module. The hierarchical indexing module first applies a one-to-many hash forwarding strategy based on the time attribute of the data, splitting the data into a number of parts. The timeline indexing module then builds two important data structures for each part of the data, the event list and the time list, which are its core components. The event list is responsible for converting raw data into events, while the time list records the order of the events. To improve the performance of TQIndex, this paper integrates a checkpoint mechanism into the time list to cache intermediate results. The checkpoint mechanism accelerates TQIndex at the cost of an acceptable amount of extra storage. Note that the two core modules of TQIndex are not independent; each depends on the other, and they are closely connected.
Finally, based on TQIndex, this work designs and implements three effective algorithms for solving the time query problems above. The details of these algorithms are given in this paper.
This paper evaluates these algorithms from both theoretical and practical points of view, and the experimental results show that both TQIndex and the algorithms work effectively and efficiently on large-scale data sets.
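To make the timeline indexing idea more concrete, the following is a minimal Python sketch of an event list, a time list, and periodic checkpoints for a single data partition. It is not the thesis's implementation. It assumes that each record carries a half-open validity interval [start, end), that a point query returns the records valid at time t, that an intersection query returns the records overlapping the closed range [a, b], and that an inclusion query returns the records lying entirely within [a, b]. The names `Record`, `TimelineIndex`, and `checkpoint_every` are hypothetical, and the hierarchical hash forwarding step is not shown.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical record with an assumed half-open validity interval."""
    rid: int
    start: int  # inclusive
    end: int    # exclusive


class TimelineIndex:
    """Illustrative timeline index for one partition (an assumption-based sketch)."""

    def __init__(self, records, checkpoint_every=4):
        events = []
        for r in records:
            if r.start >= r.end:
                raise ValueError("each record needs start < end")
            events.append((r.start, 1, r.rid))  # 1 = record becomes valid
            events.append((r.end, 0, r.rid))    # 0 = record stops being valid
        events.sort()
        self.events = events                     # event list
        self.times = [t for t, _, _ in events]   # time list (sorted timestamps)
        # Each checkpoint caches the set of valid ids after events[:position].
        self.checkpoints = [(0, frozenset())]
        active = set()
        for i, (_, kind, rid) in enumerate(events):
            if i > 0 and i % checkpoint_every == 0:
                self.checkpoints.append((i, frozenset(active)))
            if kind == 1:
                active.add(rid)
            else:
                active.discard(rid)
        self.positions = [p for p, _ in self.checkpoints]

    def point_query(self, t):
        """Ids of records with start <= t < end."""
        pos = bisect_right(self.times, t)  # number of events with time <= t
        cp_pos, snapshot = self.checkpoints[bisect_right(self.positions, pos) - 1]
        active = set(snapshot)
        # Replay only the events between the checkpoint and the target position.
        for _, kind, rid in self.events[cp_pos:pos]:
            if kind == 1:
                active.add(rid)
            else:
                active.discard(rid)
        return active

    def range_intersection_query(self, a, b):
        """Ids of records with start <= b and end > a."""
        result = self.point_query(a)
        lo, hi = bisect_right(self.times, a), bisect_right(self.times, b)
        for _, kind, rid in self.events[lo:hi]:
            if kind == 1:  # records that start inside (a, b]
                result.add(rid)
        return result

    def range_inclusion_query(self, a, b):
        """Ids of records with a <= start and end <= b."""
        lo, hi = bisect_left(self.times, a), bisect_right(self.times, b)
        started, result = set(), set()
        for _, kind, rid in self.events[lo:hi]:
            if kind == 1:
                started.add(rid)
            elif rid in started:
                result.add(rid)
        return result


records = [Record(1, 1, 5), Record(2, 3, 8), Record(3, 6, 9),
           Record(4, 2, 4), Record(5, 10, 12)]
index = TimelineIndex(records, checkpoint_every=3)
print(sorted(index.point_query(3)))
print(sorted(index.range_intersection_query(4, 6)))
print(sorted(index.range_inclusion_query(2, 8)))
```

In this sketch a point query replays at most `checkpoint_every - 1` events after the nearest checkpoint instead of scanning from the beginning, which mirrors the trade-off described above: faster queries in exchange for the extra storage used by the cached snapshots.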
Keywords/Search Tags: Large Scale of Data, Big data, Temporal Query, Index, Query Algorithm