Font Size: a A A

Research On JOIN Operation Of Massive Temporal Data

Posted on:2017-05-07Degree:MasterType:Thesis
Country:ChinaCandidate:Z L WangFull Text:PDF
GTID:2348330509457102Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
In temporal database, the join operation of temporal data plays an important role, on the one hand the cost of the operation itself is expensive.Assume that we have two tables each have n tuples for the JOIN operation, if we use the simplest method of nested loop, the time complexity is O(n2), the cost is very high, especially for the amount of data rapidly growing today, the n is very large,so n2 is much larger that we can't imagine.On the other hand efficiently realizing the JOIN operation will help to solve other problems,that is very important to improve the efficiency of the query optimizer.Nowadays there are four kinds of algorithms to solve this problem:nested-loop join algorithms,sort-merge join algorithms, partition-based join alogorithms and index-based join algorithms.Nested-loop join algorithms are simple and easy to implement, but the time complexity of the algorithms is too high,and it often can't work in practice.Join attribute must be ordered for sort-merge join algorithms, however the join property of temporal data is two-dimensional, so it is difficult to get a linear sequence, this leads to that the relations have to be read for many times and degradation of the join performance.Although there are some other algorithms which add restrictions in the join attributes for a good performance, these algorithms lost the universality.There are some good join algorithms which is partition-based or index-based, such as the overlap interval partition join algorithm proposed in [1], but it is not an accurate algorithm, the algorithm will produce more join results than the actual join results.In order to overcome the defects of these algorithms, this paper puts forward the incremental overlapping interval join algorithm based on symmetric index which is called SISJoin.Firstly the algorithm is a precise algorithm, secondly the algorithm is simple and easy to implement, lastly theory analysis and experiments verify the efficiency of the algorithm.The algorithm constructs the index structure using the join condition directly,so the index structure is very simple and building operation of the struct is very efficient.Using the necessary condition of the join we can filter the data in advance which unlikely to create the join results, so the join data size is reduced, and then using the strategy of incremental we can greatly reduce the number of comparison.Finally, we did four experiments, the experimental results showed that the join time will reduce dozens of times using incremental strategy compared with that not using incremental strategy,the advantage of the algorithm is significantly on real data sets,long lived tuple hardly affect the performace of SISJoin algorithm,and SISJoin algorithm also showed good scabality under the distributed environment of large amount of data.
Keywords/Search Tags:temporal database, overlap interval join, symmetry index, incremental
PDF Full Text Request
Related items