Research On JOIN Operation Of Massive Temporal Data

Posted on:2017-05-07

Degree:Master

Type:Thesis

Country:China

Candidate:Z L Wang

Full Text:PDF

GTID:2348330509457102

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

In temporal database, the join operation of temporal data plays an important role, on the one hand the cost of the operation itself is expensive.Assume that we have two tables each have n tuples for the JOIN operation, if we use the simplest method of nested loop, the time complexity is O(n2), the cost is very high, especially for the amount of data rapidly growing today, the n is very large,so n2 is much larger that we can't imagine.On the other hand efficiently realizing the JOIN operation will help to solve other problems,that is very important to improve the efficiency of the query optimizer.Nowadays there are four kinds of algorithms to solve this problem:nested-loop join algorithms,sort-merge join algorithms, partition-based join alogorithms and index-based join algorithms.Nested-loop join algorithms are simple and easy to implement, but the time complexity of the algorithms is too high,and it often can't work in practice.Join attribute must be ordered for sort-merge join algorithms, however the join property of temporal data is two-dimensional, so it is difficult to get a linear sequence, this leads to that the relations have to be read for many times and degradation of the join performance.Although there are some other algorithms which add restrictions in the join attributes for a good performance, these algorithms lost the universality.There are some good join algorithms which is partition-based or index-based, such as the overlap interval partition join algorithm proposed in [1], but it is not an accurate algorithm, the algorithm will produce more join results than the actual join results.In order to overcome the defects of these algorithms, this paper puts forward the incremental overlapping interval join algorithm based on symmetric index which is called SISJoin.Firstly the algorithm is a precise algorithm, secondly the algorithm is simple and easy to implement, lastly theory analysis and experiments verify the efficiency of the algorithm.The algorithm constructs the index structure using the join condition directly,so the index structure is very simple and building operation of the struct is very efficient.Using the necessary condition of the join we can filter the data in advance which unlikely to create the join results, so the join data size is reduced, and then using the strategy of incremental we can greatly reduce the number of comparison.Finally, we did four experiments, the experimental results showed that the join time will reduce dozens of times using incremental strategy compared with that not using incremental strategy,the advantage of the algorithm is significantly on real data sets,long lived tuple hardly affect the performace of SISJoin algorithm,and SISJoin algorithm also showed good scabality under the distributed environment of large amount of data.

Keywords/Search Tags:

temporal database, overlap interval join, symmetry index, incremental

PDF Full Text Request

Related items

1	Research Of Interval Join Method Base On MapReduce
2	Research On Set Similarity And Set Containtment In Main Memory Database
3	Research And Implementation Of A Segmentation Hybrid Temporal Index Structure In Database
4	Testing Of Teradata Database And Join Index Feature
5	Research And Implementation Of High Efficiency Set Similarity Join Algorithm Based On Overlap Similarity
6	Research On Processing And Optimization Of Join Operation For Spatio-temporal Datasets
7	The Research Of Temporal Index Technique And Algorithm
8	Research On Index And Storage Of Spatio-temporal XML Database
9	Research And Application Of A Query Index For SDM In GIS
10	The Query Techniques Of Spatial Database On R-tree