Research And Implementation Of Large Tables Join Algorithms On Claims System

Posted on:2016-01-14

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhang

Full Text:PDF

GTID:2308330461976088

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

With the advent of Age of Big Data, In memory computing based share nothing database system is more and more popular, becoming the cutlery of relational data query processing. Claims is one such system, its goals are taking advantage of rapid in-memory computing and overcoming network transmission bottlenecks. Meanwhile, large tables join is a fashionable research problem, it means that two relations contain too many tu-ples, even reach the capacity limitation of a single node. In Claims, the relations are dis-tributed on multiple nodes,since in memory processing speed is much faster than network transmission speed, data shuffling occupies a large proportion of query time, network bandwidth becomes the bottleneck of query execution performance. Hash join and sort merge join perform highly on processing equal join, they consume a lot of CPU resources and network bandwidth resources. So how to make rational use of resources in the two join algorithms become core problem of large tables join in Claims system.This thesis focuses on Claims system and large tables join problem on Claims, the major contributions are listed as follows:1. Claims system runs on the cluster of high performance PC machines. this thesis proposes several techniques which are different from commonly used system:new banding load strategy, "all" pipelined execution, exchange and virtual master. In addition, for reducing the number of tuples which physical level handle, this thesis also analyses the implementation of traditional logical optimization rules on Claims system.2. This thesis describes the hash join algorithm implementation on Claims system: data is firstly repartitioned on join attributes, local join algorithm is then execut-ed. Besides, we propose optimize data layout, the algorithm can omit the network transmission by omitting the data shuffle.3. This thesis describe the basic implementation of sort merge join in Claims, and we propose a detail optimization sort merge join optimization algorithm. the al-gorithm takes advantage of rapid computing of in-memory, decreases data volume to network transfer and avoids resource conflicts and idle. The algorithm firstly accumulates accurate statistics on join attributes of two tables, then work out the partition strategy under the load balance mode, finally select a nodes permutation by using KM algorithm to reduce the data volume to transfer, finally we stagger the use of CPU resources and network bandwidth resources.To sum up, by describing the key techniques of share nothing database system Claim-s, we confirm the challenge of large tables join problem in the Claims system. We propose the optimized algorithm for hash join and sort merge join, Extensive experiment confirm-s the feasibility and high efficiency of them. The algorithms also improves the query response time by huge margin, and gets a higher performance than other similar systems.

Keywords/Search Tags:

Share Nothing DataBase System, In-Memory Computing, Large Tables Join Problem, Parallel Join Algorithm

PDF Full Text Request

Related items

1	Optimization Of Database Join Algorithms On DRAM/NVM-Based Hybrid Memory
2	Complete overflow management algorithm to join large relational tables
3	Research And Realization Of Database Hash Join Algorithm On Multi-core CPUs
4	Design And Optimizations Of Multi-table Join Operations For NVM-based In-memory Databases
5	Implementation And Evaluation Of Big Data Parallel Join Algorithms
6	Design And Implementation Of Hash Join In Separated Computing And Storage Database
7	Research Of High-Dimensional Space Join And Query Algorithms Based On Main-Memory
8	Research On Hash Join Algorithm In DM Database
9	The Research And Application Of Parallel Similarity Join For Large-scale Strings
10	Research On SSD-based Join Query Optimization Technology In Array Database