Font Size: a A A

Research And Realization Of Database Hash Join Algorithm On Multi-core CPUs

Posted on:2015-09-06Degree:MasterType:Thesis
Country:ChinaCandidate:Z WangFull Text:PDF
GTID:2308330464466748Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Recently, with the popularity of multi-core CPU hardware architecture and the development of SMT(Simultaneous Multi-Threading) technology, multi-core CPU is becoming a hot topic in the field of high-performance computing for its powerful parallel computing capability. People have begun to study the use of multi-core CPU for parallel acceleration.on multiple data operation. The join operation which is the most commonly used and most time-consuming operation in the database becomes the focus of research.In a relational database, users often need to query the database. The join operation is the most commonly used and most time-consuming operation is query operations, which includes θ-join, equi-join and natural join. The join operation is one of the fundamental relational database query operations. It facilitates the retrieval of information from two different relations based on a Cartesian product of the two relations. The Join is one of the most difficult operations to implement efficiently, as no predefine links between relations are required to exist(as they are with network and hierarchical systems). The join is the only relational algebra operation that allows the combining of related tuples from relations on different attribute schemes. Since it is executed frequently and is expensive, much research effort has been applied to the optimization of join processing, Basic join operation algorithms include nested-loop join, sort-merge join, and hash join, of which hash join is widely used in the current DBMSs.With the development of multi-core parallel technology and in-memory technology, many variants of join algorithms have been proposed. They make full use of parallel technology and modern hardware architecture to get better performance. These studies have shown that the hardware architecture has a great effect on the performance of join algorithms. In addition, memory access is another performance factor for join algorithms.In this paper, the basic join algorithms are systematically studied,focusing on hash join algorithm. By using multi-core CPU hardware architecture and following memory locality principle,a corresponding parallel optimization join algorithm is proposed. the main research results of the following two points:1. Basing on the multi thread model on multi-core CPU hardware architecture, a parallel hash join algorithm implemented by Map Reduce model is proposed. The algorithm makes full use of the implementation of thread scheduling, automatic task assignment and management, load balancing control, error correction function provided by Mapreduce model.With the help of a job partitioning strategy proposed by following the memory locality principle, the algorithm increases the hit rate of memory and improves memory efficiency, so as to optimize the performance of hash join operation finally.2. To solve the problem of data skew, memory latency and performance bottleneck caused by memory pressure, three corresponding optimization strategies are proposed. The test results show that the three strategies have achieved the desired effect, the problem was solved better.The test results show that the parallel hash join algorithm on multi-core CPU hardware architecture proposed in this paper has significantly better performance than traditional join algorithms. While using the characteristics of Map Reduce model, with the three optimization strategy proposed, the algoritm compared to some of the existing similar parallel hash join algorithms also has a good performance,which can be used in the database join operation on multi-core CPU hardware architecture.
Keywords/Search Tags:Multi-core CPU, parallel processing technology, database join operation, parallel hash join algorithm
PDF Full Text Request
Related items