Font Size: a A A

Hash Join Algorithm Design And Implementation Of A Multi-core Environment

Posted on:2015-02-09Degree:MasterType:Thesis
Country:ChinaCandidate:H ZhuFull Text:PDF
GTID:2268330431956583Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Hash join algorithm is typical in the area of database connection algorithm andhas the important status and wide application. Due to the expansion of scale ofdatabase is more and more big, the traditional Hash join serial algorithm have beenunable to meet many of the problems in the reality. So to find a way to improve theHash join algorithm performance becomes extremely important. There are many onthe Hash join algorithm is improved. For example, using distributed or cluster parallelsystem structure to improve. Although is enough to make performance improvements,but for the serial research has been the bottleneck of the algorithm, and for distributedor cluster parallel system structure has a significant improvement in their limits. Thisis because such a model of organizational structure is dispersed, the cost is very high,Reuse them at the same time also increases their power consumption andcommunication overhead. The new and improved algorithm becomes more important.Now CMP has the very strong parallel processing ability, speed faster. So using CMPprocessor for operation has become the current hot research topic in the field ofparallel computing.Therefore, the main research content of this article is to design a Hash joinalgorithm on the CMP coarse-grained parallel and internal fine-grained parallel,mainly for the following several aspects to improve the algorithm.First of all, the Hash filter and Search filter used in combination. The benefits ofusing filter is to filter out part does not match the tuple, and in a multithreadedfine-grained parallel execution of these filters, accelerated the search with the samevalue of the rate of tuples, greatly improve the Hash join the execution efficiency ofthe algorithm. To improve efficiency is over50%, in line with expectations. Inmulti-table connection, for the allocation of a relational table has a lot of kinds, thispaper, by using bushy-tree structure on distribution of the multiple relations in theconnection of each leaf node in the tree represents a relational table. And thentransforma bushy tree to allocation-tree. So allocation thread to allocation-tree instead of to allocation for a bushy-tree. There is no dependency between relationship, canparallel processing. Open multiple threads to perform at the same time, can reducemulti-table connection between time. Experiments show that this thread betweencoarse-grained parallel compared with the traditional CPU serial calculateacceleration ratio can reach30%, in line with expectations.Secondly, in multi-table connection, for the allocation of a relational table has alot of kinds, this paper, by using bushy-tree structure on distribution of the multiplerelations in the connection of each leaf node in the tree represents a relational table.And then transforma bushy tree to allocation-tree. So allocation thread toallocation-tree instead of to allocation for a bushy tree. There is no dependencybetween relationships, can parallel processing. Open multiple threads to perform atthe same time, can reduce multi-table connection between time. Experiments showthat this thread between coarse-grained parallel compared with the traditional CPUserial calculate acceleration ratio can reach30%, in line with expectations.
Keywords/Search Tags:hash join, hash filter, search filter, allocation-tree, CMP
PDF Full Text Request
Related items