Font Size: a A A

Research On Optimization Of Main Memory Database Query Execution On Multi-core CPUs

Posted on:2017-10-28Degree:DoctorType:Dissertation
Country:ChinaCandidate:T YuanFull Text:PDF
GTID:1368330542992982Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the rapid development of information technology,various data have experienced explosive growth,and database system has become a research hotspot in the field of computer in recent years.Main researches on database systems include query execution,query optimization,and data storage.Query execution including various basic database operations is a core part of database system,and also has a great research value.With the development of semiconductor technology,the single-core CPU has met the bottleneck of performance improvement,and the multi-core CPU has been developed quickly and become mainstream product in CPU markets.In addition,memory capacity is rising rapidly while memory price is continuing to drop,so that many current database management systems are able to put their whole working set in main memory.The rise of main memory database makes researchers put emphasis on how to improve the efficiency of database operation and memory access,rather than the efficiency of disk access.Although great progresses have been reported in the field of optimizations on query execution of main memory database on multi-core CPUs,the technical challenges remain in optimizing some basic database operations with parallel resources on multi-core CPUs.Combined with current research achievements,we makes use of parallel resources on multi-core CPUs to optimizing hash partitioning algorithm,adaptive indexing algorithm,and hash join algorithm in the dissertation.The main work and innovations are summarized as follows:(1)This dissertation summarizes the common strategies to avoid collision between threads on multi-core CPUs,including lock strategy,independent output strategy,twice scan strategy,and parallel buffer strategy,and analyses their advantages and disadvantages.Based on that,some optimizations are proposed to address the problems in hash partitioning algorithm.Firstly,the software write-combining optimization tries to reduce TLB miss by keep a buffer for each partition.These buffers are filled first,only if a buffer is full,it is flushed to the final partition.Secondly,the bypass cache optimization uses non-temporal writing operation to directly write the data to main memory and bypass the cache entirely,improving the efficiency of writing operation.Thirdly,the improved hash table supports dynamic memory allocation,improves the efficiency of memory access,and reduces the initialization cost.Finally,the skew data optimization can make the proposed method suitable for processing various skew input data.Experimental results show that optimizations proposed in the dissertation can improve the overall performance of hash partitioning algorithm,and suit for skew input data.(2)This dissertation summarizes the current adaptive indexing algorithms,and analyses their advantages and disadvantages.Based on that,this dissertation firstly proposes a mechanism for self-selecting dynamically different optimizations according to cracking position,selectivity,and the number of queries for partition in database cracking.Secondly,this dissertation proposes a parallel adaptive merging algorithm on multi-core CPUs.In this algorithm,the index initialization is implemented by parallel sorting algorithm,and thread-level Parallelism and radix sorting are applied for query execution and index optimization.Thirdly,an improved parallel adaptive indexing algorithm is proposed,which combines Parallel Database Cracking method with Refined Partition Merge algorithm.In this algorithm,when less data chunks are in the index,we use the Refined Partition Merge algorithm so as to reduce the probability of conflict between threads,decrease the waiting time,and increase the utilization of the threads,and when more data chunks are in the index,we use Parallel Database Cracking method so as to take full advantage of the CMP's parallel execution resources.Finally,the mechanism for self-selecting dynamically different optimizations,the parallel adaptive merging algorithm,and the improved parallel adaptive indexing algorithm are proved viably and effectively through the experimental results.(3)In this dissertation,Thread-level parallelism and data-level parallelism are exploited to improve the performance of hash join algorithm.We firstly implement hash join algorithm with shared-memory MapReduce model on multi-core CPUs,including No partition hash join and partition hash join.Then we design an improved cuckoo hash table for our hash join,which consists of a cuckoo hash table and a chained hash table.Based on our implementation,we also propose three optimizations,one is the usage of SIMD instructions,one for load imbalance of input data and another for partition phase.Finally,hash join and optimizations proposed in this dissertation are proved viably and effectively through the experimental results.The results show that:(1)our two hash join implementations achieve better results than the preview works,(2)the partition hash join often outperforms the No partition hash join,and the memory access becomes performance bottleneck when the number of threads increases,(3)the number of buckets,the number of partitions,the number of threads and dataset can also affect join performance.
Keywords/Search Tags:Database system, hash partitioning, adaptive indexing, hash join, multi-core CPUs
PDF Full Text Request
Related items