
Parallel Query And Optimization In Column-stores On CPU-GPU Architecture

Posted on: 2017-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: J X Chen
Full Text: PDF
GTID: 2308330503953770
Subject: Computer Science and Technology
Abstract/Summary:
With the accumulation of business data, and especially the rapid growth of network data, how to process data quickly and analyze it from different angles has become a hot topic. Multi-core CPU, GPU, and combined CPU-GPU hardware architectures offer new possibilities for rapid data processing. In particular, combining the CPU-GPU architecture with column-store data warehouse technology can greatly improve query processing speed. Column-store data warehouse technology is mainly concerned with the query and analysis of data. Such queries generally perform grouping and aggregation on a result set produced by joining multiple tables. Table joins and aggregations are therefore two important factors that affect OLAP performance. At the same time, the development of hardware architectures makes it possible to address the performance bottlenecks of OLAP queries. At present, there are three types of parallel query processing on a single machine: multi-core CPU parallel query processing, GPU parallel query processing, and CPU-GPU parallel query co-processing. However, the existing methods have several shortcomings. Row-store data partition strategies use the tuple as the basic unit; when a multi-table join involves many foreign keys, a main partition attribute and auxiliary partition attributes must be chosen according to certain rules. This problem is resolved when the data is organized by column, but the existing data partition strategies are difficult to apply directly to a heterogeneous system. In addition, the fine-grained parallel strategy for CPU and GPU is mainly realized through task division.
However, this strategy does not consider the utilization of the CPU and GPU. As a result, load imbalance between the GPU and CPU occurs frequently, with one side overloaded and the other underloaded, which greatly reduces the efficiency of collaborative acceleration. To address this, we study how to integrate a data partition strategy and a task allocation model into a data warehouse on heterogeneous hardware, and we design and implement a hybrid parallel query engine. First, because a column-store system stores data by column and adjacent values are therefore highly similar, we design the hardware-sensitive ICMD partition strategy. The paper also designs a task allocation model that comprises a static task allocation model and a dynamic task allocation model. The static model is called at the beginning of a query to allocate the initial data load of the GPU and CPU, and the dynamic model is called during query execution to adjust the data load of the GPU and CPU. Next, the paper studies the structural design of the database query engine in depth and designs a hybrid parallel query engine based on CPU-GPU. The paper then analyzes the characteristics of hybrid query plan generation and proposes a query optimization strategy that avoids duplicate data transmission between the CPU and GPU, which further improves performance. Finally, the paper implements these key technologies on the column-store data warehouse DWMS and tests query performance with a benchmark data set.
Comparative experiments verify the effectiveness of the data partition strategy, the task allocation model, and the hybrid query engine. The experimental results show that a data warehouse based on the HPQE hybrid query engine improves query performance by 23% compared to DWMS and by 18% compared to a data warehouse that includes the GPU query engine Ocelot. With the query plan optimization strategy added to the HPQE hybrid query engine, query performance improves by 87% compared to DWMS and by 68% compared to the data warehouse that includes Ocelot.
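
The static and dynamic task allocation models are described above only at a high level, and the abstract does not give their formulas. As a rough illustration of the idea, the sketch below assumes a simple proportional rule: rows are split between CPU and GPU according to estimated throughput at query start, and the split is recomputed from throughput observed during execution. The function names `static_split` and `rebalance`, their parameters, and the proportional rule itself are hypothetical and are not taken from the thesis.

```python
def static_split(n_rows, cpu_rate, gpu_rate):
    """Initial split (assumed rule): divide n_rows in proportion to the
    estimated throughput of each device, given in rows per second.
    Returns (cpu_rows, gpu_rows)."""
    gpu_rows = round(n_rows * gpu_rate / (cpu_rate + gpu_rate))
    return n_rows - gpu_rows, gpu_rows


def rebalance(cpu_rows, gpu_rows, cpu_time, gpu_time):
    """Dynamic adjustment (assumed rule): derive each device's observed
    throughput from the rows it processed and the time it took in the last
    batch, then re-split the same batch size using those observed rates."""
    cpu_rate = cpu_rows / cpu_time
    gpu_rate = gpu_rows / gpu_time
    return static_split(cpu_rows + gpu_rows, cpu_rate, gpu_rate)


if __name__ == "__main__":
    # Start with an estimate that the GPU is three times faster than the CPU.
    cpu_rows, gpu_rows = static_split(1000, cpu_rate=1.0, gpu_rate=3.0)
    print(cpu_rows, gpu_rows)

    # Suppose an even 500/500 batch took the CPU 2 s and the GPU 1 s:
    # the next batch shifts rows toward the GPU.
    print(rebalance(500, 500, cpu_time=2.0, gpu_time=1.0))
```

Under this assumed rule, a device that finishes its share more slowly than the other receives fewer rows in the next batch, which is one way to avoid one side being overloaded while the other is idle.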
Keywords/Search Tags:multi-core CPU-GPU, ICMD data partition strategy, task allocation model, collaborative parallel processing, hybrid query engine