Research And Implementation Of Parallel Query Processing In Column-store

Posted on: 2015-03-12  Degree: Master  Type: Thesis
Country: China  Candidate: G H Zhang  Full Text: PDF
GTID: 2268330425981987  Subject: Computer software and theory
Abstract/Summary:
With the spread of networks and the arrival of the information age, daily life now generates enormous volumes of data. How to build a data warehouse system around these data, and then perform data mining and data analysis on them, has become a hot topic in data processing. Such workloads place high demands on query speed, and the traditional row-store system can no longer meet the requirements of modern massive data. The column-store system, by contrast, can provide the underlying storage model for massive data processing.
In recent years, microprocessors have developed rapidly. Because of limits on processor power consumption and design, CPU development has been moving from high-frequency single-core processors to multi-core processors. Single-core processors have almost disappeared from the modern processor market, while the CMP (on-chip multi-core processor) has become mainstream. Multi-core processors provide the hardware environment needed for parallel query processing.
The main content of this thesis is the design and implementation of parallel query processing in a column-store system. Working in DWMS, the column-store system developed by our laboratory, we analyzed the existing query technology and then designed and implemented a parallel query module. First, we examined each stage of query processing and analyzed how parallel query optimization can be applied at that stage. For example, in the hash-join stage, multiple joins can perform their hash operations simultaneously. Next, after analyzing the query execution mode based on pass blocks, we established a pass block buffer for pipelined parallel processing. Data is no longer transmitted as individual pass blocks but through the pass block buffer, so that each node requests data only from the buffer. This decouples parent nodes from their child nodes. Through effective management of the buffer, we can improve DWMS query performance. Finally, we analyzed the parallel design of the query as a whole. To improve query efficiency further, we set the relevant parameters, including the number of buffers and the number of parallel modules.
For our multi-core parallel environment, we produced a multithreaded design for the DWMS data warehouse system, which mainly covers the parallelization and pipelining of nodes. Theoretical analysis and the corresponding experiments show that the parallel query design improves the query efficiency of DWMS.
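The pass block buffer idea in the abstract can be illustrated with a small sketch. This is not the DWMS implementation, which the abstract does not show. It is a minimal Python example that assumes a bounded queue stands in for the pass block buffer and that each query node runs in its own thread. The names `scan`, `select_gt`, `run_pipeline`, `block_size`, and `buffer_capacity` are hypothetical and chosen only for illustration.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker placed in a buffer


def scan(column, block_size, out_buf):
    """Child node: cut a column into pass blocks and push them into the buffer."""
    for start in range(0, len(column), block_size):
        out_buf.put(column[start:start + block_size])  # blocks while the buffer is full
    out_buf.put(SENTINEL)


def select_gt(threshold, in_buf, out_buf):
    """Parent node: pull pass blocks from its input buffer only and filter them."""
    for block in iter(in_buf.get, SENTINEL):
        out_buf.put([value for value in block if value > threshold])
    out_buf.put(SENTINEL)


def run_pipeline(column, threshold, block_size=4, buffer_capacity=2):
    """Run scan -> select as a two-stage pipeline connected by bounded buffers."""
    # Assumption: a bounded queue.Queue plays the role of one pass block buffer.
    scan_to_select = queue.Queue(maxsize=buffer_capacity)
    select_to_root = queue.Queue(maxsize=buffer_capacity)
    workers = [
        threading.Thread(target=scan, args=(column, block_size, scan_to_select)),
        threading.Thread(target=select_gt, args=(threshold, scan_to_select, select_to_root)),
    ]
    for worker in workers:
        worker.start()
    result = []
    # The root drains the last buffer while the two nodes are still running.
    for block in iter(select_to_root.get, SENTINEL):
        result.extend(block)
    for worker in workers:
        worker.join()
    return result


if __name__ == "__main__":
    print(run_pipeline(list(range(10)), threshold=5))
```

In this sketch neither node calls the other directly: the child only writes to a buffer and the parent only reads from one, which is the separation between parent and child nodes that the abstract describes. The buffer capacity and the number of worker threads correspond loosely to the tunable parameters mentioned above.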
Keywords/Search Tags: Column-store, Multi-core processors, Pass block buffer, Parallelization, Multithreading