Research And Implementation On Query And Update Scheduling In Real-Time Data Warehouses

Posted on: 2011-04-09 Degree: Master Type: Thesis
Country: China Candidate: X Y Cai Full Text: PDF
GTID: 2248330395957423 Subject: Computer application technology
Abstract/Summary:
As information technology develops rapidly, enterprises and organizations are becoming increasingly information-driven. Because they collect large volumes of encoded information, they need data warehouses to support decision making. The demand for real-time data warehousing keeps growing, whether to strengthen competitive advantage or to improve public safety capabilities. Traditional data warehouses cannot meet this demand, so real-time data warehouses that support real-time decisions have emerged. In real-time data warehouse applications, decision makers expect both short response times and high data freshness. Meeting both demands is challenging because of high workloads and continuous query and update streams that may conflict with each other, yet it matters greatly to applications and is an urgent problem for users. Real-time data warehouse architecture and real-time scheduling algorithms have therefore become a focus of research, and work on both problems is of great significance.

First, this thesis analyzes existing real-time data warehouse architectures, determines the load distribution mechanism for update and query tasks, and uses data copying to separate the transformation process from the OLTP system in order to reduce the impact on OLTP. It then proposes an improved ODS-based real-time data warehouse architecture that classifies real-time update tasks, uses trigger-based real-time capture, and adopts different mapping and loading mechanisms. This scheme loads update data as close to real time as possible to improve the freshness of query results. Real-time and not-yet-loaded update tasks are placed in an update task queue, and query tasks submitted by users are placed in a query task queue. For these two kinds of tasks, the thesis proposes a requirement-based two-level scheduling algorithm for queries and updates. It allows users to express their requirements by specifying the acceptable response time delay, which is the quality of service (QoS), and the acceptable result staleness, which is the quality of data (QoD). The thesis then describes the idea and implementation of this two-level scheduling strategy in detail.

Finally, the scheduling algorithm is evaluated on the proposed architecture using the TPC-DS benchmark. The experiments show that, under low, medium, and high workloads, the scheduling algorithm on the improved real-time data warehouse architecture performs better than the three traditional scheduling algorithms, greatly improves user satisfaction, and adapts quickly to changing user requirements and workloads.
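
The abstract does not spell out the two levels of the scheduler, so the following is only a minimal sketch of one way the update queue, the query queue, and the per-query QoS/QoD requirements could interact. Everything in it is an assumption made for illustration: the class and method names are hypothetical, staleness is approximated by the number of pending updates, the first level decides between the two queues, and the second level orders queries by earliest deadline and updates in arrival order. It is not the thesis's actual algorithm.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass(order=True)
class Query:
    # Deadline = arrival time + acceptable response delay (the QoS requirement).
    deadline: float
    name: str = field(compare=False)
    # Acceptable staleness (the QoD requirement), assumed here to be the
    # number of pending updates the query is willing to tolerate.
    max_staleness: int = field(compare=False)


class TwoLevelScheduler:
    """Hypothetical sketch, not the algorithm from the thesis."""

    def __init__(self):
        self.updates = deque()  # update task queue, served in arrival order
        self.queries = []       # query task queue, a heap ordered by deadline

    def submit_update(self, name):
        self.updates.append(name)

    def submit_query(self, name, arrival, max_delay, max_staleness):
        query = Query(arrival + max_delay, name, max_staleness)
        heapq.heappush(self.queries, query)

    def next_task(self):
        """Return ("query", name), ("update", name), or None if idle."""
        if self.queries:
            head = self.queries[0]  # second level: earliest deadline first
            # First level: run the query only if the data is fresh enough
            # for it; otherwise fall through and apply an update first.
            if len(self.updates) <= head.max_staleness:
                heapq.heappop(self.queries)
                return ("query", head.name)
        if self.updates:
            return ("update", self.updates.popleft())
        return None


scheduler = TwoLevelScheduler()
scheduler.submit_update("u1")
scheduler.submit_update("u2")
scheduler.submit_query("q_fresh", arrival=0.0, max_delay=5.0, max_staleness=0)
scheduler.submit_query("q_fast", arrival=0.0, max_delay=1.0, max_staleness=2)

order = []
while (task := scheduler.next_task()) is not None:
    order.append(task)
print(order)
# [('query', 'q_fast'), ('update', 'u1'), ('update', 'u2'), ('query', 'q_fresh')]
```

In this sketch the query that tolerates two pending updates runs immediately, while the query that demands fully fresh data waits until both updates are applied, which shows the trade-off between response time and freshness that the requirements are meant to express.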
Keywords/Search Tags: Real-time data warehouse, real-time data capture, scheduling, quality of service, quality of data