
Research on Real-Time Data Warehouse Storage Strategy Based on Multi-Level Caches

Posted on: 2013-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Wu
GTID: 2248330371484012
Subject: Software engineering
Abstract/Summary:
In recent years, information technology and computer science have developed rapidly. To stay competitive in the marketplace, a company must analyze its data and manage its information quickly and accurately, which is vital to both its operations and its decision making. A data warehouse is a subject-oriented, integrated, and relatively stable collection of data that reflects the historical changes in business operations and supports management and decision-making analysis in a company or organization. It has become an essential decision-making tool. As competition intensifies, real-time data becomes critical: the goal is to obtain data, analyze it, and make decisions quickly. In a traditional data warehouse, however, loading data takes a long time, so real-time data cannot be processed and updated promptly and analysis is based only on historical data. Such analysis cannot reflect current changes in business information within the company or the marketplace, which makes it hard to leverage the company's data effectively to meet organizational needs. A real-time data warehouse extends the capabilities of a traditional data warehouse: it reduces data latency and also supports the company's strategic decision making. Research on the architecture of real-time data warehouses is therefore both meaningful and important.

The paper analyzed the current state of real-time data warehousing and then introduced a real-time data warehouse strategy based on a real-time storage area. It also discussed how such a real-time data warehouse is used. Based on the characteristics of the real-time storage area, it focused on real-time data loading and on the adaptation and optimization of OLAP queries. The paper introduced a real-time data storage strategy based on a double-image area partition.
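The abstract does not detail how the double-image partition works. One plausible reading, offered only as a minimal Python sketch under stated assumptions, keeps two copies ("images") of the real-time partition: the continuous loader appends rows to one image while OLAP queries read the other, and a periodic swap exchanges their roles. The class `DoubleImagePartition` and its `load`, `swap`, and `query` methods are hypothetical names, and locking, persistence, and merging into the historical warehouse are omitted for brevity.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class DoubleImagePartition:
    """Hypothetical two-image real-time partition (not the thesis's code).

    One image serves queries; the other absorbs newly loaded rows.
    Concurrency control is deliberately left out of this sketch.
    """

    def __init__(self) -> None:
        self._images: List[List[Row]] = [[], []]
        self._query_idx = 0            # index of the image visible to queries
        self._pending: List[Row] = []  # rows loaded since the last swap

    def load(self, row: Row) -> None:
        # Loading touches only the load image, never the one queries read.
        self._images[1 - self._query_idx].append(row)
        self._pending.append(row)

    def swap(self) -> None:
        # The load image becomes the query image...
        self._query_idx = 1 - self._query_idx
        # ...and the new load image catches up on the rows it missed.
        self._images[1 - self._query_idx].extend(self._pending)
        self._pending = []

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        return [r for r in self._images[self._query_idx] if predicate(r)]


if __name__ == "__main__":
    part = DoubleImagePartition()
    part.load({"order_id": 1, "amount": 40})
    print(part.query(lambda r: True))  # not visible until the next swap
    part.swap()
    print(part.query(lambda r: r["amount"] > 10))
```

In this sketch, queries see data as of the most recent `swap()`, so the swap interval bounds how stale the query image can be, while the loader never writes into the image that queries are reading.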
Compared with querying the database directly, the double-image strategy clearly improved OLAP query response. It allows the freshest data to be queried from the real-time storage area, reduces query time, and minimizes the impact of queries on continuous data loading. In addition, the paper proposed incorporating multi-level caches into the data warehouse architecture built on the real-time partition and discussed their design and implementation in detail. With multi-level caches, every query is directed to a storage unit that holds all the data it needs, so the query load is distributed across the storage units. This resolves the contention between queries and updates within a single cache area. The multi-level caches also satisfy requirements for different levels of data freshness: each request is routed to its corresponding cache, which avoids multiple requests competing for the same cache area and reduces waiting time.

A real-time data warehouse test system was designed and built on the TPC-H benchmark, and tests were performed on both the real-time storage area and the multi-cache real-time storage area configurations. The tests verified that the real-time storage area provides more reliable query speed, and the results validated the rationality and effectiveness of both the real-time storage and the multi-cache solutions.
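The query routing behind the multi-level caches can be illustrated with a small Python sketch. It assumes, as a hypothetical model, that each cache level is characterized only by the maximum staleness of its data and that each query states how much staleness it tolerates. `CacheLevel`, `route_query`, `distribute`, and the level names are illustrative and not taken from the thesis, and the routing rule (send each query to the stalest level that still satisfies it) is likewise an assumed policy.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CacheLevel:
    name: str
    max_staleness_s: float  # upper bound on how old this level's data may be


def route_query(levels: List[CacheLevel], tolerance_s: float) -> CacheLevel:
    """Return the stalest level whose staleness still meets the tolerance.

    Assumed policy: lenient queries go to coarser levels, leaving the
    freshest level free for queries that genuinely need fresh data.
    """
    eligible = [lv for lv in levels if lv.max_staleness_s <= tolerance_s]
    if not eligible:
        raise ValueError(f"no cache level is fresh enough for a {tolerance_s}s tolerance")
    return max(eligible, key=lambda lv: lv.max_staleness_s)


def distribute(levels: List[CacheLevel], tolerances: Iterable[float]) -> Counter:
    """Count how many queries each level receives under route_query."""
    return Counter(route_query(levels, t).name for t in tolerances)


if __name__ == "__main__":
    levels = [
        CacheLevel("realtime_area", 0.0),
        CacheLevel("cache_l1", 5.0),
        CacheLevel("cache_l2", 60.0),
    ]
    print(distribute(levels, [0, 2, 10, 30, 600]))
```

With levels of 0 s, 5 s, and 60 s, tolerances of 0, 2, 10, 30, and 600 seconds are spread over all three levels (two, two, and one queries respectively) instead of all landing on the freshest store, which is the kind of contention avoidance the abstract describes.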
Keywords/Search Tags: real-time data warehouse, real-time storage area, multi-level caches, query contention, data freshness