Data Warehousing Solution: Architecture Design and Cache Management Research

Posted on: 2007-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: J H Lu
GTID: 2178360182966707
Subject: Computer applications
Abstract/Summary:
In recent years, research on and applications of Business Intelligence and data warehousing have attracted growing attention from researchers, programmers, and users, and the field has become one of the fastest-developing areas of computer applications. As informatization has advanced in China, domestic demand for data warehouse solutions has risen sharply, and several leading commercial providers have emerged. These commercial solutions are expensive, however, which makes them a poor fit for small and medium-sized enterprises and for government departments. They are also hard to study because their source code is closed. Meanwhile, open source projects in the data warehouse field have developed rapidly, and projects covering ETL, OLAP, and data mining have been successful. This thesis therefore studies how to build a data warehouse from open source components.

A data warehouse system differs from an ordinary database system in its analytical capability and its very large capacity, so building a high-performance data warehouse has become an active research topic. Many factors affect data warehouse performance, including schema design, concurrent processing, and cache management. This thesis focuses on the cache management strategy.

The thesis first introduces the concept of the data warehouse and its related technologies, then surveys the leading commercial and open source products in the field. It then presents an open source data warehouse solution based on a multi-layer J2EE architecture. The data layer uses MySQL, the ETL system is built on CloverETL, the OLAP engine on Mondrian, the OLAP presentation layer on JPivot, and the metadata management system on the Mondrian Schema Editor plugin for Eclipse. Because the solution does not use EJB technology, Tomcat is selected as the J2EE server. The thesis also gives a code analysis of the core components: Mondrian, JPivot, and CloverETL.

Next, the thesis analyzes common cache management strategies, with an emphasis on those used in data warehouse systems. It presents and implements a cache management strategy based on the LRU replacement algorithm, and then gives an improved solution that combines a pre-paging scheduling algorithm with an improved LRU replacement algorithm. The final part describes the implementation of the data warehouse system for the Hangzhou Human Resources Market.
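As an illustration of the LRU-based cache management and pre-paging that the abstract mentions, the following is a minimal sketch only. It is written in Python for brevity rather than in the Java stack the thesis uses, and it is not the thesis's algorithm: the class name `LRUCache`, the `loader` callback, and the `prefetch` method are hypothetical names chosen for this example, and the "improved" LRU variant is not reproduced because the abstract does not describe it.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry.

    Hypothetical sketch: names and interface are assumptions, not taken
    from the thesis or from Mondrian.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = loader(key)
        self._put(key, value)
        return value

    def prefetch(self, keys, loader):
        """Pre-paging: load keys expected to be needed before they are requested."""
        for key in keys:
            if key not in self._entries:
                self._put(key, loader(key))

    def _put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def keys(self):
        """Return the cached keys ordered from least to most recently used."""
        return list(self._entries)
```

In this sketch a cache miss costs one `loader` call (for example, a query against the data layer), a hit only reorders the entry, and `prefetch` fills the cache ahead of demand at the risk of evicting entries that are still useful. How to choose which keys to pre-page is the part a real strategy would have to decide.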
Keywords/Search Tags: data warehouse, open source, cache management