Study On Integration Technology Of Application For Data Warehouse In Multi-Database Systems

Posted on: 2016-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhan
Full Text: PDF
GTID: 2298330467993318
Subject: Software engineering
Abstract/Summary:
In an era of rapid information growth, data is massive, distributed and heterogeneous, which increasingly limits what a centralized data warehouse can do in data analysis and processing. Distributed data warehouses offer low maintenance costs, high data integrity, fault tolerance and large storage capacity, which gives them an advantage in some settings; typical examples are banking and e-commerce platforms.

This project is based on a SaaS platform that serves small and micro businesses sharing the same business model. Because the businesses are independent of one another, each has its own marketing strategy. The platform uses multiple database systems and creates a separate tablespace for each enterprise user; corresponding tables in different tablespaces have the same structure. To meet the analysis needs of both the individual enterprises and the platform, a two-level data warehouse is established. Compared with a centralized design, the high fault tolerance and large storage capacity of a distributed data warehouse make it better suited to the platform.

The key research questions in data warehouse integration technology include logical model design, data ETL, data transmission strategy and metadata administration, and this thesis focuses on these problems. Given the project background, the whole system is divided into two parts: data analysis for users and administrators, and data warehouse management for administrators. Because the analytical requirements differ, the analysis themes and granularity differ between the two levels of the data warehouse, so a logical model is designed separately for each level, and an open-source multidimensional analysis and presentation tool is used to display the results in readable form. For the second part, the emphasis is on implementing manual creation of the data warehouse.
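The two-level idea can be illustrated with a minimal Python sketch. It is not taken from the thesis: the `Order` row type, the tenant names, and the choice of day-by-product granularity for the enterprise level and monthly granularity for the platform level are all assumptions made for illustration. It only shows how identically structured per-enterprise tables can feed one fine-grained level per enterprise and one coarser level for the platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """One row of a hypothetical per-enterprise orders table."""
    enterprise: str  # owning tenant (stands in for one tablespace per enterprise)
    day: str         # ISO date, e.g. "2016-03-01"
    product: str
    amount: float


def enterprise_level(rows):
    """First level (assumed): per enterprise, fine granularity (day x product)."""
    cube = defaultdict(float)
    for r in rows:
        cube[(r.enterprise, r.day, r.product)] += r.amount
    return dict(cube)


def platform_level(enterprise_cube):
    """Second level (assumed): across all enterprises, coarser granularity (month)."""
    cube = defaultdict(float)
    for (_enterprise, day, _product), amount in enterprise_cube.items():
        cube[day[:7]] += amount  # keep only "YYYY-MM"
    return dict(cube)


rows = [
    Order("shop_a", "2016-03-01", "tea", 10.0),
    Order("shop_a", "2016-03-01", "tea", 5.0),
    Order("shop_b", "2016-03-02", "cup", 7.5),
    Order("shop_b", "2016-04-01", "cup", 2.5),
]
level1 = enterprise_level(rows)
level2 = platform_level(level1)
print(level1[("shop_a", "2016-03-01", "tea")])
print(level2)
```

Building the platform level from the enterprise level, rather than from the raw rows, is one possible design; the abstract does not say which source the second level is loaded from.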
For data processing, data is divided into two kinds, real-time and delayed, according to the update frequency of each table, and the work is analyzed and implemented in three aspects: data extraction, data transformation and data transmission. For the data transmission strategy, the thesis summarizes two methods, round-robin scheduling and data-driven transmission; a comparison of the two finds the data-driven strategy better suited to the platform. Metadata management is also a focus of the research. In addition, as a novel contribution, the thesis discusses the integrity of the logical model and implements recovery of dimension tables (or fact tables) that have been dropped improperly.

The thesis also studies the efficiency of data integration. An experiment simulating the actual platform application shows that the table partitioning technology provided by Oracle can improve the efficiency of data integration. This conclusion lays the foundation for further optimization of the platform.
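The difference between the two transmission strategies can be sketched in Python as follows. The abstract does not describe how either strategy is implemented, so this is an assumed reading: round-robin scheduling visits every source in a fixed order on every cycle, while the data-driven strategy transfers only when a source announces new rows. The function names, the `pending` counts and the `events` list are hypothetical.

```python
from collections import deque


def round_robin(pending, cycles):
    """Poll every source in fixed order each cycle, whether or not it changed."""
    pending = dict(pending)  # work on a copy so the caller's data is untouched
    polls = moved = 0
    for _ in range(cycles):
        for source in sorted(pending):
            polls += 1
            moved += pending[source]
            pending[source] = 0  # rows are transferred once
    return polls, moved


def data_driven(events):
    """Transfer only when a source has announced new rows (one event each)."""
    queue = deque(events)
    transfers = moved = 0
    while queue:
        _source, rows = queue.popleft()
        transfers += 1
        moved += rows
    return transfers, moved


# Three tenants, only one of which has new data (assumed workload).
pending = {"shop_a": 3, "shop_b": 0, "shop_c": 0}
events = [("shop_a", 3)]

print(round_robin(pending, cycles=2))  # (polls, rows moved)
print(data_driven(events))             # (transfers, rows moved)
```

Under this assumed workload both strategies move the same three rows, but round-robin spends six polls to do so and the data-driven variant makes one transfer. That is one plausible reason a data-driven strategy could suit a platform with many small, mostly idle tenants; the thesis itself may rest the comparison on other criteria.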
Keywords/Search Tags: Two-level Data Warehouses, Data Extraction, Incremental Update, Data-driven, E-Business