
Application Research Of ETL Based On CWM In Data Center

Posted on: 2012-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: N Jiang
GTID: 2178330332485804
Subject: Computer software and theory
Abstract/Summary:
As data warehouse applications become widespread, many kinds of tools for building data warehouses are emerging. To support seamless integration of every part of the data warehouse environment, these tools must work with one another so that data flows smoothly through each stage of the job, while completeness and correctness are preserved as far as possible. For this reason, the metadata in a data warehouse must have a uniform, well-formed definition.
ETL is an important part of data warehousing and business intelligence, and ETL development is a time-consuming part of the data warehouse development cycle. Developers usually pay more attention to business metadata and technical metadata during data warehouse development but neglect the management of ETL metadata, which lengthens the ETL development cycle. Currently there are two metadata management structures. The first is the centralized structure, in which the whole system has a single metadata repository from which all tools and the data warehouse directly obtain metadata. This structure suits only small and medium-sized companies; in the complex data environments of large companies, centralized management is almost impossible. The second is the distributed structure, which sets up several distributed, relatively autonomous metadata repositories, each handling a single metadata domain, while the global metadata is managed by a metadata management system. Although the distributed structure manages metadata in a distributed manner, shared metadata must be drawn from different metadata repositories that use different metadata definitions, so the problem of heterogeneous metadata still has to be solved. Moreover, these distributed or autonomous metadata repositories inevitably rely on a metadata exchange protocol for integration, which also lengthens the development cycle.
The Common Warehouse Metamodel (CWM) is an object-oriented model based on UML that is used to establish a common metamodel for data warehouses. This thesis studies CWM in depth and uses it to establish an ETL metadata model. The object-oriented model is then mapped to a relational model, a metadata database is built, and the metadata database design is applied to the data management subsystem of the data center in Pudong. Through effective management of ETL metadata, ETL engineers can obtain ETL data sources, transformations, and mapping rules more directly, so that ETL development and metadata maintenance become easier.
The thesis first describes the relevant concepts of metadata, ETL, and ETL metadata, and studies in depth the integrated framework of CWM, its design principles, and the features of and relationships between its packages, especially the packages relevant to ETL. Then, using the related CWM packages, I design the object-oriented model of ETL metadata and transform it into a relational model, i.e. an ER diagram of ETL metadata and a relational ETL metadata database. Finally, the design is applied to the data center project in Pudong.
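The mapping from an object-oriented ETL metadata model to a relational metadata database can be illustrated with a minimal sketch. It is not the schema designed in the thesis, which the abstract does not give. The class names Transformation and FeatureMap are loosely borrowed from the CWM Transformation package; every field, table, column, and function name below is a hypothetical choice made for illustration, and SQLite stands in for whatever database the project actually used.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureMap:
    """One column-level mapping rule (hypothetical, simplified fields)."""
    source_column: str
    target_column: str
    rule: str  # free-text transformation rule, an assumption of this sketch


@dataclass
class Transformation:
    """One source-to-target transformation that owns its column mappings."""
    name: str
    source_table: str
    target_table: str
    feature_maps: List[FeatureMap] = field(default_factory=list)


# Each class becomes a table; the one-to-many ownership becomes a foreign key.
SCHEMA = """
CREATE TABLE transformation (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    source_table TEXT NOT NULL,
    target_table TEXT NOT NULL
);
CREATE TABLE feature_map (
    id INTEGER PRIMARY KEY,
    transformation_id INTEGER NOT NULL REFERENCES transformation(id),
    source_column TEXT NOT NULL,
    target_column TEXT NOT NULL,
    rule TEXT NOT NULL
);
"""


def save(conn: sqlite3.Connection, t: Transformation) -> int:
    """Store one Transformation object and its FeatureMaps as rows."""
    cur = conn.execute(
        "INSERT INTO transformation (name, source_table, target_table) "
        "VALUES (?, ?, ?)",
        (t.name, t.source_table, t.target_table),
    )
    tid = cur.lastrowid
    conn.executemany(
        "INSERT INTO feature_map "
        "(transformation_id, source_column, target_column, rule) "
        "VALUES (?, ?, ?, ?)",
        [(tid, m.source_column, m.target_column, m.rule) for m in t.feature_maps],
    )
    conn.commit()
    return tid


def mapping_rules(conn: sqlite3.Connection, name: str):
    """Return (source_column, target_column, rule) rows for a transformation."""
    return conn.execute(
        "SELECT f.source_column, f.target_column, f.rule "
        "FROM feature_map f "
        "JOIN transformation t ON t.id = f.transformation_id "
        "WHERE t.name = ? ORDER BY f.id",
        (name,),
    ).fetchall()


def demo():
    """Build an in-memory metadata database and query one mapping."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    t = Transformation(
        name="load_customer",
        source_table="src_customer",
        target_table="dw_customer",
        feature_maps=[
            FeatureMap("cust_nm", "customer_name", "TRIM"),
            FeatureMap("birth", "birth_date", "CAST AS DATE"),
        ],
    )
    save(conn, t)
    return mapping_rules(conn, "load_customer")
```

Calling demo() returns the stored mapping rules for the hypothetical load_customer transformation, which is the kind of direct lookup of sources, transformations, and mapping rules that the abstract says managed ETL metadata gives the ETL engineer.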
Keywords/Search Tags: Data Warehouse, Data Center, ETL, Metadata, CWM