
Design And Implementation Of ETL System In Data Center

Posted on: 2012-04-06 | Degree: Master | Type: Thesis
Country: China | Candidate: J Zhao
GTID: 2178330332985819 | Subject: Computer application technology
Abstract/Summary:
With the spread of information technology, the government has increased its investment in information systems to support a variety of e-government and management tasks. Running these systems accumulates a large amount of business data, but that data is scattered across separate systems whose operating systems, database management systems, and data standards are not all the same. Because there is little correspondence, conversion, or coordination between these data sets, "information islands" form: information circulates within each system, but interaction with other systems is poor. Systems working in isolation produce a great deal of redundant data and duplicated business effort, and traditional point-to-point data exchange multiplies integration and maintenance costs. Establishing a public data center is therefore the top priority.

ETL (data Extraction, Transformation, and Loading) is the key to building the public data center. An ETL system integrates all of an organization's resources into seamless, easily accessible data assets, so that these assets operate like a single system. ETL establishes an underlying data exchange platform that connects the heterogeneous systems, applications, and data sources of all departments and institutions. The platform is designed to meet the need for sharing and exchanging data among in-house business systems, databases, data warehouses, and other important internal systems.

This thesis is based on the information system of the Shanghai Pudong New Area government public data center, a project in which I took part. It first analyzes and designs the public data center information system, and then, according to the actual needs of the public data center, designs and implements its ETL process.
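The three ETL stages named above can be illustrated with a minimal Python sketch. It assumes records arrive as simple field-to-value dictionaries from several source systems, that transformation rules are plain functions which either return a cleaned record or reject it, and that loading is an upsert keyed on an identifier. The function names, the rules `strip_whitespace` and `require_id`, and the sample sources are all hypothetical and do not come from the thesis.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical record shape: field name -> value. Field names are illustrative only.
Record = Dict[str, str]
Rule = Callable[[Record], Optional[Record]]


def extract(sources: Iterable[Callable[[], Iterable[Record]]]) -> List[Record]:
    """Extract: pull raw records from every source system."""
    rows: List[Record] = []
    for read_source in sources:
        rows.extend(read_source())
    return rows


def transform(rows: List[Record], rules: List[Rule]) -> List[Record]:
    """Transform: apply cleansing/conversion rules in order.
    A rule returns a (possibly modified) record, or None to reject it."""
    clean: List[Record] = []
    for row in rows:
        current: Optional[Record] = dict(row)
        for rule in rules:
            current = rule(current)
            if current is None:  # rejected; skip the remaining rules
                break
        if current is not None:
            clean.append(current)
    return clean


def load(rows: List[Record], target: Dict[str, Record], key: str) -> None:
    """Load: upsert into the central store so one entity from two systems is stored once."""
    for row in rows:
        target[row[key]] = row


# Example rules standing in for domain-specific quality checks (assumed).
def strip_whitespace(r: Record) -> Record:
    return {k: v.strip() for k, v in r.items()}


def require_id(r: Record) -> Optional[Record]:
    return r if r.get("id") else None


# Two hypothetical source systems holding overlapping data.
def source_a() -> List[Record]:
    return [{"id": " 001 ", "name": "Alice"}, {"id": "", "name": "Unknown"}]


def source_b() -> List[Record]:
    return [{"id": "001", "name": "Alice"}, {"id": "002", "name": "Bob"}]


warehouse: Dict[str, Record] = {}
load(transform(extract([source_a, source_b]), [strip_whitespace, require_id]), warehouse, key="id")
print(sorted(warehouse))  # ['001', '002']
```

Here the record with an empty identifier is rejected during transformation, and the two copies of entity `001` collapse into one row on load, which is the kind of redundancy across "information islands" the abstract describes.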
First, in view of the characteristics of public data, several data access methods were designed: the public data center supports online data collection as well as uploading data to the data center from CD-ROM and other media. Second, domain knowledge is incorporated when defining the data transformation rules, to ensure data quality. Third, load balancing was implemented among the front-ends to improve data extraction efficiency as well as system availability and scalability. Finally, the different steps of data conversion are assigned to two separate ETL servers to ensure efficiency. The main contents are as follows:
(1) Basic theory of data warehousing;
(2) Overall requirements analysis for the public data center information system;
(3) Overall design of the public data center information system;
(4) Design and implementation of ETL for the public data center information system: ETL process design, ETL environment preparation, ETL implementation, ETL testing, exception handling, and daily ETL management.
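The abstract does not say which load-balancing policy is used among the front-ends, so the following Python sketch assumes a simple round-robin scheme that skips nodes marked unhealthy. Skipping failed nodes is one way such balancing can support availability, and spreading jobs across nodes is one way it can support extraction throughput. The `FrontEnd` and `RoundRobinBalancer` names and the job labels are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import List


@dataclass
class FrontEnd:
    """A hypothetical front-end collection node that runs extraction jobs."""
    name: str
    healthy: bool = True
    jobs: List[str] = field(default_factory=list)


class RoundRobinBalancer:
    """Round-robin dispatch that skips unhealthy nodes (an assumed strategy)."""

    def __init__(self, nodes: List[FrontEnd]) -> None:
        if not nodes:
            raise ValueError("at least one front-end is required")
        self.nodes = nodes
        self._cycle = itertools.cycle(nodes)

    def dispatch(self, job: str) -> FrontEnd:
        # Visit each node at most once per call, so a pool with no healthy
        # node raises an error instead of looping forever.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node.healthy:
                node.jobs.append(job)
                return node
        raise RuntimeError("no healthy front-end available")


nodes = [FrontEnd("fe-1"), FrontEnd("fe-2", healthy=False), FrontEnd("fe-3")]
balancer = RoundRobinBalancer(nodes)
for job in ["job-1", "job-2", "job-3", "job-4"]:
    balancer.dispatch(job)
for node in nodes:
    print(node.name, node.jobs)
```

With `fe-2` down, jobs alternate between `fe-1` and `fe-3`. A weighted or least-loaded policy could be substituted for round-robin without changing the calling code.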
Keywords/Search Tags:ETL, Data warehouse, Data Extraction, Data Transform