Design And Implementation Of The ETL In China Post Express Data Warehouse System

Posted on: 2015-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: J X Zhang
GTID: 2298330434450171
Subject: Software engineering
Abstract/Summary:
As China Post Express's business has expanded, its data volume has grown sharply. The database-backed storage system built between 1991 and 2005 no longer fits the development of the business, and China Post Express now faces problems with data consistency, timeliness of information, and data integrity, among others. Because a data warehouse is subject-oriented, integrated, stable, and preserves history, it can address these problems. China Post Express adopted Teradata data warehouse technology to build its data warehouse. ETL (Extract-Transform-Load: data extraction, transformation, and loading) integrates data and raises its value according to uniform rules, and is responsible for moving data from the data sources into the target data warehouse. The ETL process is therefore an important step in implementing a data warehouse.
Based on the China Post Express data warehouse project, the paper first introduces the data warehouse concept, the technical characteristics of real-time data warehouses, the Teradata data warehouse, and implementation theory. Second, the paper analyzes the three main data sources of the data warehouse. The data warehouse is divided by business into six subject areas, and the paper introduces the design of its core model layer. Third, based on the business requirements and the design of the core model layer, the overall ETL architecture is designed. By abstracting Teradata Automation, the paper generalizes the design of the ETL automation process. All types of jobs in the ETL framework are designed, including their naming conventions. Fourth, following the ETL design, the paper implements data extraction, data loading, load monitoring, data cleaning and validation, and data conversion; the jobs are then tested and optimized.
Finally, the paper summarizes the system's operation and the work presented, and proposes improvements to the model, metadata management, and system performance. The author hopes this work will offer some inspiration for ETL design and implementation in data warehouses for other industries. At present, the China Post Express data warehouse is in use: its size has reached 28 TB, the average daily data increment is about 30 GB, queries generally complete within three seconds, CPU and disk I/O skew are within the normal range, and overall performance meets business needs. As a result, the China Post Express data warehouse effectively enhances the EMS brand.
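The ETL functions listed in the abstract (extraction, cleaning and validation, conversion, loading, and load monitoring) can be illustrated with a minimal Python sketch. This is not the thesis's implementation and does not use Teradata utilities: the `Shipment` record, its two fields, the validation rules, and all function names are hypothetical, chosen only to show how the stages might chain together.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]  # hypothetical raw source row: every value is a string


@dataclass
class Shipment:
    """Hypothetical target record; the real core model is not given in the abstract."""
    tracking_no: str
    weight_kg: float


def extract(source_rows: Iterable[Row]) -> Iterator[Row]:
    """Extraction: stand-in for reading an export from a source system."""
    yield from source_rows


def clean(rows: Iterable[Row]) -> Iterator[Row]:
    """Cleaning: strip stray whitespace from every field."""
    for row in rows:
        yield {key: value.strip() for key, value in row.items()}


def validate(rows: Iterable[Row], rejected: List[Row]) -> Iterator[Row]:
    """Validation: divert rows with a missing key or a non-positive weight."""
    for row in rows:
        try:
            weight = float(row["weight_kg"])
        except (KeyError, ValueError):
            rejected.append(row)
            continue
        if not row.get("tracking_no") or weight <= 0:
            rejected.append(row)
            continue
        yield row


def transform(rows: Iterable[Row]) -> Iterator[Shipment]:
    """Conversion: map cleaned rows onto the target record type."""
    for row in rows:
        yield Shipment(row["tracking_no"].upper(), float(row["weight_kg"]))


def load(records: Iterable[Shipment], target: List[Shipment]) -> int:
    """Loading: append to the target and return the row count for monitoring."""
    count = 0
    for record in records:
        target.append(record)
        count += 1
    return count


def run_job(source_rows: Iterable[Row]) -> Tuple[List[Shipment], List[Row], int]:
    """Run one ETL job end to end and report loaded and rejected rows."""
    target: List[Shipment] = []
    rejected: List[Row] = []
    staged = validate(clean(extract(source_rows)), rejected)
    loaded = load(transform(staged), target)
    return target, rejected, loaded
```

Each stage is a generator, so rows stream through the job one at a time, and the loaded count together with the rejected list gives a simple form of load monitoring. In a production warehouse these stages would normally be separate scheduled jobs with their own naming conventions, as the abstract describes.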
Keywords/Search Tags: China Post Express, Data Warehouse, ETL