
Data Exchange Research And Realization Based On ETL

Posted on: 2010-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhu
Full Text: PDF
GTID: 2178360272996890
Subject: Software engineering
Abstract/Summary:
Data exchange is a necessary step in enterprise informatization. With data exchange, an enterprise can share information among different nodes over the network, which improves its ability to grasp the overall situation and make decisions. In this thesis, data exchange serves the infrastructure MIS (Management Information System) project of China Guodian Corporation, moving information between the source databases of the power plants and the data warehouse at corporate headquarters. An MIS, composed of people, computers and other peripheral devices, is a system for collecting, transmitting, storing, processing and maintaining information; its task is to make the fullest use of computer and network communication technology to strengthen enterprise information management. The infrastructure MIS is the management information system for the infrastructure phase of a power plant. It belongs to project GD193 of China Guodian Corporation and is intended to optimize data integration, reduce management costs, improve efficiency, and enhance the flexibility and adaptability of the enterprise.

Data exchange is the foundation of the MIS: it delivers data from the power plants to the group-level data warehouse or data mart for re-processing. The exchanged content includes database records, offline files and so on. Many data exchange technologies exist today, such as EDI, XML and Web Services. In this thesis I choose ETL as the means of data exchange, so that the infrastructure MIS can be built as a data warehouse and data re-processing can be combined with the exchange itself, preparing for later upgrades of the system.

ETL (Extract-Transform-Load) is part of data warehouse technology. The data warehouse is a typical representative of analytical processing in database systems. Unlike traditional transaction processing (operational processing), a data warehouse focuses not on daily operations such as insert, update and query, but on decision analysis over large-scale historical data.
A data warehouse has four functional areas: data acquisition, data organization, data application and data display. ETL belongs to data acquisition and is a necessary step in the flow from source to warehouse. It first extracts data from distributed, heterogeneous data sources into a temporary intermediate layer, then cleans, transforms, integrates and loads the data into the data warehouse or data mart, where it becomes the basis for online analytical processing and data mining. The ETL process can therefore serve as a data exchange solution.

Data warehousing is a process of integrating, processing and analyzing an enterprise's distributed business data, rather than a product that can be purchased, so it can be implemented in various ways, such as hand-coding or ETL tools. With respect to lowering project risk, shortening the construction period, and reducing the difficulty of development and maintenance, ETL tools have clear advantages, so this thesis uses an open source ETL tool, Kettle. Compared with commercial ETL products, Kettle is lighter, faster and highly scalable, which compensates for the lower flexibility of ETL tools relative to hand-coding.

After introducing the project background and the technology and tools used, I analyze the data exchange requirements of the infrastructure MIS, including the content, types and scenarios of data exchange. Classifying the requirements breaks the complex task into parts and gives an overall view of the issues to be resolved.
Based on this analysis, I propose a design for data exchange that decomposes the "plant -> group" process into three sections: "power plant -> power plant data exchange node", "plant data exchange node -> group data exchange node", and "group data exchange node -> group". The first and third sections exchange data within a local network; the second section transmits across regions and uses a VPN to ensure data security.

According to this architecture, I test and compare several software products during project implementation, record the features that fit the architecture, analyze their strengths and weaknesses, and demonstrate the soundness of the ETL approach. I then discuss the implementation of data exchange in detail, dividing the task into two types: database record exchange and file transfer. The exchange of database records is described by operating environment. Within a LAN, I use the object interfaces provided by Kettle to export records from the plant's source database into the memory of the exchange node and then import them into the node's database; I analyze and improve the Kettle source code, use a time-control object to start jobs on a schedule or in real time, and capture changes to source records with triggers. When transmitting across regions, a specific VPN connection script must be written for each power plant, for example by invoking the VPN client software directly from the command line, or by writing a keyboard robot program that simulates logging in on a web page. Once VPN access is established, Kettle can be operated to transmit records as in a LAN; I also modify the table structures to avoid conflicts during data integration and improve the approach to real-time incremental exchange.
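
The trigger-based capture of source changes described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses Python's built-in sqlite3 module in place of Kettle and the project's actual databases, and the table names (equipment, change_log) and function names are hypothetical, chosen only for illustration. Triggers on the source table write each insert, update or delete to a log table, and the exchange node applies only the log entries it has not yet seen.

```python
import sqlite3


def setup_source(conn):
    """Create a hypothetical source table plus a change log filled by triggers."""
    conn.executescript("""
        CREATE TABLE equipment (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
        CREATE TABLE change_log (
            seq    INTEGER PRIMARY KEY AUTOINCREMENT,  -- order of changes
            op     TEXT NOT NULL,                      -- 'I', 'U' or 'D'
            row_id INTEGER NOT NULL                    -- key of the changed row
        );
        CREATE TRIGGER equipment_ins AFTER INSERT ON equipment
        BEGIN
            INSERT INTO change_log (op, row_id) VALUES ('I', NEW.id);
        END;
        CREATE TRIGGER equipment_upd AFTER UPDATE ON equipment
        BEGIN
            INSERT INTO change_log (op, row_id) VALUES ('U', NEW.id);
        END;
        CREATE TRIGGER equipment_del AFTER DELETE ON equipment
        BEGIN
            INSERT INTO change_log (op, row_id) VALUES ('D', OLD.id);
        END;
    """)


def setup_node(conn):
    """Create the matching table on the (hypothetical) exchange node."""
    conn.execute(
        "CREATE TABLE equipment (id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
    )


def sync_changes(source, node, last_seq):
    """Apply changes logged after last_seq to the node; return the new mark."""
    changes = source.execute(
        "SELECT seq, op, row_id FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, op, row_id in changes:
        if op == "D":
            node.execute("DELETE FROM equipment WHERE id = ?", (row_id,))
        else:
            # Copy the current state of the row; it may already be gone
            # if a later delete was logged, in which case we skip it.
            row = source.execute(
                "SELECT id, name, status FROM equipment WHERE id = ?", (row_id,)
            ).fetchone()
            if row is not None:
                node.execute(
                    "INSERT OR REPLACE INTO equipment (id, name, status) "
                    "VALUES (?, ?, ?)",
                    row,
                )
        last_seq = seq
    node.commit()
    return last_seq


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    node = sqlite3.connect(":memory:")
    setup_source(source)
    setup_node(node)
    source.execute("INSERT INTO equipment VALUES (1, 'boiler', 'planned')")
    source.commit()
    mark = sync_changes(source, node, 0)
    source.execute("UPDATE equipment SET status = 'installed' WHERE id = 1")
    source.commit()
    mark = sync_changes(source, node, mark)
    print(node.execute("SELECT * FROM equipment").fetchall())
```

Keeping a sequence number as a high-water mark means each run transfers only new changes, which is the essence of incremental exchange. The sketch assumes primary keys are never updated; a real deployment would also need to handle that case and to prune the log.
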
For file transfer, I combine Kettle with an FTP server, design a structure that minimizes the number of FTP servers to deploy, write programs to download files from and upload files to the FTP server, and propose my own solutions to problems such as transferring nested folders and resuming interrupted transfers.

The innovations of this thesis are as follows. First, it breaks with the traditional limits on data exchange technology by using currently popular data warehouse technology to achieve exchange and integration, which facilitates further processing of the data. Second, it improves on the structure of the traditional ETL data integration approach by adding data exchange nodes between the source database and the target warehouse. The nodes act as a transmission buffer and improve the safety and controllability of the exchange process; they also separate the exchange system from the business system, reducing the impact on the business system while records are being exchanged.

Because of the scale and complexity of the power system, and because the project has only just started, my work is at a relatively basic level: the architecture is simple, and the solutions cover only a small part of the whole project, like the tip of an iceberg. Nevertheless, the proposal, together with the testing, deployment and operation carried out under it, basically meets the data exchange requirements, demonstrates the soundness of the structure, and offers research and application value for improving the data exchange platform.
Keywords/Search Tags:data exchange, ETL, real-time increment, data integration