
Research On Parallel Scheduling Of ETL In Data Warehouse

Posted on: 2016-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Zhang
Full Text: PDF
GTID: 2298330470450021
Subject: Software engineering
Abstract/Summary:
Information has always been important in people's lives. For a modern enterprise in particular, the accuracy and timeliness of information can decide success or failure in business. In general, there are two ways to obtain information: one is directly available information, such as news from online media; the other is information derived from the analysis of relevant data. The first is direct and public, so competitive advantage usually depends on the second, and the second cannot do without the construction of a data warehouse. A data warehouse is an information platform: it takes data from the enterprise's internal business processing systems or from its external environment, organizes it in star and snowflake schemas, cleans, restructures, and stores it, and makes it available to BI analysis systems, data marts, or data mining. Building a data warehouse is an engineering project. Besides good database software and a suitable data warehouse model, it is even more important to have accurate source data and analysis and statistics work carried out according to the model; the key to both lies in the choice of ETL and scheduling.
ETL stands for Extract-Transform-Load and describes the process of extracting data from the source, transforming it, and loading it into the target. There are many mature ETL products on the market, and by vendor they fall into two kinds. One kind is the database vendors' own ETL tools, such as Oracle Warehouse Builder and Oracle Data Integrator. The other kind comes from third-party tool providers, such as Ascential DataStage, Informatica PowerCenter, NCR Teradata's ETL Automation, and Kettle. Each of these products has advantages and disadvantages: either the tool is good in every respect but expensive and inconvenient to maintain, or its functionality is weak and cannot meet ETL requirements. Job scheduling in particular makes it difficult to configure an efficient workflow.
Therefore, drawing on the author's years of ETL experience in the telecom and banking sectors, this thesis targets the ORACLE database platform, the most common on the market. With reference to the documentation on ORACLE's website, it uses ORACLE's built-in features, such as DBMS_SCHEDULER for scheduling and DBLink for batch data extraction, to develop the ETL and scheduling functions in plain PL/SQL. The resulting code can be deployed and run directly on ORACLE 10g. It is a small ETL tool that is simple to operate, efficient, and extensible, and that requires neither a third-party product nor a separate server.
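As a rough illustration of the two ORACLE features named in the abstract, the following PL/SQL sketch pulls one table over a database link and registers the extract as a DBMS_SCHEDULER job. It is not the thesis's code: the link name SRC_LINK, the tables CUSTOMER and STG_CUSTOMER, their columns, the procedure and job names, and the nightly schedule are all hypothetical and chosen only for the example.

```sql
-- Illustrative sketch only. All object names below are assumptions:
--   SRC_LINK      database link to the source system
--   CUSTOMER      remote source table
--   STG_CUSTOMER  local staging table with the same three columns

CREATE OR REPLACE PROCEDURE etl_extract_customer AS
BEGIN
  -- Reload the staging table in one batch from the source over the link.
  DELETE FROM stg_customer;

  INSERT INTO stg_customer (customer_id, customer_name, updated_at)
    SELECT customer_id, customer_name, updated_at
    FROM   customer@src_link;

  COMMIT;
END etl_extract_customer;
/

BEGIN
  -- Register the extract as a scheduler job that runs every night at 02:00.
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'ETL_EXTRACT_CUSTOMER_JOB',   -- hypothetical job name
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'ETL_EXTRACT_CUSTOMER',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2; BYMINUTE=0; BYSECOND=0',
    enabled         => TRUE,
    comments        => 'Nightly batch extract over a database link (illustrative)');
END;
/
```

Several jobs of this kind, one per source table, can be due at the same time, and the scheduler can then run them concurrently within the instance's job process limits. That is one plausible basis for parallel ETL scheduling, although the abstract does not say how the thesis orders or parallelizes its jobs.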
Keywords/Search Tags:ORACLE, PL/SQL, ETL, Scheduling, Data Warehouse