Study On Extraction And Transformation Based On Data Warehouse ETL

Posted on: 2012-09-04 | Degree: Master | Type: Thesis
Country: China | Candidate: T Wang | Full Text: PDF
GTID: 2218330368978687 | Subject: Software engineering
Abstract/Summary:
Data warehouse technology has developed rapidly in recent years as a data storage and management technology. Because it offers a high degree of data integration and handles large amounts of heterogeneous data, it has quickly gained favor in many industries. It brings significant advantages to decision analysis, remedies some defects of traditional database technology, and offers better integration and stability while retaining the advantages of traditional databases. Applied to decision analysis, data warehouse technology can extract the valuable data needed for a decision from a vast amount of data, and through data analysis the system can make appropriate decisions more efficiently. It is therefore necessary to use data warehouse technology in the field of decision analysis.

Data extraction and data transformation, the subject of this research, are the key part of building a data warehouse: they obtain the original data and process it. They correspond to the "Extract" and "Transform" steps of the ETL (Extract, Transform, Load) layer. The "Extract" step accesses database systems and other external data sources, from which the data warehouse extracts data. The "Transform" step converts the data obtained from the different sources in the "Extract" step into formatted data. This research designs the algorithm and the development flow chart for data extraction and data transformation, and divides the extraction and transformation function into three modules: the external data source access module, the data extraction module, and the data format transformation module. The external data source access module connects to external data sources through the CDatabase base binding interface and accesses the different data sources through a unified data access interface.
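The idea of a unified data access interface can be illustrated with a minimal sketch. It is written in Python rather than the VC++/CDatabase setting of the thesis, and every name in it (`DataSource`, `SQLiteSource`, `InMemorySource`, `extract_all`) is hypothetical and not taken from the thesis. It only shows the general pattern: each kind of source implements the same small interface, so the extraction code does not need to know which source it is reading.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class DataSource(ABC):
    """Hypothetical unified access interface (not the thesis's CDatabase API)."""

    @abstractmethod
    def connect(self) -> None:
        """Open the connection to the underlying source."""

    @abstractmethod
    def fetch_rows(self, table: str) -> List[Tuple]:
        """Return all rows of the named table."""


class SQLiteSource(DataSource):
    """Stand-in for a relational source, using Python's built-in sqlite3."""

    def __init__(self, path: str = ":memory:") -> None:
        self.path = path
        self.conn = None

    def connect(self) -> None:
        self.conn = sqlite3.connect(self.path)

    def fetch_rows(self, table: str) -> List[Tuple]:
        # Identifiers cannot be bound as SQL parameters, so quote the name.
        quoted = '"' + table.replace('"', '""') + '"'
        return self.conn.execute(f"SELECT * FROM {quoted}").fetchall()


class InMemorySource(DataSource):
    """Stand-in for a non-database source, such as rows parsed from a file."""

    def __init__(self, tables: Dict[str, List[Tuple]]) -> None:
        self.tables = tables
        self.connected = False

    def connect(self) -> None:
        self.connected = True

    def fetch_rows(self, table: str) -> List[Tuple]:
        return list(self.tables[table])


def extract_all(source: DataSource, table: str) -> List[Tuple]:
    """Extraction code depends only on the interface, not on the source type."""
    source.connect()
    return source.fetch_rows(table)
```

Under these assumptions, supporting another kind of source means adding one more `DataSource` subclass, while `extract_all` stays unchanged.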
The data extraction module uses processing language technology and calling interface technology to extract data from the data source. After accessing the data source, it obtains the necessary information from it, including table names, column names, and table details. The data format transformation module, based on VC++ technology and data warehouse technology, transforms the structure of the extracted heterogeneous data: it converts the heterogeneous original data from the data sources, such as floating point data and time data, into a uniform format before the data is stored in the data warehouse, which makes the data easier for the data warehouse to access. Together, these three function modules link the data warehouse with the external data sources, support connections to SQL, MDB, Oracle and other types of database systems, and complete the flow of extracting and transforming data from the external data sources, facilitating subsequent operations such as storage, query and analysis.

With this data extraction and data transformation technology, the data warehouse can extract the required subject-oriented data collection from application systems that store the original data in a Windows environment, and process that data to meet the requirements of the data warehouse. This helps with core data processing and data analysis when an enterprise builds a data warehouse for decision analysis, improves the efficiency of decision analysis, and can be applied well in industry fields of decision analysis that use data warehouse technology.
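The extraction of table and column names and the conversion of floating point and time data to a uniform format can be sketched as follows. This is an illustration only, not the thesis's VC++ implementation: it uses Python's built-in `sqlite3` as a stand-in source. The function names, the list of accepted input time layouts, and the choice of ISO 8601 text as the uniform time format are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime


def list_tables(conn):
    """Return the names of the user tables in a SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def list_columns(conn, table):
    """Return (column name, declared type) pairs for one table."""
    quoted = '"' + table.replace('"', '""') + '"'
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    return [(row[1], row[2]) for row in rows]


# Assumed input layouts; a real source would need its own list.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M", "%Y%m%d")


def to_uniform_time(text):
    """Convert a time string in any assumed layout to ISO 8601 text."""
    for fmt in TIME_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        return parsed.strftime("%Y-%m-%dT%H:%M:%S")
    raise ValueError(f"unrecognized time value: {text!r}")


def to_uniform_float(value):
    """Convert a number, or a string such as '1,234.50', to a float."""
    if isinstance(value, str):
        value = value.replace(",", "").strip()
    return float(value)
```

In this sketch the metadata functions drive the extraction (which tables and columns exist), and the two converters are applied per column before loading, so that every source delivers times and floating point values in one format.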
Keywords/Search Tags: Data Warehouse, ETL, Data Extraction, Data Transformation