| ETL (extraction, transformation, and loading of data) is a critical part of building a data warehouse and is generally regarded as the foundation of the whole data warehouse and decision support system. How to design an efficient ETL process has therefore become an important question for anyone designing a data warehouse project. This thesis analyzes the key techniques applied in the economic census information publishing system: data extraction from heterogeneous data sources, customized data transformation, and append-and-update loading. Firstly, based on an analysis of the data sources, a new design for a parallel data extraction interface is proposed to handle the heterogeneous sources, namely structured DBMS data and unstructured files. Given the actual conditions of the economic census, a combination of full extraction and incremental extraction is adopted as the extraction pattern. Secondly, to meet the demands of information publishing and offer users varied, flexible, and combinable query methods, the data must be transformed into specific formats. This thesis proposes adding an intermediate layer, the ODS (Operational Data Store), between the data sources and the data warehouse, and studies the transformation strategies within the ODS. Thirdly, data loading uses a flexible timestamp as the strategy for appending and updating data. This method does not consume excessive system resources, does not alter the structure of the existing system tables, and requires no additional development. The ETL strategy proposed in this thesis has been implemented in the economic census information publishing system of Heilongjiang Province, and its validity has been demonstrated in practical use. |
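
The combination of full and incremental extraction with timestamp-based loading can be illustrated with a minimal sketch. It is not the implementation from the thesis: the table name `census`, the columns `id`, `value`, and `updated_at`, and the function `incremental_load` are all hypothetical, and the sketch assumes that source rows already carry a last-modified timestamp. An empty watermark triggers a full extraction; later runs copy only rows changed since the previous watermark.

```python
import sqlite3


def make_db():
    # Hypothetical schema; the real census tables are not described in the abstract.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE census (id INTEGER PRIMARY KEY, value TEXT, updated_at TEXT)"
    )
    return conn


def incremental_load(source, target, last_loaded):
    """Copy rows changed after `last_loaded` and return the new watermark."""
    rows = source.execute(
        "SELECT id, value, updated_at FROM census "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_loaded,),
    ).fetchall()
    # New ids are appended; existing ids are overwritten with the newer version.
    target.executemany(
        "INSERT OR REPLACE INTO census (id, value, updated_at) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    # Keep the old watermark if nothing changed.
    return rows[-1][2] if rows else last_loaded


source, target = make_db(), make_db()
source.executemany(
    "INSERT INTO census VALUES (?, ?, ?)",
    [(1, "a", "2024-01-01T00:00:00"), (2, "b", "2024-01-02T00:00:00")],
)
source.commit()

# First run: the empty watermark matches every row, so this is a full extraction.
watermark = incremental_load(source, target, "")

# Later, one row is updated and one is added in the source.
source.execute(
    "UPDATE census SET value = 'b2', updated_at = '2024-01-03T00:00:00' WHERE id = 2"
)
source.execute("INSERT INTO census VALUES (3, 'c', '2024-01-04T00:00:00')")
source.commit()

# Second run: only the two changed rows are extracted and loaded.
watermark = incremental_load(source, target, watermark)
loaded = target.execute("SELECT id, value FROM census ORDER BY id").fetchall()
```

Storing only the watermark between runs keeps the target schema unchanged, which is in the spirit of the low-overhead loading described above. ISO 8601 strings are used so that text comparison matches chronological order.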