Research On Technology Management Data Integration Technology Based On ETL

Posted on: 2020-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Xu
GTID: 2428330575467962
Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology and the deepening use of information technology across industries, data plays an increasingly important role, and its value needs to be explored and shared more fully. Distributed, heterogeneous data must be unified and integrated into a single data-sharing platform; integrating data from disparate applications is what makes value sharing between data sources possible. ETL (Extract-Transform-Load) is a well-suited approach to supporting data integration services and has become a research hotspot in recent years. Taking the technology management data integration business as its setting, this paper studies ETL-based integration of technology management data.

Science and technology management data is spread across the stages of the management process. Each stage's data is maintained by a different department, and storage formats and semantics differ greatly. To standardize data storage and simplify data push, the data of the whole technology management process needs to be stored in a unified way. However, integrating this data raises the following problems:

1. Technology management data is complex and diverse, so data quality is hard to guarantee during integration; data loss and inconsistent storage formats occur often. An effective technical solution is needed to ensure data quality after the technology management data has been integrated.
2. Even with data quality guaranteed, the existing scheme for scheduling ETL integration task scripts is inefficient, and core business data is updated slowly. A reasonable task scheduling scheme is therefore needed to schedule the ETL integration task scripts effectively and to improve resource utilization and data integration efficiency.

To solve these problems, this paper first designs a data warehouse architecture for technology management data. Its hierarchical structure clearly describes the flow of data and the data application scenarios, and standardizes the unified storage of technology management data.

Second, building on the traditional data integration model, the paper designs a data integration model based on meta-model control and proposes corresponding metadata descriptions and mapping rules to assist data integration. Combining extraction, transformation, and loading meta-models with the mapping rules improves the data integration model; the corresponding metadata management tools and mapping parsers are developed, and data quality assurance methods are embedded into the data integration scripts. Experimental verification shows that this model and the corresponding mapping algorithm can effectively guarantee data quality after integration.

Third, the paper designs a distributed ETL task scheduling framework and an integrated scheduling algorithm. The framework is divided into three stages: ETL task preprocessing, ETL task scheduling allocation, and ETL task execution. The integrated scheduling algorithm gives an overall description of the algorithms used in the three stages. Experimental verification and analysis show that the framework and algorithm improve the ETL task scheduling allocation and execution process, the utilization of distributed environment resources, and the efficiency of data integration.

Finally, the paper implements an ETL task construction and scheduling system for technology management data and applies the models, framework, and algorithms presented in this paper to it. The system has gone through many rounds of testing and has been applied and verified in the science and technology management system of a national ministry, where it has handled data integration workloads of varying volume and completes data integration work efficiently and stably.
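
As an illustration only, the three-stage framework named in the abstract (ETL task preprocessing, scheduling allocation, execution) might be sketched in Python as below. The abstract does not specify the algorithms, so every detail here is an assumption: the task dictionary layout, the `deps` and `cost` fields, the worker names, and the choice of a dependency-ordered, least-loaded-worker greedy allocation are hypothetical stand-ins, not the thesis's integrated scheduling algorithm. The sketch needs Python 3.9 or later for `graphlib`.

```python
import heapq
from graphlib import TopologicalSorter


def preprocess(tasks):
    """Stage 1 (assumed): order tasks so each comes after its dependencies."""
    graph = {name: set(spec["deps"]) for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


def allocate(order, tasks, workers):
    """Stage 2 (assumed): assign each task to the least-loaded worker."""
    heap = [(0, w) for w in workers]  # (accumulated cost, worker name)
    heapq.heapify(heap)
    assignment = {}
    for name in order:
        load, worker = heapq.heappop(heap)
        assignment[name] = worker
        heapq.heappush(heap, (load + tasks[name]["cost"], worker))
    return assignment


def execute(order, assignment, run):
    """Stage 3 (assumed): run tasks in dependency order on their workers.

    Execution is sequential here; a real distributed executor would
    dispatch each task to its worker and wait on its dependencies.
    """
    return [(name, assignment[name], run(name)) for name in order]


# Hypothetical ETL task scripts with estimated costs.
tasks = {
    "extract_projects": {"deps": [], "cost": 3},
    "extract_funding": {"deps": [], "cost": 1},
    "transform": {"deps": ["extract_projects", "extract_funding"], "cost": 2},
    "load": {"deps": ["transform"], "cost": 1},
}
workers = ["worker-1", "worker-2"]

order = preprocess(tasks)
assignment = allocate(order, tasks, workers)
log = execute(order, assignment, run=lambda name: f"{name}: done")
```

Keeping the three stages as separate functions mirrors the staged structure the abstract describes and makes it easy to swap the greedy allocation for another load-balancing policy.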
Keywords/Search Tags:Data warehouse, Data integration, ETL, Data quality, Task scheduling, Load balancing