Research And Implementation Of Disruptor-based High Performance ETL System

Posted on: 2019-05-26  Degree: Master  Type: Thesis
Country: China  Candidate: Y X Wen  Full Text: PDF
GTID: 2428330545973712  Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, the volume of multi-source heterogeneous data that industrial applications must process is growing quickly, and integrating these data effectively has become a core issue in data applications. However, existing data integration tools have shortcomings in extensibility, performance, and functionality, which make it difficult for them to keep up with constantly changing integration requirements. To address these problems, this thesis designs and implements an extensible, high-performance ETL system based on the Disruptor queue technology.

First, building on the plug-in concept, this thesis designs a "Framework + Plugin" ETL architecture and completes the design of the system's functional modules and task execution flow. The architecture is highly extensible: each kind of data source can be supported by its own adapter, which addresses the multi-source, heterogeneous data problem. Second, the thesis studies the performance optimization of the ETL system further. After researching and verifying ways to implement the data buffer in the "Producer-Consumer" model, it identifies the shortcomings of the blocking queues used in traditional ETL tools and rebuilds the buffer on Disruptor; it also designs the workflow with a multi-threaded parallel scheduling strategy. In addition, it improves the real-time extraction function of traditional ETL tools with a near-real-time extraction method whose extraction frequency is adjusted dynamically.

Finally, the thesis describes the implementation of the ETL system in detail, presents its Web UI and the way plugins are implemented, and evaluates the system through functional testing, performance testing, and performance tuning. The test results show that the improved extraction method achieves near-real-time data synchronization at the level of seconds and adjusts its frequency dynamically as intended. The performance tests show that applying Disruptor and the parallel scheduling strategy does improve the performance of the ETL system, which outperforms Kettle.
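The "Framework + Plugin" idea can be illustrated with a minimal sketch, assuming the framework depends only on reader and writer interfaces while each data source or target supplies its own implementation. The interface and class names (`ReaderPlugin`, `WriterPlugin`, `InMemoryReader`, `CollectingWriter`, `EtlFramework`) are hypothetical; the abstract does not describe the thesis's actual plugin contracts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical plugin contracts: the framework depends only on these
// interfaces, and each data source or target supplies its own implementation.
interface ReaderPlugin {
    // Extract records and hand each one to the framework.
    void read(Consumer<String[]> emit);
}

interface WriterPlugin {
    // Load one record into the target.
    void write(String[] record);
}

// Illustrative plugins; real ones would wrap JDBC sources, files, etc.
class InMemoryReader implements ReaderPlugin {
    private final List<String[]> rows;

    InMemoryReader(List<String[]> rows) {
        this.rows = rows;
    }

    @Override
    public void read(Consumer<String[]> emit) {
        rows.forEach(emit);
    }
}

class CollectingWriter implements WriterPlugin {
    final List<String[]> written = new ArrayList<>();

    @Override
    public void write(String[] record) {
        written.add(record);
    }
}

// Framework side: wires any reader to any writer through a transform step,
// without knowing which concrete plugins it is running.
final class EtlFramework {
    private EtlFramework() {}

    static void runJob(ReaderPlugin reader, UnaryOperator<String[]> transform, WriterPlugin writer) {
        reader.read(record -> writer.write(transform.apply(record)));
    }
}
```

Supporting a new source then means adding one `ReaderPlugin` implementation; `EtlFramework` itself does not change. This sketch passes records straight through for brevity, whereas the system described above places a producer-consumer buffer between reader and writer.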
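The contrast between a blocking queue and a Disruptor-style buffer can be sketched as follows. `ArrayBlockingQueue` guards both ends with a single lock, whereas Disruptor uses a preallocated power-of-two ring indexed by monotonically increasing sequence numbers, with no lock on the hot path. The class below is a simplified single-producer/single-consumer illustration of that idea, not the LMAX Disruptor API; the real library adds multi-producer sequencing, configurable wait strategies, cache-line padding, and preallocated event objects. The name `SpscRingBuffer` is hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified single-producer/single-consumer ring buffer in the spirit of
// Disruptor. Safe only with exactly one producer thread and one consumer thread.
final class SpscRingBuffer<T> {
    private final Object[] slots;
    private final int mask;
    // Next sequence the producer will write.
    private final AtomicLong writeSeq = new AtomicLong(0);
    // Next sequence the consumer will read.
    private final AtomicLong readSeq = new AtomicLong(0);

    SpscRingBuffer(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1; // sequence & mask == sequence % capacity
    }

    // Producer side: returns false instead of blocking when the ring is full.
    boolean offer(T value) {
        long w = writeSeq.get();
        if (w - readSeq.get() == slots.length) return false; // full
        slots[(int) (w & mask)] = value;
        writeSeq.lazySet(w + 1); // ordered store: publishes the slot write
        return true;
    }

    // Consumer side: returns null when the ring is empty.
    @SuppressWarnings("unchecked")
    T poll() {
        long r = readSeq.get();
        if (r == writeSeq.get()) return null; // empty
        int idx = (int) (r & mask);
        T value = (T) slots[idx];
        slots[idx] = null; // drop the reference so it can be collected
        readSeq.lazySet(r + 1); // hand the slot back to the producer
        return value;
    }
}
```

A caller that wants back-pressure spins or yields when `offer` returns false (for example with `Thread.onSpinWait()`), which loosely corresponds to Disruptor's busy-spin and yielding wait strategies. Whether this is faster than a blocking queue in a given ETL workload is an empirical question the thesis answers with its own measurements.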
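The abstract states that the near-real-time extraction method adjusts its frequency dynamically but does not give the rule. The sketch below assumes one simple possibility: after each incremental poll, halve the delay if changed rows were found and double it if none were, clamping the result to a configured range. The class name `AdaptivePollInterval` and the halve/double rule are assumptions for illustration only.

```java
// Hypothetical adaptive polling policy for near-real-time incremental extraction.
// Assumed rule (not from the thesis): halve the delay when a poll finds changes,
// double it when it finds none, and clamp to [minMillis, maxMillis].
final class AdaptivePollInterval {
    private final long minMillis;
    private final long maxMillis;
    private long currentMillis;

    AdaptivePollInterval(long minMillis, long maxMillis) {
        if (minMillis <= 0 || maxMillis < minMillis) {
            throw new IllegalArgumentException("require 0 < minMillis <= maxMillis");
        }
        this.minMillis = minMillis;
        this.maxMillis = maxMillis;
        this.currentMillis = maxMillis; // start slow until changes are observed
    }

    // Call after each poll with the number of changed rows it found;
    // returns the delay to wait before the next poll.
    long next(int changedRows) {
        if (changedRows > 0) {
            currentMillis = Math.max(minMillis, currentMillis / 2);
        } else {
            currentMillis = Math.min(maxMillis, currentMillis * 2);
        }
        return currentMillis;
    }
}
```

With a range of 500 ms to 8000 ms, a burst of changes drives the delay down through 4000, 2000, and 1000 to 500 ms, and idle polls walk it back up to 8000 ms. A worker could pass each returned delay to `ScheduledExecutorService.schedule(task, delay, TimeUnit.MILLISECONDS)` to reschedule itself; how changed rows are detected (for example, via a timestamp or incremental key column) is likewise an assumption here.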
Keywords/Search Tags: ETL, Plug-in style design, Disruptor, Near-real-time extraction, High performance