Research on the Distributed DaMeng Data Exchange Platform (DMETL) Based on ETL Partitioning

Posted on: 2016-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: P Xiang
GTID: 2348330479953413
Subject: Computer software and theory
Abstract/Summary:
Data structures are becoming more complex and data is increasingly widely distributed, which has driven the adoption of distributed ETL tools. DaMeng Data Exchange Platform (DMETL) is a high-performance streaming ETL tool whose cluster feature supports distributed execution of ETL. For distributed ETL tools, how to optimize the ETL execution process is a problem worth studying. At present, one optimization strategy is to partition ETL activities to improve concurrency, but its benefit is relatively limited and it incurs high network overhead. With reasonable partitioning of the ETL workflow, however, network overhead can be reduced and resource utilization improved, so it makes sense to study how to partition ETL workflows on DMETL.

First, the thesis describes the overall structure of DMETL, the execution process of an ETL workflow, and the core modules associated with ETL partitioning. The Metadata module retrieves ETL metadata from the database. The Engine module parses the ETL workflow and partitions it. The Service Listening module executes sub ETL workflows remotely. The Cluster module provides protection for distributed execution.

Second, based on this description, the thesis studies the partitioning of ETL activities and the partitioning of ETL workflows. It implements several strategies for partitioning ETL activities and converts the workflow partition problem into a tree partition problem so that it is easier to solve. The DMETL partitioning strategy is then designed by combining the two. In addition, an appropriate scheduling policy is designed to improve the efficiency of executing sub ETL workflows, and the buffer in ETL activities is improved for high performance in high-concurrency scenarios.

Finally, experiments show that the system implements the designed partitioning strategy, that efficiency improves relative to the traditional task partitioning strategy, and that buffer performance improves significantly in high-concurrency scenarios.
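The abstract does not say which tree partition algorithm the thesis uses. As a rough illustration of the general idea only, the sketch below treats an ETL workflow as a rooted tree of activities with assumed cost weights and splits it into connected sub-workflows whose total cost stays within an assumed per-node capacity. It uses a greedy bottom-up heuristic (cut the heaviest child subtree first). The function name `partition_tree`, the activity names, the weights, and the capacity are all hypothetical and are not taken from DMETL.

```python
def partition_tree(children, weight, root, capacity):
    """Split a rooted tree into connected parts with total weight <= capacity.

    children: dict mapping a node to a list of its child nodes (hypothetical input shape)
    weight:   dict mapping every node to its assumed cost
    Returns a list of sets; each set is one sub-workflow.
    """
    for node, w in weight.items():
        if w > capacity:
            raise ValueError(f"node {node!r} alone exceeds the capacity")

    # Visit parents before descendants, then walk the list backwards
    # so that every child is processed before its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    residual = {}  # weight still attached to a node after cuts below it
    members = {}   # nodes still attached to a node after cuts below it
    parts = []

    for node in reversed(order):
        # Heaviest remaining child subtree first.
        attached = sorted(children.get(node, []),
                          key=lambda c: residual[c], reverse=True)
        total = weight[node] + sum(residual[c] for c in attached)
        while total > capacity:
            heaviest = attached.pop(0)        # cut the edge to this child
            total -= residual[heaviest]
            parts.append(members[heaviest])   # it becomes its own part
        group = {node}
        for child in attached:
            group |= members[child]
        residual[node] = total
        members[node] = group

    parts.append(members[root])
    return parts


# Hypothetical workflow: one extract feeding two load branches.
children = {
    "extract": ["clean"],
    "clean": ["join", "aggregate"],
    "join": ["load_a"],
    "aggregate": ["load_b"],
}
weight = {"extract": 3, "clean": 2, "join": 4,
          "aggregate": 3, "load_a": 1, "load_b": 1}
parts = partition_tree(children, weight, "extract", capacity=6)
```

Each returned set is a connected piece of the tree, so under these assumptions only the cut edges would carry data between cluster nodes, which is the source of the network-overhead saving the abstract refers to.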
Keywords/Search Tags: Extraction-Transformation-Loading, Workflow, Partition