Design And Implementation Of Distributed Data Integration Platform

Posted on: 2022-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: X R Xue
Full Text: PDF
GTID: 2518306605970809
Subject: Master of Engineering
Abstract/Summary:
Over the past ten years, as cloud computing and data intelligence technology have matured, intelligent decision-making based on the data that enterprises accumulate in production has come to play a guiding role. The volume of data behind such decisions is usually very large. Because application scenarios and data storage technologies differ, and because systems are opaque to one another, this information ends up in isolated "data islands". The main problem is how to gather data that comes from different sources and in different formats but shares similar characteristics, logically or physically, into a standardized, uniform, and centralized data store. Most data integration tools use a client-server architecture, which suffers from low data conversion throughput, no cluster management, and no support for breakpoint continuation (resuming an interrupted transfer). To address these problems, the thesis analyzes the enterprise data integration process and proposes solutions for low data conversion efficiency, breakpoint continuation, and high system availability. Its specific contribution is the design and implementation of a distributed data integration platform. The work done in the thesis is as follows:

(1) Requirements analysis of the distributed data integration system: The thesis first analyzes the formats and distribution of enterprise data and the scenarios in which it is used, and then presents a clear requirements analysis. The requirements of the distributed data integration system are divided into functional and non-functional requirements. The functional requirements cover the application services the system must provide; the non-functional requirements ensure the stability of the system.

(2) Design and implementation of the distributed data integration system: The design and implementation build on the requirements analysis and complete each required module of the system. The thesis presents this work in three parts: system architecture design, system database design, and system function design. First, the distributed coordination service framework ZooKeeper is introduced into the system to handle distributed cluster management and system resource scheduling. For the functional implementation, the open source data conversion framework Kettle is extended through secondary development to carry out complex data conversion. Kettle, however, has some shortcomings in areas such as breakpoint continuation, online node expansion, and cluster resource viewing. Second, for the system database design, the data types and the relationships between the data are presented using ER diagrams and descriptive tables. Finally, the detailed design of each functional module is described using class diagrams, sequence diagrams, and schematic diagrams.

(3) Testing of the distributed data integration system: The thesis closes with the testing of the system. First, the hardware and software environment for the tests is set up. Then, the tests of the functional and non-functional modules are described using use case description tables and result graphs. Finally, based on the test results, the thesis summarizes and analyzes whether the system meets the expected requirements and which aspects of the system need improvement. After testing of all aspects of the system, the distributed data integration system provides its services normally and stably. The interface displays correctly, the response time of each functional task meets expectations, and the platform has sufficient functionality and robustness to meet the expected needs of the enterprise's data warehouse.
Keywords/Search Tags: Data Sharing, Data Integration, ETL, ZooKeeper
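
The abstract names breakpoint continuation as a requirement but does not describe its mechanism. As an illustration only, the following is a minimal sketch of one common approach: persist the offset of the last committed batch and resume from it after a failure. It is not the thesis's implementation and uses neither Kettle nor ZooKeeper; the names `load_offset`, `save_offset`, `run_transfer`, the JSON checkpoint file, and the batch size are all assumptions made for this example.

```python
import json
import os
import tempfile
from typing import Any, Callable, List, Sequence


def load_offset(path: str) -> int:
    """Return the last committed row offset, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as fh:
        return int(json.load(fh)["offset"])


def save_offset(path: str, offset: int) -> None:
    """Write the offset to a temp file, then rename it over the checkpoint.

    os.replace is atomic, so a crash cannot leave a half-written checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"offset": offset}, fh)
    os.replace(tmp, path)


def run_transfer(
    rows: Sequence[Any],
    sink: Callable[[List[Any]], None],
    checkpoint_path: str,
    batch_size: int = 2,
) -> int:
    """Send rows to sink in batches, resuming from the saved offset.

    The checkpoint is advanced only after the sink accepts a batch, so an
    interrupted run restarts at the first uncommitted batch.
    """
    offset = load_offset(checkpoint_path)
    while offset < len(rows):
        batch = list(rows[offset:offset + batch_size])
        sink(batch)                      # may raise; checkpoint stays put
        offset += len(batch)
        save_offset(checkpoint_path, offset)
    return offset
```

Because the offset is saved after the sink call, a crash between the two steps causes that batch to be delivered again on restart. This sketch therefore gives at-least-once delivery, and the sink would need to be idempotent (for example, by upserting on a key) to avoid duplicates.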