Design And Implementation Of Data Transform Platform Based On Log Query

Posted on: 2023-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: W Xie
Full Text: PDF
GTID: 2558306911481164
Subject: Engineering
Abstract/Summary:
Information is an important resource for the development of modern information technology and the basis for scientific management and decision analysis. In the current era of digital transformation, there is large market demand for the digital transformation of traditional enterprises. That transformation is hindered by data silos: services are not integrated, processes do not interoperate, and data is not shared among business systems. ETL (Extract-Transform-Load) technology addresses this situation by integrating scattered, messy, and non-standard data for the construction of a data warehouse. Existing ETL platforms, however, offer relatively limited functionality and cannot meet enterprise needs across application scenarios such as batch processing, real-time processing, and incremental synchronization, which slows the digital construction of enterprises.

To solve these problems, a data transformation platform based on log query is designed and implemented. The platform applies Change Data Capture (CDC) technology based on log query and adopts the Debezium incremental synchronization framework. Built on the Flink real-time processing engine and the Kafka message middleware, the platform integrates batch processing, real-time processing, and incremental synchronization to meet the client's usage scenarios. In addition, the platform provides user management, data expansion, resource management, and task management functions.

The work of the thesis is summarized as follows:

Requirements analysis is carried out according to the specific production environment of the client. The functional modules of the data transformation platform are clarified, and the system boundary analysis and data interaction analysis of the platform are completed. Based on the analysis of the overall system requirements, the four modules of the platform (system management, data transformation, data expansion, and resource management) are described in detail through UML modeling. Finally, the non-functional requirements of the system are described according to the client's actual production requirements.

The overall architecture design and the detailed decomposition of functional modules are carried out, and the structural design and dynamic collaboration of the system are presented with class diagrams and sequence diagrams. An improved Isolation Forest algorithm is proposed for anomaly detection in the original data. The Spring Boot framework is used to build and support the entire business system. For incremental synchronization scenarios, the system implements a log-query-based incremental extraction framework on top of Debezium. Based on Flink and Kafka, a data transformation module that supports unified stream and batch processing is implemented and divided into three independent subsystems: data extraction, data processing, and data loading. This separation improves the flexibility and scalability of the system. A data expansion module is designed and implemented to meet user requirements for data expansion scenarios such as metadata management for data interfaces. Finally, the business entity attributes are mapped to an entity-relationship diagram, and the physical table structure of the database is designed and implemented.

After coding is completed, test cases are designed according to the requirements analysis, and the system is deployed in the client's actual production environment. The functional and non-functional requirements are validated against the test cases, and the test results are analyzed. System testing verifies that the platform meets its functional and non-functional requirements and runs well in the actual production environment; the platform has been delivered for use.

In summary, the data transformation platform based on log query promotes the construction of data warehouses for traditional enterprises and their digital transformation, and provides efficient data transformation services for enterprises.
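As an illustration of the log-based incremental synchronization the abstract describes, the following minimal Python sketch replays a stream of change events onto an in-memory copy of a table. It is not the thesis's implementation. It assumes a simplified form of the Debezium change-event envelope (an `op` code plus `before` and `after` row images); the function name `apply_change_events`, the `id` key column, and the sample rows are hypothetical and chosen only for the example.

```python
from typing import Any, Dict, Iterable, Optional

Row = Dict[str, Any]


def apply_change_events(
    events: Iterable[Dict[str, Any]],
    table: Optional[Dict[Any, Row]] = None,
    key: str = "id",
) -> Dict[Any, Row]:
    """Replay CDC events onto a copy of `table`, keyed by the `key` column.

    Assumed (simplified) event shape: {"op": ..., "before": ..., "after": ...}
    where op is "c" (create), "r" (snapshot read), "u" (update) or "d" (delete).
    """
    target: Dict[Any, Row] = {} if table is None else dict(table)
    for event in events:
        op = event["op"]
        if op in ("c", "r", "u"):
            # Inserts, snapshot reads and updates carry the new row image.
            row = event["after"]
            target[row[key]] = row
        elif op == "d":
            # Deletes carry only the old row image.
            row = event["before"]
            target.pop(row[key], None)
        else:
            raise ValueError(f"unsupported op code: {op!r}")
    return target


# Hypothetical sample events for a two-column table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "alice"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "bob"}},
    {"op": "u", "before": {"id": 1, "name": "alice"},
     "after": {"id": 1, "name": "alicia"}},
    {"op": "d", "before": {"id": 2, "name": "bob"}, "after": None},
]
replica = apply_change_events(events)
print(replica)
```

In a deployment like the one described, such events would typically arrive through Kafka topics and be processed by Flink jobs rather than a Python loop; the sketch only shows why replaying ordered change events is sufficient to keep a target in sync without re-reading the source table.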
Keywords/Search Tags:CDC, Flink, Kafka, Data Transform, Debezium