
Design And Implementation Of Multi-source Data Quality Verification System Based On Change Data Capture Technology

Posted on: 2022-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: P F Cheng
Full Text: PDF
GTID: 2518306572497224
Subject: Computer technology
Abstract/Summary:
Internet applications have developed rapidly in recent years, generating massive amounts of data. Governing this data poses many challenges for enterprises. As an important part of data governance, data quality management ensures that data meets its expected usage goals. Effective data quality verification can help enterprises avoid huge losses, and research on data quality is attracting growing attention. A data quality verification system is an important guarantee of high-quality data. On the one hand, traditional data quality verification systems hard-code their verification rules, which is highly limiting: modifying a rule requires changing the source code, recompiling, and redeploying the service. On the other hand, some systems map verification rules to SQL query statements, but this approach supports only some relational databases and requires business personnel to master SQL syntax. At the same time, there is a tension between system efficiency and alert timeliness. This thesis analyzes the requirements of a data quality verification system in detail, and designs and implements a multi-source data quality verification system based on change data capture technology. The system soft-codes verification rules using the Aviator expression execution engine, so business personnel can create complex verification rules through a visual interactive interface. In contrast to SQL rule mapping, the logic that matches data against rules is implemented in the application layer, which allows the system to support various mainstream database systems. Finally, to address the efficiency problem of this implementation, the system uses the change data capture capability of Debezium (a distributed change data capture component) to achieve near-real-time data quality verification: data changes in a data source are applied to the corresponding rules within seconds, and alert notifications are issued. The final test results show that the system meets the basic requirements of a data quality verification system, supports mainstream data sources, achieves high system efficiency, and meets the requirements for alert timeliness.
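The idea of matching change events against soft-coded rules in the application layer can be illustrated with a minimal sketch. This is not the thesis's implementation: the rule format, the `Rule` class, and `verify_change` are hypothetical, and a small operator table stands in for the Aviator expression engine. The event is a simplified dictionary modeled on the Debezium change event envelope (`op`, `before`, `after`, `source.table`).

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Operators a rule may name. Rules are plain data, so changing a rule
# does not require changing or redeploying this code (hypothetical format).
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass(frozen=True)
class Rule:
    """A hypothetical soft-coded rule: <table>.<column> <op> <expected>."""
    name: str
    table: str
    column: str
    op: str
    expected: Any

    def passes(self, row: Dict[str, Any]) -> bool:
        value = row.get(self.column)
        if value is None:
            return False  # a missing or NULL value fails the rule
        try:
            return OPERATORS[self.op](value, self.expected)
        except TypeError:
            return False  # incomparable types count as a violation


def verify_change(event: Dict[str, Any], rules: List[Rule]) -> List[str]:
    """Return the names of the rules violated by one change event."""
    row = event.get("after")
    if row is None:  # delete events carry no new row state to verify
        return []
    table = event["source"]["table"]
    return [r.name for r in rules if r.table == table and not r.passes(row)]


# Illustrative rules and event; all names and values are made up.
RULES = [
    Rule("age_non_negative", "users", "age", ">=", 0),
    Rule("age_below_150", "users", "age", "<", 150),
    Rule("amount_positive", "orders", "amount", ">", 0),
]

sample_event = {
    "op": "u",  # update
    "source": {"table": "users"},
    "before": {"id": 1, "age": 30},
    "after": {"id": 1, "age": -5},
}

violations = verify_change(sample_event, RULES)
print(violations)
```

Because the check runs on the captured row rather than as a SQL query against the source, the same rule can apply to any database that can emit change events in this shape. A real deployment would consume events from a stream and send an alert for each violation.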
Keywords/Search Tags: Data quality, Expression execution engine, Change data capture, Debezium