
Research And Implementation Of Multi-source And Heterogeneous Data Governance Platform

Posted on: 2020-08-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Y Xiao | Full Text: PDF
GTID: 2428330572973613 | Subject: Computer technology
Abstract/Summary:
With the development of information technology, the scale of information systems has exploded. Their data is characterized by large volume, diverse types and sources, relatively low value density, fast growth, and low accuracy and credibility. To handle data at this scale, information systems usually adopt a distributed architecture, which leads to multi-source, heterogeneous data: the data is spread across multiple data sources whose types, structures, implementations, versions, and deployment environments differ. Multi-source heterogeneous data makes it much harder to understand, iterate on, and maintain information systems, and to use their data. To address these challenges, this thesis proposes a metadata-based platform for managing multi-source heterogeneous data.

The goal of this work is to design and implement a multi-source heterogeneous data governance platform that applies distributed data technology and rule engine technology to manage such data from three perspectives: metadata, data integration, and data quality management. Compared with existing data management platforms, the system can process large-scale heterogeneous data sources, effectively combines data with metadata, and offers a friendly access interface. More importantly, the platform performs data quality management, can handle changes to data sources, and can manage multiple data sources at the same time.

To build the platform, this thesis first surveys the literature on related technologies to establish the technical feasibility of the system. The system requirements are then analyzed following software engineering practice, and the key technical problems are identified. To make the results of data analysis more reliable, the system first preprocesses the raw distributed system logs, and a trace-log-based metadata harvesting method is proposed that improves the accuracy and efficiency of metadata discovery and reduces its cost. During data validation for multi-source heterogeneous data, it was found that when data is distributed among multiple data sources, a single data source holds only local information, which makes correctness and consistency difficult to ensure. After a large number of experiments, this thesis proposes a metadata-based validation scheme for multi-source heterogeneous data. The solutions to these two key problems have been verified by a large number of experiments.

Solving these key technical problems cleared the obstacles to building the system. Around these key technologies, this thesis carries out the overall and detailed design of the system, builds a fully functional multi-source heterogeneous data management system, and designs and implements a user-friendly visual interface. Finally, the thesis summarizes the completed system and points out its deficiencies and directions for improvement.
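
The point that a single data source holds only local information can be illustrated with a minimal sketch. This is not the validation scheme from the thesis, whose details the abstract does not give; it only assumes that metadata can record a relationship between fields in two different sources. The class `ReferenceRule`, the function `find_violations`, and the source names `orders_db` and `users_db` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRule:
    """Hypothetical metadata entry: every value of child_field in
    child_source must also appear as parent_field in parent_source."""
    child_source: str
    child_field: str
    parent_source: str
    parent_field: str


def find_violations(rule, sources):
    """Return the child rows whose reference has no match in the parent source.

    `sources` maps a source name to a list of row dictionaries. Neither
    source could run this check alone, since each sees only its own rows.
    """
    parent_values = {row[rule.parent_field] for row in sources[rule.parent_source]}
    return [
        row
        for row in sources[rule.child_source]
        if row[rule.child_field] not in parent_values
    ]


# Two illustrative in-memory "sources" standing in for separate databases.
sources = {
    "orders_db": [
        {"order_id": 1, "user_id": 10},
        {"order_id": 2, "user_id": 99},  # refers to a user that does not exist
    ],
    "users_db": [{"id": 10}, {"id": 11}],
}

rule = ReferenceRule("orders_db", "user_id", "users_db", "id")
violations = find_violations(rule, sources)
```

Here the rule lives in the metadata layer rather than in either source, so the check can look across both, which is the kind of global view a metadata-driven approach is meant to supply.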
Keywords/Search Tags: data governance, metadata management, data validation, system design