The Architecture Of Rule-based Data Quality Management System And Some Key Issues Research

Posted on: 2010-10-10, Degree: Master, Type: Thesis
Country: China, Candidate: T W Lei, Full Text: PDF
GTID: 2178360278473161, Subject: Software engineering
Abstract/Summary:
Data quality has attracted growing attention in recent years. To maximize the value of their data, many enterprises and organizations have taken managerial or technical steps to raise their data quality. Academic research on the topic, however, remains insufficient, and few directly usable data quality management systems are available in the IT industry. In this paper, we design a data quality management system named RDQMS, which collects quality information, audits data against data quality rules, and repairs abnormal data automatically. It can be used to find and resolve data quality issues on an ongoing basis, and it offers good practicality and a degree of intelligence. We also discuss three key technical problems: building the data quality rule base, auditing data quality issues, and repairing data. Inspired by GP, we propose a new method that uses a tree, which we call q-ET, to express all data quality rules. All rules are stored in XML files, which can hold a q-ET without modification. We also present several simple new algorithms for mining rules from sample data; they work well on relational databases that store large volumes of data. For data auditing, we discuss an auditing algorithm based on the q-ET: we first transform the q-ET into its reverse q-ET and then translate it into SQL statements that are executed on the relational database. If the data source is XML, we translate it into XQuery statements and execute those instead. Repairing abnormal data and null data is a complex problem. Here we repair the data directly using the data quality rules described in the q-ET. Owing to the expressive power of the q-ET, the data repairing algorithm has good usability.
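The abstract does not give the actual q-ET definition, so the following is only a minimal sketch of the general idea of expressing a data quality rule as an expression tree and translating it into SQL for auditing. The node classes (`Col`, `Const`, `Op`), the functions `to_sql` and `find_violations`, and the example table `person` are all hypothetical names introduced for illustration; they are not taken from the thesis. The sketch audits by selecting rows for which the rule does not hold, which is an assumption about how such a translation might work, not the thesis's "reverse q-ET" procedure. It uses SQLite through Python's standard `sqlite3` module.

```python
import sqlite3
from dataclasses import dataclass
from typing import Union


@dataclass
class Col:
    """Leaf node: a column reference (hypothetical node type)."""
    name: str


@dataclass
class Const:
    """Leaf node: a literal value (hypothetical node type)."""
    value: Union[int, float, str]


@dataclass
class Op:
    """Inner node: a binary operator applied to two subtrees."""
    symbol: str
    left: "Node"
    right: "Node"


Node = Union[Col, Const, Op]

# Operators this sketch is willing to emit into SQL.
ALLOWED_OPS = {"AND", "OR", "=", "<>", "<", "<=", ">", ">="}


def to_sql(node: Node) -> str:
    """Translate a rule tree into a SQL boolean expression."""
    if isinstance(node, Col):
        return '"' + node.name.replace('"', '""') + '"'
    if isinstance(node, Const):
        if isinstance(node.value, str):
            return "'" + node.value.replace("'", "''") + "'"
        return repr(node.value)
    if node.symbol not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {node.symbol}")
    return f"({to_sql(node.left)} {node.symbol} {to_sql(node.right)})"


def find_violations(conn: sqlite3.Connection, table: str, rule: Node) -> list:
    """Return rows of `table` for which `rule` is false or unknown (NULL).

    COALESCE(..., 0) treats an unknown result as a violation, so rows with
    NULL in a checked column are reported too. This relies on SQLite
    representing booleans as integers.
    """
    quoted = '"' + table.replace('"', '""') + '"'
    sql = f"SELECT * FROM {quoted} WHERE NOT COALESCE({to_sql(rule)}, 0)"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER, age INTEGER)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [(1, 25), (2, -3), (3, 200), (4, None)],
    )
    # Rule: 0 <= age <= 150
    rule = Op("AND", Op(">=", Col("age"), Const(0)), Op("<=", Col("age"), Const(150)))
    print(to_sql(rule))
    print(find_violations(conn, "person", rule))
```

Keeping the rule as a tree rather than a SQL string means the same structure could, in principle, be serialized to XML or translated to another query language such as XQuery, which matches the motivation the abstract gives for a tree representation.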
Keywords/Search Tags:data quality management, data auditing, data repairing, data mining