Research On Data Quality Management And Data Cleansing Technology

Posted on: 2014-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: M J Chen
Full Text: PDF
GTID: 2248330398471960
Subject: Computer Science and Technology
Abstract/Summary:
A company's core competence depends on an intangible asset: information. Data is the carrier of that information, so its quality must be ensured whenever it is mined for useful information or applied in areas such as products. However, data sets do contain problems such as incompleteness, inconsistency, redundancy and errors, which not only distort people's judgment of development trends but also cause economic losses. Research on how to clean data to improve its quality therefore has great practical significance in application services, information systems and project management. This thesis is based on the project Research on Key Technologies of Safety Trusted Telecom-level Operation Supporting Architecture on Reproductive Health Services. It focuses on quality problems in electronic records, provides solutions, and designs cleansing tools that help data administrators and managers understand the state of their data and improve its quality.

The primary contents of this thesis are as follows. First, it surveys the status, work and standards of data quality at home and abroad, including descriptions of data quality, quality management methods and models, and quality assessment, and it classifies and summarizes the different needs of quality tools. Second, data cleansing technology is the basic method for controlling data quality; it identifies and corrects data using statistical methods, data mining algorithms and semantic analysis technology. The thesis studies commonly used cleaning algorithms in two categories: record anomaly detection and duplicate record detection. Third, drawing on the mature Six Sigma quality management theory, the project's business requirements and the characteristics of the data, it designs quality management processes and a quality management framework for health checks; as a result, quality issues turn out to be business process control problems. Fourth, in the outline design phase of quality engineering, the thesis analyzes the characteristics and relationships of the data within each business process, identifies quality problems, defines requirements, and applies data cleansing principles to develop a cleansing strategy in line with business needs. Finally, on the basis of this research and a summary of development, operation and maintenance experience, a data quality management function is added in the business logic layer of the original data cleaning tool, and the tool is applied to a test data set to identify and dispose of defects and improve quality.

The main contributions of the thesis are summarized as follows. The thesis proposes a data quality management method based on data relationships, establishes the correspondence between the business and the data model, identifies quality problems, and uses cleaning technology to deal with inconsistent data. The research indicates that cleaning data and controlling quality at the data model level is an effective method.
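
The two categories of cleaning algorithm named in the abstract, duplicate record detection and record anomaly detection, can be illustrated with a minimal Python sketch. This is not the thesis's tool or its algorithms: the string similarity from `difflib.SequenceMatcher`, the median absolute deviation rule, the thresholds, and the field names `name`, `birth` and `height_cm` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from statistics import median


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(value.lower().split())


def record_similarity(a: dict, b: dict, fields: list) -> float:
    """Average string similarity over the compared fields (0.0 to 1.0)."""
    scores = [
        SequenceMatcher(None, normalize(str(a[f])), normalize(str(b[f]))).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def find_duplicates(records: list, fields: list, threshold: float = 0.9) -> list:
    """Return index pairs (i, j) with i < j whose similarity meets the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if record_similarity(records[i], records[j], fields) >= threshold:
                pairs.append((i, j))
    return pairs


def find_anomalies(values: list, k: float = 3.0) -> list:
    """Return indices whose distance from the median exceeds k times the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: more than half the values are identical.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if abs(v - med) / mad > k]


# Hypothetical electronic records; the second differs from the first only in
# letter case and spacing, and the last has an implausible height.
records = [
    {"name": "Li Na", "birth": "1985-03-02", "height_cm": 162},
    {"name": "li  na", "birth": "1985-03-02", "height_cm": 162},
    {"name": "Wang Fang", "birth": "1990-11-20", "height_cm": 158},
    {"name": "Zhang Wei", "birth": "1978-07-15", "height_cm": 1650},
]

duplicate_pairs = find_duplicates(records, ["name", "birth"])
anomalous_rows = find_anomalies([r["height_cm"] for r in records])
```

The pairwise comparison is quadratic in the number of records, which is acceptable for a small test set; larger data sets would normally sort or block the records first so that only likely matches are compared.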
Keywords/Search Tags: data quality, data cleansing, quality management model, data-model-based cleaning