Research And Application Of Data Quality Quantitative Analysis

Posted on: 2020-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: J L Zhuang
GTID: 2428330596498354
Subject: Software engineering
Abstract/Summary:
With the explosive growth of data, factors such as network failures and hardware or software faults can cause data to be stored incorrectly or lost. Such missing and erroneous data can bias the results of data analysis and data mining, and can even lead to serious decision errors. As a result, data quality has drawn increasing attention from experts and scholars. Academia and industry have proposed a number of methods for data quality research, but customized, domain-specific methods for quantitative data quality research are still lacking. This thesis is motivated by an organization's need for a customized, domain-specific platform for quantitative data quality analysis and evaluation. The main research contents and innovations are as follows:
1. The thesis builds a multi-dimensional, quantifiable data quality evaluation model. It draws on the project requirements, the GB/T series of national standards, and the author's internship experience in the organization, and it tailors the indicators through research, analysis, and consultation with relevant experts.
2. Methods for calculating indicator weights in quantitative quality evaluation models include the Delphi method, the analytic hierarchy process (AHP), the defect deduction method, the cloud model method, and the entropy weight method. In practice, usually only one of these methods is applied, and this single-method strategy can make the weights overly subjective. To address this, the thesis combines the Delphi method, AHP, and the information-entropy-based entropy weight coefficient method to calculate comprehensive weights. The combination offsets the subjectivity of weights produced by any single method. In addition, the entropy weight coefficient method minimizes the influence of human factors on the indicator weights, which makes them more objective and accurate. When an AHP judgement matrix fails the consistency check, it normally has to be reconstructed, which is costly. The thesis therefore introduces the induced matrix correction method to adjust the judgement matrix and avoid reconstruction as far as possible.
3. Based on this evaluation model and the improved weight determination method, the thesis designs, develops, and optimizes a multi-module data quality evaluation system with a B/S (browser/server) architecture. To reduce coupling, the front end and back end are separated: the front end uses the Vue framework and the back end uses the Spring Boot framework. To work around the browser's same-origin policy, the system uses an Nginx reverse proxy for cross-domain access. Computing multiple indicators in a single thread is inefficient and leaves the CPU underutilized, while repeatedly creating and destroying threads is time-consuming. The system therefore uses a thread pool to compute indicators in parallel. The system also runs a ZooKeeper cluster that manages and coordinates a Kafka cluster. This decouples score calculation from mail delivery and gives the system high availability (HA).
4. The system is used to evaluate and analyze the data quality of a dataset from the e-commerce domain. ECharts visualizes the evaluation process and results. A detailed data quality evaluation report is generated with Thymeleaf and sent to the evaluators by mail. This verifies the usability and efficiency of the proposed quantitative data quality evaluation framework, model, and system, which meet the enterprise's actual functional requirements.
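The multi-dimensional model in item 1 ultimately rolls indicator scores up into one weighted score. The Python sketch below illustrates that roll-up under stated assumptions. The two dimensions (completeness and uniqueness), the indicator definitions, the 0 to 100 scale, the weights, and the sample order records are all hypothetical, because the abstract does not list the GB/T-derived indicators or their weights.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: field name -> value, where None or "" means missing.
Record = Dict[str, Optional[str]]


def completeness(records: List[Record], field: str) -> float:
    """Percentage of records whose field is present and non-empty (0 to 100)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return 100.0 * filled / len(records)


def uniqueness(records: List[Record], field: str) -> float:
    """Percentage of distinct values among the non-missing values of field."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return 100.0 * len(set(values)) / len(values)


# Hypothetical model: dimension -> (dimension weight, {indicator: weight within dimension}).
Model = Dict[str, Tuple[float, Dict[str, float]]]


def overall_score(model: Model, indicator_scores: Dict[str, float]) -> float:
    """Weighted sum of dimension scores, each itself a weighted sum of indicator scores."""
    total = 0.0
    for dim_weight, indicator_weights in model.values():
        dim_score = sum(w * indicator_scores[name] for name, w in indicator_weights.items())
        total += dim_weight * dim_score
    return total


if __name__ == "__main__":
    orders = [
        {"order_id": "A1", "phone": "13800000000", "email": "a@example.com"},
        {"order_id": "A2", "phone": None, "email": "b@example.com"},
        {"order_id": "A3", "phone": "13900000000", "email": ""},
        {"order_id": "A3", "phone": "13700000000", "email": None},
    ]
    scores = {
        "phone_completeness": completeness(orders, "phone"),  # 75.0
        "email_completeness": completeness(orders, "email"),  # 50.0
        "order_id_uniqueness": uniqueness(orders, "order_id"),  # 75.0
    }
    model = {
        "completeness": (0.6, {"phone_completeness": 0.5, "email_completeness": 0.5}),
        "uniqueness": (0.4, {"order_id_uniqueness": 1.0}),
    }
    print(round(overall_score(model, scores), 2))  # 67.5
```

Keeping indicator functions separate from the weighted roll-up means the weights produced by AHP, the entropy method, or their combination can be swapped in without touching the indicator code.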
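For the AHP step in item 2, weights are read off a pairwise judgement matrix. They are accepted only if the consistency ratio CR = CI / RI, with CI = (lambda_max - n) / (n - 1), stays below the customary threshold of 0.1. This minimal sketch approximates the principal eigenvector with the row geometric-mean method rather than an exact eigen-solver. It uses Saaty's random index values for n up to 9 and a hypothetical three-criterion matrix. It is an illustration, not the thesis's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]

# Saaty's random consistency index RI for matrix orders n = 1..9.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]


def ahp_weights(a: Matrix) -> List[float]:
    """Approximate the principal eigenvector with normalized row geometric means."""
    n = len(a)
    gms = [math.prod(row) ** (1.0 / n) for row in a]
    total = sum(gms)
    return [g / total for g in gms]


def consistency_ratio(a: Matrix, w: List[float]) -> float:
    """CR = CI / RI, with lambda_max estimated as the mean of (A w)_i / w_i."""
    n = len(a)
    if n <= 2:
        return 0.0  # reciprocal matrices of order 1 or 2 are always consistent
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n - 1]


if __name__ == "__main__":
    # Hypothetical judgements: completeness vs. accuracy vs. timeliness.
    judgement = [
        [1.0, 2.0, 4.0],
        [1 / 2, 1.0, 2.0],
        [1 / 4, 1 / 2, 1.0],
    ]
    w = ahp_weights(judgement)
    cr = consistency_ratio(judgement, w)
    print([round(x, 3) for x in w], "acceptable" if cr < 0.1 else "revise matrix")
    # [0.571, 0.286, 0.143] acceptable
```

In the thesis's workflow, the judgement values would come from expert input (the Delphi step). A matrix that fails the CR check is what the induced matrix correction is meant to repair.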
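The entropy weight coefficient method in item 2 derives objective weights from how much each indicator varies across samples. With p_ij = x_ij / sum_i x_ij, the entropy is e_j = -(1 / ln m) * sum_i p_ij ln p_ij and the weight is w_j = (1 - e_j) / sum_k (1 - e_k). An indicator that takes the same value on every sample has e_j = 1 and therefore receives no weight. The sketch assumes non-negative, benefit-type scores (larger is better), at least two samples, and a positive sum in every column. Cost-type indicators would need to be transformed first, and the sample data is hypothetical.

```python
import math
from typing import List


def entropy_weights(data: List[List[float]]) -> List[float]:
    """Entropy weight method for an m x n matrix (m samples, n indicators).

    Assumes m >= 2, non-negative benefit-type values and a positive sum in every
    column; at least one column must vary, otherwise all divergences are zero.
    """
    m, n = len(data), len(data[0])
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        col_sum = sum(row[j] for row in data)
        h = 0.0
        for row in data:
            p = row[j] / col_sum
            if p > 0:  # convention: 0 * ln 0 = 0
                h -= p * math.log(p)
        divergences.append(1.0 - k * h)  # d_j = 1 - e_j
    total = sum(divergences)
    return [d / total for d in divergences]


if __name__ == "__main__":
    # Hypothetical indicator scores of three datasets (rows) on two indicators (columns).
    scores = [
        [90.0, 95.0],
        [90.0, 60.0],
        [90.0, 75.0],
    ]
    # The first indicator is constant, so its weight is (numerically) 0.
    print(entropy_weights(scores))
```

Because the weights depend only on the data, no expert judgement enters this step. This is the property the abstract relies on to counter the subjectivity of Delphi and AHP weights.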
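The abstract does not state how the subjective weights (Delphi and AHP) and the objective entropy weights are merged into the comprehensive weight. Two formulas often used for this purpose are shown below as assumptions. The first is multiplicative normalization, w_j = s_j * o_j / sum_k s_k * o_k. The second is a linear blend, w_j = alpha * s_j + (1 - alpha) * o_j, where the preference coefficient alpha is a hypothetical parameter. The input weight vectors are also hypothetical.

```python
from typing import List


def combine_multiplicative(subjective: List[float], objective: List[float]) -> List[float]:
    """w_j = s_j * o_j / sum_k(s_k * o_k); favours indicators both methods rate highly."""
    products = [s * o for s, o in zip(subjective, objective)]
    total = sum(products)
    return [p / total for p in products]


def combine_linear(subjective: List[float], objective: List[float], alpha: float = 0.5) -> List[float]:
    """w_j = alpha * s_j + (1 - alpha) * o_j; alpha is a hypothetical preference coefficient."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * s + (1.0 - alpha) * o for s, o in zip(subjective, objective)]


if __name__ == "__main__":
    ahp = [0.5, 0.3, 0.2]      # hypothetical subjective weights (Delphi-informed AHP)
    entropy = [0.2, 0.3, 0.5]  # hypothetical objective weights (entropy method)
    print(combine_multiplicative(ahp, entropy))
    print(combine_linear(ahp, entropy, alpha=0.5))
```

The linear blend sums to 1 whenever both inputs do. The multiplicative form renormalizes explicitly, so it tolerates inputs that are only proportional to weights.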
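The induced matrix correction in item 2 can be sketched as follows. The sketch assumes one common formulation, since the abstract gives no details.

1. Normalize each column of the judgement matrix A to get B.
2. Take the row means of B as the weights w.
3. Form the induced matrix C with c_ij = b_ij / w_i. For a perfectly consistent matrix every c_ij equals 1, so the off-diagonal entry farthest from 1 points at the most inconsistent judgement.
4. Replace that entry with w_i / w_j, and its mirror with w_j / w_i.
5. Repeat until CR < 0.1 or an iteration cap is reached. The function returns the most consistent matrix it has seen.

The replacement rule and the iteration cap are simplifications of this sketch, as is leaving corrected values off Saaty's 1 to 9 scale. The example matrix is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Saaty's random consistency index RI for matrix orders n = 1..9.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]


def normalized_columns_and_weights(a: Matrix) -> Tuple[Matrix, List[float]]:
    """B = A with each column divided by its sum; w = row means of B."""
    n = len(a)
    col_sums = [sum(a[i][j] for i in range(n)) for j in range(n)]
    b = [[a[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    w = [sum(row) / n for row in b]
    return b, w


def consistency_ratio(a: Matrix) -> float:
    """CR = CI / RI using the column-normalization weights."""
    n = len(a)
    if n <= 2:
        return 0.0
    _, w = normalized_columns_and_weights(a)
    lambda_max = sum(sum(a[i][j] * w[j] for j in range(n)) / w[i] for i in range(n)) / n
    return (lambda_max - n) / (n - 1) / RANDOM_INDEX[n - 1]


def induced_matrix_correction(a: Matrix, threshold: float = 0.1, max_iter: int = 20) -> Matrix:
    """Adjust the most inconsistent judgement repeatedly instead of rebuilding A.

    Induced matrix: c_ij = b_ij / w_i (all ones when A is perfectly consistent).
    Returns the most consistent matrix seen, which may still fail the threshold.
    """
    n = len(a)
    current = [row[:] for row in a]
    best, best_cr = [row[:] for row in current], consistency_ratio(current)
    for _ in range(max_iter):
        if best_cr < threshold:
            break
        b, w = normalized_columns_and_weights(current)
        # Off-diagonal entry whose induced value deviates most from 1.
        i, j = max(
            ((r, c) for r in range(n) for c in range(n) if r != c),
            key=lambda rc: abs(b[rc[0]][rc[1]] / w[rc[0]] - 1.0),
        )
        current[i][j] = w[i] / w[j]  # replace the judgement with the implied ratio
        current[j][i] = w[j] / w[i]  # keep the matrix reciprocal
        cr = consistency_ratio(current)
        if cr < best_cr:
            best, best_cr = [row[:] for row in current], cr
    return best


if __name__ == "__main__":
    # Hypothetical, strongly inconsistent (cyclic) judgements.
    a = [[1.0, 2.0, 0.25], [0.5, 1.0, 2.0], [4.0, 0.5, 1.0]]
    fixed = induced_matrix_correction(a)
    print(round(consistency_ratio(a), 3), "->", round(consistency_ratio(fixed), 3))
```

Repairing a single judgement at a time keeps most of the experts' original comparisons intact. This is the cost saving over rebuilding the whole matrix that the abstract describes.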
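Item 3 computes several indicators in parallel on a thread pool, so worker threads are reused rather than created and destroyed for each task. The back end described in the abstract is Spring Boot (Java), where a Java executor would presumably play this role. The Python sketch below only mirrors the pattern with `concurrent.futures.ThreadPoolExecutor`. The indicator factory `completeness_of`, the pool size, and the records are hypothetical. In CPython, threads mainly help when indicators wait on I/O (for example, database queries), because CPU-bound pure-Python work is limited by the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]
Indicator = Callable[[List[Record]], float]


def completeness_of(field: str) -> Indicator:
    """Build a hypothetical completeness indicator (0 to 100) for one field."""
    def indicator(records: List[Record]) -> float:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        return 100.0 * filled / len(records) if records else 0.0
    return indicator


def run_indicators(records: List[Record], indicators: Dict[str, Indicator],
                   max_workers: int = 4) -> Dict[str, float]:
    """Submit every indicator to one pool; its threads are reused across tasks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, records) for name, fn in indicators.items()}
        # result() blocks until each task finishes and re-raises its exception, if any.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    records = [
        {"phone": "13800000000", "email": None},
        {"phone": None, "email": "b@example.com"},
        {"phone": "13900000000", "email": "c@example.com"},
        {"phone": "13700000000", "email": ""},
    ]
    indicators = {f: completeness_of(f) for f in ("phone", "email")}
    print(run_indicators(records, indicators))  # {'phone': 75.0, 'email': 50.0}
```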
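Item 3 also decouples score calculation from mail delivery through Kafka, coordinated by ZooKeeper. The sketch below shows only the decoupling pattern. An in-process `queue.Queue` stands in for a Kafka topic, and `scoring_service`, `mail_service`, and `run_pipeline` are hypothetical names. The sketch does not use the Kafka client API and provides none of Kafka's persistence, partitioning, or high availability. The scoring side publishes an event and moves on, while the mail side consumes events independently, so slow mail delivery does not hold up scoring.

```python
import queue
import threading
from typing import Dict, List

# In-process stand-in for a Kafka topic; None marks the end of the stream
# (a real Kafka consumer would keep polling instead).
END = None


def scoring_service(topic: queue.Queue, jobs: List[Dict]) -> None:
    """Producer: score each dataset, publish an event, and move on without waiting for mail."""
    for job in jobs:
        score = sum(w * job["scores"][name] for name, w in job["weights"].items())
        topic.put({"dataset": job["dataset"], "recipient": job["recipient"], "score": score})
    topic.put(END)


def mail_service(topic: queue.Queue, outbox: List[str]) -> None:
    """Consumer: turn each event into a mail; here the 'mail' is only appended to outbox."""
    while True:
        event = topic.get()
        if event is END:
            break
        outbox.append(f"To {event['recipient']}: {event['dataset']} scored {event['score']:.1f}")


def run_pipeline(jobs: List[Dict]) -> List[str]:
    """Run the consumer on its own thread and the producer on the calling thread."""
    topic: queue.Queue = queue.Queue()
    outbox: List[str] = []
    mailer = threading.Thread(target=mail_service, args=(topic, outbox))
    mailer.start()
    scoring_service(topic, jobs)
    mailer.join()
    return outbox


if __name__ == "__main__":
    jobs = [{
        "dataset": "orders",
        "recipient": "evaluator@example.com",
        "scores": {"completeness": 80.0, "uniqueness": 90.0},
        "weights": {"completeness": 0.5, "uniqueness": 0.5},
    }]
    print(run_pipeline(jobs))  # ['To evaluator@example.com: orders scored 85.0']
```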
Keywords/Search Tags: data quality, evaluation model, index weight, analytic hierarchy process, entropy weight coefficient method