Research On Algorithms Of Big Data's Consistency Quality Analysis

Posted on: 2020-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z R Wang
GTID: 2428330590474459
Subject: Computer Science and Technology
Abstract/Summary:
Data quality management and analysis have long been an important research direction in the field of big data, because the quality of data determines whether its value can be fully exploited. Conditional functional dependencies (CFDs) are a recently proposed rule-based tool for handling data quality problems, and they have received extensive attention in the academic community in recent years. A CFD is a rule that constrains the consistency of data; with CFDs, data errors can be found and repaired effectively. There is already a substantial body of research on CFDs. However, whether the task is detecting data quality problems with CFDs or discovering CFDs from existing data sets, existing work pays little attention to detailed performance analysis and optimization. As a result, existing algorithms often cannot be applied to the analysis of massive data. This thesis addresses CFD analysis of massive data in real scenarios. For CFD detection, we propose a streaming framework and two optimized data structures that further improve the algorithms' performance. The thesis also studies CFD discovery: building on existing research, it presents a sampling-based sublinear acceleration method that can be applied to real big data scenarios. Finally, experiments evaluate the performance of the new algorithms. The results show that the proposed methods reduce the time cost of both CFD detection and CFD discovery.
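To make the idea of streaming CFD detection concrete, the following is a minimal illustrative sketch and not the framework or the data structures proposed in the thesis. It assumes a CFD with a single right-hand-side attribute, a pattern tuple in which "_" is a wildcard, and rows supplied as dictionaries; the names CFD, WILDCARD, and stream_violations are hypothetical. Rows are read once, and a hash map remembers the first right-hand-side value seen for each left-hand-side key.

```python
from typing import Dict, Iterable, Iterator, Tuple

WILDCARD = "_"  # assumed notation: matches any value in a pattern tuple


class CFD:
    """Hypothetical container for a CFD (lhs -> rhs, pattern tuple)."""

    def __init__(self, lhs: Tuple[str, ...], rhs: str,
                 lhs_pattern: Tuple[str, ...], rhs_pattern: str = WILDCARD):
        self.lhs = lhs                  # left-hand-side attribute names
        self.rhs = rhs                  # single right-hand-side attribute
        self.lhs_pattern = lhs_pattern  # one constant or wildcard per lhs attribute
        self.rhs_pattern = rhs_pattern  # constant or wildcard for the rhs


def matches(value: str, pattern: str) -> bool:
    return pattern == WILDCARD or value == pattern


def stream_violations(rows: Iterable[Dict[str, str]],
                      cfd: CFD) -> Iterator[Tuple[int, str]]:
    """Yield (row index, violation kind) in a single pass over the rows."""
    first_rhs: Dict[Tuple[str, ...], str] = {}  # lhs key -> first rhs value seen
    for index, row in enumerate(rows):
        key = tuple(row[attr] for attr in cfd.lhs)
        # The rule only applies to rows that match the lhs pattern.
        if not all(matches(v, p) for v, p in zip(key, cfd.lhs_pattern)):
            continue
        rhs_value = row[cfd.rhs]
        # Constant violation: the row alone contradicts the rhs constant.
        if not matches(rhs_value, cfd.rhs_pattern):
            yield index, "constant"
            continue
        # Variable violation: same lhs key, different rhs value than before.
        if first_rhs.setdefault(key, rhs_value) != rhs_value:
            yield index, "variable"
```

For example, with the rule ([country, zip] -> city, (UK, _ || _)), two rows that share country "UK" and the same zip but name different cities would cause the second row to be reported as a "variable" violation. The sketch keeps one map entry per distinct left-hand-side key, so its memory use grows with the number of distinct keys rather than the number of rows.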
Keywords/Search Tags: Conditional Functional Dependencies, data inconsistency, data quality