Cleaning Framework for Big Data

Posted on: 2018-09-03
Degree: Ph.D.
Type: Dissertation
University: Oklahoma State University
Candidate: Liu, Hong
Full Text: PDF
GTID: 1478390020955842
Subject: Computer Science
Abstract/Summary:
With the advent of big data, more and more data is generated every day, and data is among the most valuable assets of any data-related system. Data quality has received significant attention over the years because the value of data depends heavily on its quality and usability. Proper use of high-quality data helps people make better predictions, analyses, and decisions. Real-world data often contains numerous non-trivial problems that prevent it from being used directly, which makes data cleaning necessary. Our research harnesses both the context and the usage patterns of data to identify occurrences of the same object and to link associated objects, so that data can be cleaned through association. The proposed cleaning framework consists of an identification and linkage process, an association and repairing process, a ranking process that prioritizes data issues by severity, and a mechanism for answering big data queries by accessing a small amount of clean data. Our study shows that the proposed approach can associate datasets in ways that benefit the repairing process. Data quality improves significantly with our approach, while efficiency remains stable.
Keywords/Search Tags: Data, Cleaning, Process