Vacuum: Automated procedures for assessing and cleansing civil infrastructure data

Posted on: 2003-12-17
Degree: Ph.D.
Type: Thesis
University: Carnegie Mellon University
Candidate: Buchheit, Rebecca Bari
Full Text: PDF
GTID: 2468390011988379
Subject: Engineering
Abstract/Summary:
Monitoring data are collected to measure the condition, environment, usage, and performance of civil infrastructure. High-quality monitoring data are necessary for decision-support systems, design analysis, and research. However, little work has been done on generic, automated procedures for data quality assessment and cleansing.

My research objective was to develop an automated procedure to assess the quality of civil infrastructure data and to explore whether the data can be cleansed effectively using the assessment results. I have developed a two-level data quality assessment procedure; part of this procedure is implemented in a prototype software program. In the first level, several different data quality assessment methods are combined in a voting scheme to identify concentrations of anomalies in aggregate data. In the second level, differences between anomalous and normal data are identified at the level of individual records; combined with domain knowledge, these differences can be used to identify different types of errors, such as missing data and calibration errors. The data can then be cleansed effectively.

In my thesis, I present two case studies that use my data quality assessment procedure. In the first, on traffic data from a weigh-in-motion scale, my procedure identified three types of errors: data missing from the right-hand lane, extraneous passenger vehicle data, and records in which two tailgating vehicles were combined into a single vehicle. In the second, on usage data from an HVAC system, my procedure identified missing data. The weigh-in-motion data were cleansed using the guidelines in my thesis; in the resulting clean data sets, the number of anomalies was reduced by an order of magnitude.

I also developed a test bench to explore the sensitivity of the data quality assessment algorithms. The test bench introduces a known error into a clean, artificial data set and then evaluates how well each assessment method identifies the error. The results confirm that no single data quality assessment method can identify every type of common error found in monitoring data; however, when the methods are used in a voting scheme, it is possible to identify every type of common error in monitoring data.
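
As an illustration only, the first-level voting scheme might be sketched as below. The abstract does not name the individual assessment methods, so the three detectors here (a z-score test, a median-absolute-deviation test, and an interquartile-range test), their thresholds, the `min_votes` parameter, and the sample daily vehicle counts are all assumptions chosen for the example, not the methods or data used in the thesis.

```python
from statistics import mean, median, pstdev, quantiles


def zscore_flags(values, threshold=2.0):
    """Flag windows whose z-score exceeds the threshold (assumed detector)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def mad_flags(values, threshold=3.5):
    """Flag windows using the modified z-score based on the MAD (assumed detector)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def iqr_flags(values, k=1.5):
    """Flag windows outside the k * IQR fences (assumed detector)."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]


def vote(values, detectors, min_votes=2):
    """Mark a window anomalous when at least min_votes detectors flag it."""
    ballots = [detector(values) for detector in detectors]
    return [sum(b[i] for b in ballots) >= min_votes for i in range(len(values))]


# Hypothetical aggregate data: daily vehicle counts, with one day in which
# a lane's records are missing (index 5).
daily_counts = [1000, 1020, 980, 1010, 990, 400, 1005, 995]
detectors = [zscore_flags, mad_flags, iqr_flags]
flagged = [i for i, hit in enumerate(vote(daily_counts, detectors)) if hit]
print(flagged)
```

In this made-up series, all three detectors flag the low-count day, so the vote marks window 5 as a concentration of anomalies that would then go to second-level inspection. Injecting a known error into a clean series and checking which detectors respond, as here, is also the basic idea behind the test bench described above.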
Keywords/Search Tags:Civil infrastructure, Monitoring data, Procedure, Data quality assessment, Identify every type, Voting scheme, Common error, Automated