
Spatial data analysis of networked infrastructure failure data: An application for condition assessment of drinking water distribution systems

Posted on: 2011-04-18
Degree: Ph.D.
Type: Dissertation
University: Carnegie Mellon University
Candidate: de Oliveira, Daniel Pinho
GTID: 1448390002463268
Subject: Engineering
Abstract/Summary:
This dissertation addresses the exploratory spatial data analysis of failure data from physical infrastructure networks. By exploiting the spatial information embedded in the data, such analysis aims to generate information that is useful for assessing the physical condition of these systems, e.g., by detecting locations with abnormally high breakage. Because it addresses specific locations in the network rather than generic groups of pipes, spatial analysis makes it possible to link populations at risk to the condition of assets and, eventually, to improve the prioritization of capital investments. Analyzing the physical condition of infrastructure assets is relevant because many critical civil infrastructure systems in America are in poor condition, which creates risks for the population served and high capital investment needs. Time-based approaches already provide means to model deterioration; this research adds to condition assessment efforts by exploring a complementary, spatial perspective on failure phenomena. The contributions of this research include a framework that guides the spatial data analysis of failure data and two data analysis methods that extend existing methods to the case in which data points are constrained to a network space. The proposed framework consists of four main parts. The first addresses the collection, preparation, and integration of failure-related data. The second generates local indicators of breakage density using a proposed density-based spatial clustering approach; a local indicator is the number of failures divided by the length of pipe in a specific region. The third proposes a spatial scan statistic for points constrained to a network, to detect regions with higher-than-expected breakage density. The last aims to find factors associated with the regions of abnormally high breakage density by using classification models.
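The second part of the framework, clustering on a network and deriving local indicators, can be illustrated with a minimal Python sketch. It is not the dissertation's exact procedure and rests on several simplifying assumptions. Each break is assumed to be snapped to a node of the pipe network. The distance between two breaks is taken as the shortest-path length along the pipes, computed with Dijkstra's algorithm. A standard DBSCAN is then run over those network distances. The parameter names `eps` and `min_pts`, the helper names, and the rule that a cluster's region consists of the pipes whose two end nodes both host breaks of that cluster are all hypothetical choices made for illustration.

```python
import heapq
from collections import defaultdict

INF = float("inf")


def network_distances(adj, source):
    """Dijkstra: shortest-path length along the pipes from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in adj[u]:
            nd = d + length
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def network_dbscan(pipes, break_nodes, eps, min_pts):
    """DBSCAN over breaks snapped to network nodes, using network (not Euclidean) distance.

    pipes: list of (node_u, node_v, length_m); break_nodes: node of each break.
    Returns one label per break: a cluster id >= 0, or -1 for noise.
    """
    adj = defaultdict(list)
    for u, v, length in pipes:
        adj[u].append((v, length))
        adj[v].append((u, length))
    dist = {node: network_distances(adj, node) for node in set(break_nodes)}
    n = len(break_nodes)

    def neighbours(i):
        d = dist[break_nodes[i]]
        return [j for j in range(n) if d.get(break_nodes[j], INF) <= eps]

    labels = [None] * n  # None = not yet visited
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:
                seeds.extend(nb)  # j is a core point, so keep expanding from it
        cluster += 1
    return labels


def local_indicators(pipes, break_nodes, labels):
    """Breaks per km of pipe for each cluster.

    Assumed region definition: pipes whose two end nodes both host breaks of the cluster.
    """
    result = {}
    for c in sorted(set(labels) - {-1}):
        nodes = {break_nodes[i] for i, lab in enumerate(labels) if lab == c}
        count = labels.count(c)
        length_m = sum(length for u, v, length in pipes if u in nodes and v in nodes)
        result[c] = count * 1000 / length_m if length_m > 0 else None
    return result
```

On a chain of pipes A-B (100 m), B-C (100 m), C-D (500 m), D-E (500 m), and E-F (100 m), the sketch is run with breaks at A, A, B, C, and F, using `eps=150` and `min_pts=3`. The four breaks around A to C form one cluster over 200 m of pipe, which gives a local indicator of 20 breaks/km. The isolated break at F is labelled as noise (-1). Using network distance rather than straight-line distance matters here because two breaks that are close in plan view may be far apart along the pipes.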
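The third part relies on a spatial scan statistic. The sketch below uses Kulldorff's Poisson log-likelihood ratio. In each window, the expected number of breaks is proportional to that window's share of a per-segment weight, which is pipe length by default. Candidate windows are grown outward from each pipe segment along network adjacency, hop by hop. This growth rule is a hypothetical stand-in for the network-constrained windows proposed in the dissertation. Significance is estimated by Monte Carlo simulation: the same number of breaks is redistributed over the segments in proportion to the weights, and the maximum ratio is recomputed for each replicate. To account for age, as in the application, the weights could be pipe length multiplied by an age-based relative risk. That weighting scheme is an assumption here, and the function names are illustrative.

```python
import math
import random


def poisson_llr(c, expected, total):
    """Kulldorff's Poisson log-likelihood ratio for one window; 0 unless c exceeds expectation."""
    if c <= expected:
        return 0.0
    llr = c * math.log(c / expected)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - expected))
    return llr


def network_windows(adjacency, max_hops):
    """Candidate windows: each segment grown hop by hop over adjacent segments (BFS)."""
    seen, windows = set(), []
    for start in range(len(adjacency)):
        window, frontier = {start}, {start}
        for _ in range(max_hops + 1):
            key = frozenset(window)
            if key not in seen:
                seen.add(key)
                windows.append(key)
            frontier = {n for s in frontier for n in adjacency[s]} - window
            if not frontier:
                break
            window |= frontier
    return windows


def most_likely_window(counts, weights, windows):
    """Window with the largest LLR; expected counts are proportional to the weights."""
    total, weight_sum = sum(counts), sum(weights)
    best_llr, best_window = 0.0, None
    for w in windows:
        c = sum(counts[i] for i in w)
        expected = total * sum(weights[i] for i in w) / weight_sum
        llr = poisson_llr(c, expected, total)
        if llr > best_llr:
            best_llr, best_window = llr, w
    return best_llr, best_window


def scan_test(counts, weights, windows, n_sim=999, seed=0):
    """Most likely window plus a Monte Carlo p-value under the null of no clustering."""
    rng = random.Random(seed)
    observed_llr, window = most_likely_window(counts, weights, windows)
    total, exceed = sum(counts), 0
    for _ in range(n_sim):
        simulated = [0] * len(counts)
        # redistribute the same number of breaks in proportion to the weights
        for i in rng.choices(range(len(counts)), weights=weights, k=total):
            simulated[i] += 1
        if most_likely_window(simulated, weights, windows)[0] >= observed_llr:
            exceed += 1
    return window, observed_llr, (exceed + 1) / (n_sim + 1)
```

As an example, take a chain of six 100 m segments with break counts `[0, 2, 8, 9, 1, 0]` and `max_hops=1`. The most likely window is segments 1 to 3, which contain 19 of the 20 breaks against 10 expected. The Monte Carlo p-value for this window is small.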
Comprehensible classification models are used to predict the region label of each failure record from a set of attributes that account for environmental factors, network properties, and pipe characteristics. Applying the proposed steps and methods to a real-world drinking water distribution failure dataset made it possible to assess the two methods above and to draw conclusions about water pipe breakage in this specific dataset. The density-based spatial clustering effectively identified clusters of breaks, which were used to derive local indicators of breakage. Although the clustering results vary significantly with the choice of parameters, a cluster quality assessment showed that they converge to relatively stable results over a certain range of parameter values. The spatial scan statistic identified regions with abnormally high breakage density; six abnormal regions remain even after accounting for age, which is commonly one of the factors behind high breakage rates in pipes. The classification models applied to the real-world data in the last part showed high overall performance, despite the small size of the dataset and the small proportion of data in the abnormal regions. Most likely, the classifiers exploited the strong autocorrelation in the attributes to partition the dataset and classify instances. This is probably also why approaches expected to improve performance, such as over-sampling the minority set of abnormal breaks and reducing the feature set through attribute selection, provided no improvement. Binary classifiers, i.e., those that predicted the label of each cluster separately against all other points in the dataset, generally performed better than multiclass classification, i.e., the simultaneous prediction of all region labels against each other and against the set of remaining points.
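The last part of the framework compares binary and multiclass set-ups, which differ in how the labels are built from the abnormal regions. The following minimal sketch assumes that each break carries the id of the abnormal region it falls in, or `None` if it lies outside all regions. In the multiclass task, each break keeps its region label, and every break outside the regions gets an "other" label. In the binary (one-vs-rest) set-up, a separate 0/1 task is built for each region. The sketch also shows random over-sampling, one of the approaches that did not help in this application. It simply duplicates minority rows until the two classes are balanced. The function names and the toy attribute rows are hypothetical. A comprehensible classifier, such as a decision tree, would then be trained on these labels.

```python
import random


def multiclass_labels(region_of):
    """One label per break: its abnormal region id, or 'other' outside all regions."""
    return [r if r is not None else "other" for r in region_of]


def one_vs_rest_labels(region_of):
    """One binary task per region: 1 for breaks in that region, 0 for all other breaks."""
    regions = sorted({r for r in region_of if r is not None})
    return {r: [1 if x == r else 0 for x in region_of] for r in regions}


def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate random minority rows until both binary classes have the same size."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    # majority size minus minority size, for a two-class label list
    deficit = max(0, len(labels) - 2 * len(minority_idx))
    extra = rng.choices(minority_idx, k=deficit) if minority_idx else []
    return rows + [rows[i] for i in extra], labels + [labels[i] for i in extra]
```

Consider the toy region labels `["R1", None, "R2", None, None, "R1"]`. The binary task for R2 has one positive and five negatives, and over-sampling duplicates that single positive row four times. The multiclass task instead has to separate R1, R2, and "other" in a single model.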
The results are encouraging regarding the usefulness of spatial analysis as a means of extracting interesting information about the physical condition of infrastructure systems from data. This work can be extended in several ways, including the spatial analysis of other data types, such as data generated by real-time monitoring systems; the integrated space-time analysis of abnormal patterns; and the use of local indicators to prioritize capital investments.
Keywords/Search Tags: Data, Infrastructure, Network, Condition, Local indicators, Assessment, Systems, Abnormally high breakage