
Data mining approaches to complex environmental problems

Posted on: 2008-05-26
Degree: Ph.D.
Type: Dissertation
University: University of Illinois at Urbana-Champaign
Candidate: Hill, David J
GTID: 1448390005450807
Subject: Environmental Sciences
Abstract/Summary:
Understanding and predicting the behavior of large-scale environmental systems is necessary for addressing many challenging environmental problems. Unfortunately, predictive models are difficult both to scale up and to parameterize, which limits their application to large-scale systems. This research addresses these issues through data mining. Specifically, this dissertation addresses two problems: upscaling models of solute transport in porous media, and detecting anomalies in streaming environmental data.
Upscaling refers to the creation of models that do not need to explicitly resolve all scales of system heterogeneity. Upscaled models require significantly fewer computational resources than do models that resolve small-scale heterogeneity. This research develops an upscaling method based on genetic programming (GP) that facilitates both the GP search and the implementation of the resulting models, and demonstrates its use and efficacy through a case study.
Anomaly detection is the task of identifying data that deviate from historical patterns. It has many practical applications, such as data quality assurance and control (QA/QC), focused data collection, and event detection. The second portion of this dissertation develops a suite of data-driven anomaly detection methods based on autoregressive data-driven models (e.g., artificial neural networks) and on dynamic Bayesian network (DBN) models of the sensor data stream. All of the developed methods evaluate data quickly and incrementally as they become available, scale to large quantities of data, and require no a priori information regarding process variables or the types of anomalies that may be encountered. Furthermore, the methods can be easily deployed on large heterogeneous sensor networks.
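The abstract does not describe the GP representation used for upscaling, so the following is only a generic illustration of the kind of GP search it builds on: a toy symbolic-regression loop that evolves closed-form expressions to fit hypothetical coarse-scale data (here y = x² + x, standing in for fine-scale simulation output). The tree encoding, operator set, tournament size, and every parameter value are assumptions for illustration, not the dissertation's method.

```python
import math
import random

# Hypothetical toy data standing in for fine-scale simulation output:
# the "true" upscaled relation is assumed to be y = x**2 + x on [-1, 1].
DATA = [(i / 10, (i / 10) ** 2 + i / 10) for i in range(-10, 11)]

# Expression trees are nested tuples: ("x",), ("const", c), or (op, left, right).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def random_tree(rng, depth):
    """Grow a random expression tree no deeper than `depth`."""
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.5 else ("const", rng.uniform(-2.0, 2.0))
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if tree[0] == "x":
        return x
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def fitness(tree, data):
    """Mean squared error over (x, y) pairs; non-finite errors count as +inf."""
    err = 0.0
    for x, y in data:
        diff = evaluate(tree, x) - y
        err += diff * diff
    err /= len(data)
    return err if math.isfinite(err) else math.inf


def nodes(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if tree[0] in OPS:
        yield from nodes(tree[1], path + (1,))
        yield from nodes(tree[2], path + (2,))


def replace(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if tree[0] in OPS else 0


def vary(rng, a, b, max_depth=5):
    """Crossover (graft a subtree of b into a) or mutation (graft a fresh random subtree)."""
    path, _ = rng.choice(list(nodes(a)))
    donor = rng.choice(list(nodes(b)))[1] if rng.random() < 0.5 else random_tree(rng, 2)
    child = replace(a, path, donor)
    return child if depth(child) <= max_depth else a  # reject over-deep offspring


def tournament(rng, scored, size=3):
    """Return the fittest tree among `size` randomly drawn (fitness, tree) pairs."""
    return min(rng.sample(scored, size), key=lambda s: s[0])[1]


def evolve(data, pop_size=150, generations=30, seed=0):
    """Tournament selection with elitism; returns the best tree and best MSE per generation."""
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(t, data), t) for t in pop), key=lambda s: s[0])
        history.append(scored[0][0])
        elite = [t for _, t in scored[: pop_size // 10]]  # copied unchanged
        children = [vary(rng, tournament(rng, scored), tournament(rng, scored))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    best = min(pop, key=lambda t: fitness(t, data))
    return best, history


best_tree, history = evolve(DATA)
```

Because the top 10% of each generation is copied unchanged (elitism), the best error recorded in `history` can never increase from one generation to the next; the depth cap keeps evaluated expressions bounded.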
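As a minimal sketch of the autoregressive detection idea, the code below swaps in a linear AR(p) model fitted by least squares (the abstract names neural networks as one example; the linear model is a simplification). Each new reading is compared with the one-step-ahead prediction and flagged when the residual exceeds `k` times the training residual standard deviation. The class name `ARDetector`, the threshold rule, the parameters `p` and `k`, and the synthetic data are hypothetical.

```python
import math
import random
from collections import deque


def solve(a, b):
    """Solve the small dense system a @ w = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_ar(series, p, ridge=1e-9):
    """Least-squares AR(p) fit with intercept; returns (weights, residual std)."""
    # Feature row for time t: [1, y[t-1], ..., y[t-p]] (most recent lag first).
    rows = [[1.0] + series[t - p:t][::-1] for t in range(p, len(series))]
    ys = series[p:]
    dim = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(dim)] for i in range(dim)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(dim)]
    w = solve(xtx, xty)
    resid = [y - sum(wi * xi for wi, xi in zip(w, r)) for r, y in zip(rows, ys)]
    return w, max(math.sqrt(sum(e * e for e in resid) / len(resid)), 1e-9)


class ARDetector:
    """Hypothetical incremental detector: flag readings far from the AR forecast."""

    def __init__(self, training, p=2, k=4.0):
        self.k = k
        self.w, self.sigma = fit_ar(training, p)
        self.history = deque(training[-p:], maxlen=p)  # oldest ... newest

    def predict(self):
        lags = list(self.history)[::-1]  # newest first, matching the fitted features
        return self.w[0] + sum(wi * yi for wi, yi in zip(self.w[1:], lags))

    def update(self, y):
        """Return True if y is anomalous; anomalous values are replaced by the forecast."""
        y_hat = self.predict()
        anomalous = abs(y - y_hat) > self.k * self.sigma
        self.history.append(y_hat if anomalous else y)
        return anomalous


# Synthetic demo: a noisy sinusoid with one injected spike at t = 250.
rng = random.Random(1)
series = [math.sin(0.3 * t) + rng.uniform(-0.1, 0.1) for t in range(300)]
series[250] += 5.0
detector = ARDetector(series[:200], p=2, k=4.0)
flags = [t for t in range(200, 300) if detector.update(series[t])]
```

Replacing a flagged value with its forecast in the lag buffer is a design choice of this sketch: it keeps a single spike from corrupting the next `p` predictions.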
The anomaly detection methods are then applied to a sensor network located in Corpus Christi Bay, Texas, and their abilities to identify both real and synthetic anomalies in meteorological data are compared. Results of these case studies indicate that DBN-based detectors, using either robust Kalman filtering or Rao-Blackwellized particle filtering, are most suitable for the Corpus Christi meteorological data.
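The abstract does not spell out its robust Kalman filter, so the following shows only one simple variant, under stated assumptions: a local-level (random-walk) DBN for a single sensor variable, where a reading is flagged when its innovation lies more than `k` predicted standard deviations from the forecast, and flagged readings are treated as missing rather than assimilated. The noise variances `q` and `r`, the gate `k`, the class name, and the demo readings are hypothetical.

```python
import math


class RobustKalmanDetector:
    """Hypothetical local-level Kalman filter with innovation gating."""

    def __init__(self, x0, p0=1.0, q=0.01, r=0.1, k=3.0):
        self.x, self.p = x0, p0      # state estimate and its variance
        self.q, self.r, self.k = q, r, k

    def update(self, y):
        # Predict: random-walk state, so the mean carries over and variance grows by q.
        x_pred, p_pred = self.x, self.p + self.q
        innovation = y - x_pred
        s = p_pred + self.r          # predicted variance of the reading
        anomalous = abs(innovation) > self.k * math.sqrt(s)
        if anomalous:
            # Robust step: treat the reading as missing and keep the prediction.
            self.x, self.p = x_pred, p_pred
        else:
            gain = p_pred / s
            self.x = x_pred + gain * innovation
            self.p = (1.0 - gain) * p_pred
        return anomalous


# Hypothetical readings: a steady signal with small alternating noise and one spike.
readings = [20.0 + 0.2 * (-1) ** t for t in range(60)]
readings[30] = 35.0
kf = RobustKalmanDetector(x0=20.0)
kalman_flags = [t for t, y in enumerate(readings) if kf.update(y)]
```

Skipping the update on flagged readings is what makes this variant robust: the spike at index 30 is reported but never pulls the state estimate toward 35.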
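Rao-Blackwellized particle filtering applies to DBNs that mix discrete and continuous variables: particles sample the discrete variables, and each particle tracks the continuous ones exactly with a Kalman filter. The sketch below assumes a hypothetical two-mode sensor model (mode 0 = working, small noise `r_ok`; mode 1 = faulty, large noise `r_fault`) and samples each particle's mode from its exact posterior given the new reading. The model structure, the transition probabilities `p_fail` and `p_recover`, and all parameter values are illustrative assumptions, not the dissertation's model.

```python
import math
import random


def log_normal_pdf(y, mean, var):
    d = y - mean
    return -0.5 * (d * d / var + math.log(2.0 * math.pi * var))


class RBPFDetector:
    """Hypothetical switching local-level model filtered with a Rao-Blackwellized PF."""

    def __init__(self, x0, n=200, q=0.01, r_ok=0.1, r_fault=100.0,
                 p_fail=0.01, p_recover=0.5, seed=0):
        self.rng = random.Random(seed)
        self.q, self.r = q, (r_ok, r_fault)
        # log_trans[a][b] = log P(mode_t = b | mode_{t-1} = a)
        self.log_trans = [[math.log(1 - p_fail), math.log(p_fail)],
                          [math.log(p_recover), math.log(1 - p_recover)]]
        self.particles = [[0, x0, 1.0] for _ in range(n)]  # [mode, level mean, level var]
        self.log_w = [0.0] * n

    def update(self, y):
        """Assimilate one reading; return the posterior probability of a sensor fault."""
        post_fault, log_w = [], []
        for (m_prev, mean, var), lw in zip(self.particles, self.log_w):
            p_pred = var + self.q
            # Exact Kalman predictive log-likelihood per mode plus the transition prior.
            lj = [self.log_trans[m_prev][m] + log_normal_pdf(y, mean, p_pred + self.r[m])
                  for m in (0, 1)]
            lj_max = max(lj)
            lse = lj_max + math.log(math.exp(lj[0] - lj_max) + math.exp(lj[1] - lj_max))
            post_fault.append(math.exp(lj[1] - lse))
            log_w.append(lw + lse)  # weight update for the optimal (posterior) proposal
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]
        total = sum(w)
        w = [v / total for v in w]
        fault_prob = sum(wi * pf for wi, pf in zip(w, post_fault))
        # Sample each particle's mode, then apply the matching Kalman measurement update.
        for part, pf in zip(self.particles, post_fault):
            m = 1 if self.rng.random() < pf else 0
            p_pred = part[2] + self.q
            gain = p_pred / (p_pred + self.r[m])
            part[0], part[1], part[2] = m, part[1] + gain * (y - part[1]), (1 - gain) * p_pred
        # Resample when the effective sample size falls below half the particle count.
        if 1.0 / sum(v * v for v in w) < len(w) / 2:
            idx = self.rng.choices(range(len(w)), weights=w, k=len(w))
            self.particles = [list(self.particles[j]) for j in idx]
            self.log_w = [0.0] * len(w)
        else:
            self.log_w = [v - top - math.log(total) for v in log_w]
        return fault_prob


readings = [20.0 + 0.2 * (-1) ** t for t in range(60)]
readings[30] = 35.0
rbpf = RBPFDetector(x0=20.0)
probs = [rbpf.update(y) for y in readings]
```

Reporting the weighted average of each particle's posterior fault probability, rather than the fraction of particles sampled into the faulty mode, is the Rao-Blackwellized estimate and is less noisy.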
Keywords/Search Tags: Data, Environmental, Models