Classification and Regression Framework for Characterizing Contaminant Source Zone

Posted on: 2016-10-06
Degree: Ph.D
Type: Thesis
University: Tufts University
Candidate: Zhang, Hao
Full Text: PDF
GTID: 2478390017482396
Subject: Computer Science
Abstract/Summary:
In this thesis we develop two machine-learning frameworks for estimating quantitative metrics that characterize subsurface zones of chemically contaminated soil, focusing on problems involving Dense Non-Aqueous Phase Liquid (DNAPL). Source zone characterization, a necessary first step in developing a remediation strategy, is challenging due to practical constraints associated with the data available for processing. We first propose a set of geometric features based on morphological image processing operations. These features are used for both the classification work in Chapter 3 and the regression approach developed in Chapter 4. Second, we propose a classification framework as our initial solution. Specifically, we quantize each metric into a number of intervals and employ machine learning methods to determine the interval containing the metric. A classification scheme based on an iterative algorithm combining Linear Discriminant Analysis (LDA) and Spectral Clustering (SC) is used to determine feature-based clusters that are associated with metric intervals.
Furthermore, we propose a regression framework focusing on the use of manifold regression techniques. We use manifold methods to jointly represent labeled training data comprising metrics as well as features. We then propose a new integrated approach to the problems of (a) robustly embedding test data into the manifold and (b) constructing a regression function for metric estimation. The utility of the approach is enhanced by explicitly incorporating a physical constraint associated with the metrics into the problem formulation. Results on data simulated with the Sequential Gaussian Simulation (SGS) method demonstrate the potential effectiveness of the manifold regression approaches, as well as a significant improvement in performance relative to the case where the algorithmic components are designed serially.
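The quantization step of the classification framework described above can be illustrated with a minimal sketch. It is not the thesis's method: the equal-frequency interval edges are an assumption (the abstract does not say how intervals are chosen), and a plain nearest-centroid rule stands in for the iterative LDA and Spectral Clustering scheme. All function names are hypothetical.

```python
import numpy as np

def quantize_metric(values, n_intervals):
    """Map each continuous metric value to the index of its interval.

    Assumption: interval edges are equal-frequency quantiles of the
    given values; the source does not specify the quantization rule.
    """
    values = np.asarray(values, dtype=float)
    # Interior cut points only: n_intervals - 1 edges.
    qs = np.linspace(0.0, 1.0, n_intervals + 1)[1:-1]
    edges = np.quantile(values, qs)
    labels = np.digitize(values, edges)  # integers in 0 .. n_intervals - 1
    return labels, edges

def fit_centroids(features, labels):
    """Mean feature vector per interval label (stand-in classifier)."""
    features = np.asarray(features, dtype=float)
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_interval(features, classes, centroids):
    """Assign each sample to the interval with the nearest centroid."""
    features = np.asarray(features, dtype=float)
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

if __name__ == "__main__":
    # Toy data: one metric and a single feature correlated with it.
    metric = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
    feats = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])
    labels, edges = quantize_metric(metric, n_intervals=3)
    classes, centroids = fit_centroids(feats, labels)
    print(labels, predict_interval(np.array([[0.05], [1.9]]), classes, centroids))
```

Turning the metric into interval labels converts the estimation problem into a multi-class classification problem, at the cost of resolution limited by the interval width.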
Finally, we apply our manifold regression algorithms to a new simulated data set whose hydraulic conductivity fields were built with a Transition Probability Markov Chain (TP/MC) model. In the TP/MC data, full concentration data are available for training, but the test data are sparsely sampled from 25 ports. We propose modifications of our manifold regression algorithms to process the sparse data, and the results show the efficacy of our approaches.
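To make the sparse-sampling setting concrete, the sketch below extracts 25 point observations from a full 2-D concentration field. The regular 5 x 5 port layout, the 2-D grid, and the function names are all assumptions made for illustration; the abstract states only that test data come from 25 ports.

```python
import numpy as np

def regular_port_grid(shape, n_per_side=5):
    """Row/column indices of an n_per_side x n_per_side port layout.

    Assumption: ports are evenly spaced over the grid; the actual
    port placement is not given in the source.
    """
    rows = np.linspace(0, shape[0] - 1, n_per_side).round().astype(int)
    cols = np.linspace(0, shape[1] - 1, n_per_side).round().astype(int)
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return rr.ravel(), cc.ravel()

def sample_ports(field, port_rows, port_cols):
    """Read the concentration field at the given port locations."""
    field = np.asarray(field)
    return field[np.asarray(port_rows), np.asarray(port_cols)]

if __name__ == "__main__":
    field = np.arange(81, dtype=float).reshape(9, 9)  # toy "concentration" field
    rows, cols = regular_port_grid(field.shape)
    print(sample_ports(field, rows, cols).shape)  # 25 sparse observations
```

In this setting the training samples would keep the full field while each test sample is reduced to the 25-element vector, which is why the regression algorithms need modification.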
Keywords/Search Tags:Regression, Data, Framework, Metrics, Propose