
Generalized local learning in water resource management

Posted on: 2006-03-05
Degree: Ph.D.
Type: Dissertation
University: Utah State University
Candidate: Pande, Saket
GTID: 1458390008467473
Subject: Engineering
Abstract/Summary:
Recently, uncertainty analysis has become a central theme of several research efforts in water resources. Regardless of the modeling paradigm used, the performance of any modeling task is severely limited by randomness in the data and by our inability to imitate reality exactly. Given the scientific limitation that we may never be able to completely identify the underlying physical behavior(s) of a system of interest, the question is how well we can accommodate data uncertainty under this constraint. One approach is to isolate the uncertainty that arises from sampling data from an underlying but unknown distribution, or to approximate that distribution from the given data. This dissertation models a physical system while accounting for data uncertainty. We choose a local learning paradigm to model the behavior of the Sevier River/Piute Canal system in southern Utah, based on conservation of mass. Central to this work is the concept of "generalization," defined as performance over unseen data: a model is said to generalize well if it performs well in future scenarios. This ability depends on a wise selection of the parameters that identify the model. To make that selection, it is essential to base it on a measure that is independent of the particular data sample but dependent on the underlying distribution. Using Hoeffding bounds and bounds from Vapnik-Chervonenkis (VC) generalization theory, this dissertation performs model selection so that the expected measure (over the underlying but unknown distribution) is approximated as closely as possible, while ensuring that the selected model also performs well on the given data itself (as opposed to future data). Because of the locality property of the model, this research also yields a way to detect non-stationarity in physical systems when the model generalizes well.
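To illustrate how a Hoeffding bound can turn an empirical error into a distribution-level guarantee for model selection, here is a minimal sketch. It assumes per-sample losses that are i.i.d. and bounded in [0, loss_range], and a finite set of candidate models handled with a union bound; the function names `hoeffding_epsilon` and `select_by_hoeffding` are hypothetical, and this is not presented as the dissertation's exact procedure.

```python
import math
from typing import Dict, Sequence


def hoeffding_epsilon(n: int, delta: float, loss_range: float = 1.0,
                      n_models: int = 1) -> float:
    """Half-width eps of a two-sided Hoeffding interval.

    With probability at least 1 - delta, every one of n_models candidates has
    |empirical mean loss - expected loss| <= eps, assuming i.i.d. losses
    bounded in [0, loss_range] (union bound over the candidates).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return loss_range * math.sqrt(math.log(2.0 * n_models / delta) / (2.0 * n))


def select_by_hoeffding(losses: Dict[str, Sequence[float]], delta: float = 0.05,
                        loss_range: float = 1.0) -> str:
    """Return the candidate with the smallest upper bound on expected loss.

    losses maps a (hypothetical) model name to its per-sample losses.
    """
    if not losses:
        raise ValueError("need at least one candidate model")
    k = len(losses)
    best_name, best_bound = "", math.inf
    for name, per_sample in losses.items():
        n = len(per_sample)
        eps = hoeffding_epsilon(n, delta, loss_range, k)  # raises if n == 0
        upper = sum(per_sample) / n + eps  # empirical risk + confidence term
        if upper < best_bound:
            best_name, best_bound = name, upper
    return best_name


if __name__ == "__main__":
    # Model "B" has lower empirical loss but far fewer usable samples.
    candidates = {"A": [0.2] * 100, "B": [0.1] * 10}
    print(select_by_hoeffding(candidates))
```

When all candidates are scored on the same number of samples, the Hoeffding term is identical for each and the choice reduces to the lowest empirical loss; the bound matters when sample sizes differ, for example when a larger number of lags leaves fewer usable samples. In the usage example, "A" is preferred because the confidence term for "B" (10 samples) outweighs its lower empirical loss.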
For our modeling paradigm, the model space is completely defined by the number of predictors used, the bandwidth parameter that defines the locality, and the number of lags (only in the case of time series modeling with local learners). In this work we show: (a) how to select predictor subsets based on Hoeffding bounds, (b) that such a selected predictor subset allows us to detect non-stationarity in a conservation of mass problem, (c) how the complexity of local learners (specifically, the nearest neighbor method) varies with different combinations of the number of lags and the extent of locality, and finally (d) how, using VC theory, nearest neighbor methods that generalize well can be identified.
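The model space described above (number of predictors, a locality parameter, number of lags) can be made concrete with a small nearest neighbor sketch for time series. Several choices here are assumptions not taken from the abstract: locality is expressed as the number of neighbors k rather than a distance bandwidth, distances are Euclidean, predictions are unweighted averages of neighbor targets, and leave-one-out error serves as a simple empirical score; it is not the VC-based criterion the dissertation uses. The names `lag_embed`, `knn_predict`, and `loo_mse` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def lag_embed(series: Sequence[float], lags: int) -> Tuple[List[List[float]], List[float]]:
    """Build pairs x_t = [y_{t-lags}, ..., y_{t-1}] -> target y_t."""
    if lags < 1:
        raise ValueError("lags must be >= 1")
    features, targets = [], []
    for t in range(lags, len(series)):
        features.append(list(series[t - lags:t]))
        targets.append(series[t])
    return features, targets


def knn_predict(features: Sequence[Sequence[float]], targets: Sequence[float],
                query: Sequence[float], k: int) -> float:
    """Average the targets of the k training points closest to query."""
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort (distance, target) pairs; ties are broken by the smaller target.
    ranked = sorted((math.dist(x, query), y) for x, y in zip(features, targets))
    return sum(y for _, y in ranked[:k]) / k


def loo_mse(series: Sequence[float], lags: int, k: int) -> float:
    """Leave-one-out mean squared error for one (lags, k) combination."""
    features, targets = lag_embed(series, lags)
    errors = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = targets[:i] + targets[i + 1:]
        pred = knn_predict(train_x, train_y, features[i], k)
        errors.append((pred - targets[i]) ** 2)
    return sum(errors) / len(errors)


if __name__ == "__main__":
    flows = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # toy data, not real flows
    # Scan a small grid of (lags, k) to see how the empirical error changes.
    for lags in (1, 2, 3):
        for k in (1, 2):
            print(lags, k, loo_mse(flows, lags, k))
```

Small k with many lags gives a very flexible (high-complexity) local model, while larger k smooths over more neighbors; scanning the (lags, k) grid is one informal way to see the trade-off that item (c) refers to.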