Regional correlation and regression analysis frameworks for geo-referenced datasets

Posted on: 2010-12-25
Degree: Ph.D
Type: Dissertation
University: University of Houston
Candidate: Celepcikay, Oner Ulvi
Full Text: PDF
GTID: 1449390002484556
Subject: Computer Science
Abstract/Summary:
Geo-referenced datasets are generated at rapidly increasing rates, creating the need for tools that extract knowledge from such datasets automatically. Traditional data mining techniques focus mostly on finding global patterns and lack the ability to systematically discover regional patterns. Finding interesting regional patterns is important because many patterns exist only at a regional level and not at a global level. One of the main challenges in identifying such relationships is to discover regions that are interesting to domain experts and are capable of revealing such patterns.

This dissertation focuses on developing methods to uncover hidden correlational patterns and on developing regional regression tools that capture spatially varying relationships among attributes. First, we introduce a novel PCA-based approach to discover interesting regions along with regional correlation patterns that exhibit strong relationships in the attribute space. Moreover, as a side product, a generic similarity measure for assessing the structural similarity between regions is proposed. Second, we propose a regional regression framework, called REG2, that discovers regional regression functions associated with contiguous areas in the subspace of the spatial attributes, which we call regions. Third, in order to evaluate our proposed regional regression method and other geo-regression methods, we propose various prediction evaluation measures capable of accurately assessing the performance of these techniques.

Moreover, we developed several plug-in fitness functions that employ PCA, MC, regularization, and example weighting to improve the capability of uncovering the underlying structure of the data without assuming predetermined boundaries, such as zip codes or grids.

The proposed frameworks are evaluated in case studies that center on identifying causes of arsenic contamination in Texas water wells and on determining spatially varying effects of house properties on house prices in the Boston Housing dataset.

The extensive experimental results show that our framework can effectively and efficiently identify regions with strong relationships between dependent and independent variables, along with regional regression functions that capture the spatial variation of attributes and yield better prediction models than global models and other geo-regression models. We also show that, besides providing better prediction, the discovered regions provide more insight into the relationships between variables. Finally, we show that choosing evaluation measures based on the needs of domain experts can improve prediction capabilities.
Keywords/Search Tags:Regional, Regression, Patterns, Regions