
High performance spatial data mining: Scalable methods for spatial autoregression

Posted on: 2006-04-11
Degree: Ph.D.
Type: Thesis
University: University of Minnesota
Candidate: Kazar, Baris Mustafa
GTID: 2458390008971909
Subject: Engineering
Abstract/Summary:
Explosive growth in the size of spatial databases has highlighted the need for spatial data mining techniques that uncover interesting but implicit spatial patterns in these large databases. This thesis deals with reducing the computational complexity of exact and approximate solutions of the spatial autoregression (SAR) model. Estimating the parameters of the SAR model with Maximum Likelihood (ML) theory is computationally very expensive because the log-likelihood function requires the logarithm of the determinant (log-det) of a large matrix.

The first part of the thesis introduces the theory of SAR model solutions. The second part applies parallel processing techniques to the exact SAR model solutions. We propose parallel formulations of the ML-based SAR model parameter estimation procedure that use data parallelism with load-balancing techniques.

Although this parallel implementation showed scalability up to eight processors, the exact SAR model solution still suffers from high computational complexity and memory requirements. These limitations led us to investigate serial and parallel approximate solutions for SAR model parameter estimation. In the third part of the thesis, we present two candidate approximate semi-sparse solutions of the SAR model, one based on Taylor series expansion and one based on Chebyshev polynomials. We show that the differences between the exact and approximate SAR parameter estimates have no significant effect on prediction accuracy.

In the next part of the thesis, we develop a new ML-based approximate SAR model solution, called the Gauss-Lanczos approximated SAR model solution, together with its variants. We algebraically rank the errors of the Chebyshev polynomial, Taylor series, and Gauss-Lanczos approximations to the solution of the SAR model and its variants.
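To make the log-det bottleneck and the two series approximations concrete, here is a minimal sketch, not the thesis's semi-sparse or parallel implementation. It assumes a hypothetical row-standardized ring weight matrix W (helper `ring_weights`), whose eigenvalues are real and lie in [-1, 1]. It compares the exact ln|I - rho W| with a truncated Taylor series, -sum_{k=1..m} rho^k tr(W^k)/k, and with a Chebyshev approximation of ln(1 - rho x) applied through tr T_j(W). The function names, the 12-site ring, m = 30 and degree 20 are illustrative assumptions, and the Gauss-Lanczos method is not shown.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def ring_weights(n):
    """Hypothetical row-standardized W: each site has its two ring neighbours (n >= 3).
    Its eigenvalues cos(2*pi*k/n) are real and lie in [-1, 1]."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def logdet_exact(W, rho):
    """Exact ln|I - rho W| via LU-based slogdet (dense, O(n^3))."""
    sign, logabsdet = np.linalg.slogdet(np.eye(W.shape[0]) - rho * W)
    if sign <= 0:
        raise ValueError("det(I - rho*W) is not positive for this rho")
    return logabsdet

def logdet_taylor(W, rho, m):
    """Truncated Taylor series: ln|I - rho W| ~ -sum_{k=1..m} rho^k tr(W^k) / k.
    Converges when |rho| * spectral_radius(W) < 1."""
    total, Wk = 0.0, np.eye(W.shape[0])
    for k in range(1, m + 1):
        Wk = Wk @ W                          # W^k
        total -= rho**k * np.trace(Wk) / k
    return total

def logdet_chebyshev(W, rho, deg):
    """Interpolate f(x) = ln(1 - rho x) on [-1, 1] by Chebyshev polynomials, then
    ln|I - rho W| = tr f(W) ~ sum_j c_j tr T_j(W). Assumes W's eigenvalues lie in [-1, 1]."""
    c = C.chebinterpolate(lambda x: np.log1p(-rho * x), deg)
    n = W.shape[0]
    T_prev, T_curr = np.eye(n), W.copy()     # T_0(W), T_1(W)
    total = c[0] * n + c[1] * np.trace(W)
    for j in range(2, deg + 1):
        # Three-term recurrence: T_j = 2 W T_{j-1} - T_{j-2}
        T_prev, T_curr = T_curr, 2.0 * W @ T_curr - T_prev
        total += c[j] * np.trace(T_curr)
    return total

W = ring_weights(12)
for rho in (0.2, 0.5, 0.8):
    exact = logdet_exact(W, rho)
    print(f"rho={rho}: Taylor error {abs(logdet_taylor(W, rho, 30) - exact):.2e}, "
          f"Chebyshev error {abs(logdet_chebyshev(W, rho, 20) - exact):.2e}")
```

The sketch uses dense matrix powers for readability. On this toy example the truncated Taylor series is essentially exact at rho = 0.2 but leaves a visible error at rho = 0.8, because the omitted terms decay like rho^k. Approaches aimed at large spatial data would instead rely on sparse matrix products rather than forming W^k explicitly.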
We also establish a novel relationship between the error in the log-det term, which is the approximated term in the concentrated log-likelihood function, and the error in estimating the SAR parameter rho, for all of the approximate model solutions.

In the last part of the thesis, we present NORTHSTAR, a faster, scalable and novel prediction and estimation technique for the exact SAR model solution. We prove the correctness of ML-based SAR model solutions by showing that the objective function is unimodal.
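Unimodality matters because it lets a simple one-dimensional search find the ML estimate of rho. The following is a minimal sketch under the standard SAR form y = rho W y + X beta + eps with eps ~ N(0, sigma^2 I). Profiling out beta and sigma^2 leaves, up to constants, the negative concentrated log-likelihood -ln|I - rho W| + (n/2) ln(e(rho)' e(rho) / n), where e(rho) are the OLS residuals of (I - rho W) y on X. The abstract does not describe NORTHSTAR's algorithm. Golden-section search is a generic substitute chosen here for illustration, and the ring weight matrix, the simulated data and rho = 0.6 are made-up assumptions.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-standardized W: each site has its two ring neighbours (n >= 3)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def neg_concentrated_loglik(rho, y, X, W):
    """-ln|I - rho W| + (n/2) ln(e'e / n), constants dropped; e = OLS residuals of (I - rho W) y on X."""
    n = y.shape[0]
    A = np.eye(n) - rho * W
    Ay = A @ y
    beta_hat, *_ = np.linalg.lstsq(X, Ay, rcond=None)
    e = Ay - X @ beta_hat
    _, logdet = np.linalg.slogdet(A)
    return -logdet + 0.5 * n * np.log(e @ e / n)

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal f on [lo, hi] by golden-section search."""
    invphi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Simulated example with made-up parameters: y = (I - 0.6 W)^{-1} (X beta + eps)
rng = np.random.default_rng(0)
n = 100
W = ring_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - 0.6 * W, X @ np.array([1.0, 2.0]) + rng.normal(size=n))

objective = lambda r: neg_concentrated_loglik(r, y, X, W)
rho_hat = golden_section_min(objective, -0.99, 0.99)
print("rho_hat =", rho_hat)
```

For a row-standardized W, the search interval (-0.99, 0.99) keeps I - rho W nonsingular. If the objective were not unimodal, golden-section search could stop at a local minimum, which is why a unimodality result supports this kind of cheap one-dimensional optimization.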
Keywords/Search Tags: SAR model, Spatial, Data