Human-machine collaboration for geographic knowledge discovery with high-dimensional clustering

Posted on: 2004-12-23  Degree: Ph.D.  Type: Dissertation
University: The Pennsylvania State University  Candidate: Guo, Diansheng  Full Text: PDF
GTID: 1468390011977036  Subject: Geography
Abstract/Summary:
Increasingly large (i.e., having a large number of observations) and high-dimensional (i.e., having many attributes) geographic data are being collected, but the spatial data analysis capabilities currently available have not kept up with the need for deriving meaningful information from these datasets. It is critical to develop new techniques to efficiently and effectively assist in analyzing current large and high-dimensional geographic datasets and addressing complex geographic problems, e.g., global change and socio-demographic factors for epidemiology.

The goal of the reported research is to develop a geographic knowledge discovery environment and an integrated suite of efficient and effective data mining techniques for exploring novel, complex spatial patterns in large and high-dimensional geographic datasets. As the first step, the reported research focuses on interactive, hierarchical, multivariate spatial clustering.

The major contribution of the research is twofold. First, the research develops three novel approaches for spatial clustering, feature selection and multivariate clustering, and several visualization techniques to support visual exploration and human interactions. Second, it integrates both computational and visualization methods in a unified and flexible framework to create a human-led, computer-assisted, efficient and effective geographic knowledge discovery environment. Specifically, the developed knowledge discovery environment consists of four major groups of methods:
(1) An efficient hierarchical spatial clustering method, which can identify arbitrary-shaped hierarchical 2D clusters at different scales, and generate a 1D ordering of the spatial points that preserves the entire hierarchical cluster structure;
(2) An efficient and effective feature selection method, which can identify interesting subsets of attributes from the original data space;
(3) An efficient hierarchical, multivariate clustering method, which can identify arbitrary-shaped hierarchical multivariate clusters given a set of attributes;
(4) Various visualization techniques associated with each of the above methods to support an interactive and iterative discovery process.

The developed methods are implemented within a component-oriented framework, which is: (1) flexible to customize and evolve over time, (2) collaborative in integrating various components to work together and address complex problems, and (3) robust to use and maintain. Three applications of the developed geographic knowledge discovery environment are presented to demonstrate how the developed methods and integrated discovery environment work and how well they work.
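The idea of a 1D ordering that preserves a hierarchical cluster structure can be illustrated with a small sketch. The abstract does not specify the dissertation's algorithm, so the code below substitutes plain single-linkage clustering as a stand-in; the function name `single_linkage_ordering` and the brute-force pairwise distance computation are assumptions made for illustration only, and this version is not efficient for large datasets. It shows one property: if merged clusters are always concatenated, every cluster in the hierarchy ends up as a contiguous run in the final ordering.

```python
from itertools import combinations
from math import dist


def single_linkage_ordering(points):
    """Return (ordering, merges) for a list of 2D points.

    ordering: point indices arranged so that every cluster formed during
              merging occupies a contiguous run of positions.
    merges:   list of (distance, members) recorded at each merge step.

    Illustrative stand-in only (hypothetical helper, not the
    dissertation's method); cost is O(n^2 log n) in the number of points.
    """
    # Each cluster is an ordered list of point indices, keyed by a cluster id.
    clusters = {i: [i] for i in range(len(points))}
    owner = list(range(len(points)))  # point index -> current cluster id

    # Visit point pairs from closest to farthest (Kruskal-style).
    edges = sorted(
        (dist(points[a], points[b]), a, b)
        for a, b in combinations(range(len(points)), 2)
    )

    merges = []
    for d, a, b in edges:
        ca, cb = owner[a], owner[b]
        if ca == cb:
            continue  # both points already belong to the same cluster
        # Concatenate the two orderings so each stays contiguous.
        merged = clusters.pop(ca) + clusters.pop(cb)
        clusters[ca] = merged
        for p in merged:
            owner[p] = ca
        merges.append((d, tuple(merged)))

    ordering = next(iter(clusters.values())) if clusters else []
    return ordering, merges
```

Calling `single_linkage_ordering([(0, 0), (0, 1), (10, 10), (10, 11), (5, 5)])` returns the ordering together with the merge history. Cutting the merge history at a chosen distance threshold gives clusters at that scale, and each one maps to an unbroken segment of the ordering, which is the kind of structure a 1D visualization could exploit.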
Keywords/Search Tags: Geographic, High-dimensional, Clustering, Large, Developed, Methods, Data