
The geographical analog engine: Hybrid numeric and semantic similarity measures for U.S. cities

Posted on: 2009-10-22
Degree: Ph.D
Type: Dissertation
University: The Pennsylvania State University
Candidate: Banchuen, Tawan
Full Text: PDF
GTID: 1448390002491525
Subject: Geography
Abstract/Summary:
This dissertation began with the goal of developing a methodology for locating climate change analogs, and quickly turned into a quest for computational means of locating geographical analogs in general. Previous work on geographical analogs either computed only on numeric information or considered qualitative information manually. Current and emerging technologies, such as electronic document collections, the Internet, and the Semantic Web, make it possible for people and organizations to store millions of books and articles, share them with the world, or even author some themselves. The amount of electronic and online content is expanding at an exponential rate, such that analysts are increasingly overwhelmed by the sheer volume of accessible information. The dissertation explores techniques from knowledge engineering, artificial intelligence, information sciences, linguistics and cognitive science, and proposes a novel, automatic methodology that computes similarity within online/offline textual information, and graphically and statistically combines the results with those of numeric methods.

U.S. cities with populations larger than 25,000 people are selected as a test case. Places are evaluated based on their numeric characteristics in the County and City Data Book and qualitative characteristics from Wikipedia entries. The dissertation recommends a way to convert Wikipedia entries into Web Ontology Language (OWL) ontologies, which computer algorithms can read, understand and compute on. The dissertation initially experiments with Mitra and Wiederhold's semantic measure to quantify similarity between places in the qualitative space. Many shortfalls are identified, and a series of experimental enhancements are explored. The experiments demonstrate that good semantic measures should employ a comprehensive stop-words list and a complete but succinct vocabulary. A semantic measure that can recognize synonyms must understand the intended senses of words in a place description. Furthermore, analysts need to be careful with two styles of descriptions: descriptions of places that are (1) created by following a template, or (2) laden with statistical statements can result in falsely high similarity between the places.

It is illustrated that scatter plots of numeric similarity scores versus semantic similarity scores can effectively help analysts consider similarity between places in two-space. Analysts can visually observe whether the numeric ranks of places agree with the semantic ranks. The dissertation also shows that Spearman's rank correlation test and the Kruskal-Wallis test of means can provide statistical confirmation for visual observations. The proposed hybrid methodology enables analysts to automatically discover geographical analogs in ways that strictly numeric methods or manual semantic analysis cannot offer.
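As a rough illustration of the rank-agreement check mentioned above, the following Python sketch computes Spearman's rank correlation between a list of numeric similarity scores and a list of semantic similarity scores. It is not the dissertation's implementation: the function names (`average_ranks`, `spearman_rho`) and the score values are hypothetical, chosen only to show the calculation.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical similarity scores of five candidate cities to a target city.
numeric_scores = [0.91, 0.75, 0.60, 0.42, 0.30]
semantic_scores = [0.80, 0.85, 0.55, 0.35, 0.40]

rho = spearman_rho(numeric_scores, semantic_scores)
print(f"Spearman's rho = {rho:.2f}")
```

A value of rho near 1 would suggest that the numeric and semantic rankings of candidate places largely agree, while a value near 0 would suggest the two spaces rank places differently.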
Keywords/Search Tags: Semantic, Numeric, Geographical, Similarity, Analogs, Methodology, Dissertation, Analysts