
Discovering GIS sources on the Web using summaries

Posted on: 2009-02-09
Degree: Ph.D
Type: Thesis
University: University of California, Irvine
Candidate: Hariharan, Ramaswamy
Full Text: PDF
GTID: 2448390005951510
Subject: Computer Science
Abstract/Summary:
The past decade has witnessed significant growth in the number of web sites that contain geographical information. Specifically, the number of downloadable and searchable GIS databases available over the internet has grown exponentially. The proliferation of millions of high-quality GIS data sources has led to the important challenge of source discovery: "discovering the most relevant GIS data sources given a particular query or task".

In this thesis, we first consider the problem of discovering GIS data sources on the web that are downloadable. Source discovery queries for GIS data are specified using keywords and a region of interest. A source is considered relevant if it contains data that matches the keywords in the specified region. Existing techniques rely solely on the textual metadata accompanying such datasets to compute relevance to user queries. Such approaches produce poor search results, often missing the most relevant sources on the web. Hence, instead of relying on the metadata, retrieval systems should focus on the GIS contents, which are rich in both textual and spatial descriptions of the data. The challenge then becomes managing thousands of downloadable databases with millions of records and enabling search over such large repositories. We address this problem by first developing meaningful summaries of GIS databases that approximate the contents of each data source. We then develop a novel indexing structure to index the summarized data. We conduct experiments showing that the proposed techniques significantly improve the quality of query results over baseline approaches while guaranteeing scalability and high performance. We also propose a GIS data integration solution that combines the data objects from different data sources to provide one integrated result to the user. Such data integration is essential; without it, the user is overwhelmed with redundant or missing results.

Secondly, there are hundreds of online GIS data sources on the Web that cannot be downloaded. Such data sources can be accessed only through query interfaces. Enabling search over such online data sources requires different summarization, indexing, and data integration techniques. In this thesis, we also suggest an adaptive sampling-based technique to summarize online data sources.
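
As a rough illustration of the idea of ranking downloadable sources by a content summary rather than by metadata, the following Python sketch may help. It is not the summarization or indexing method developed in the thesis, which the abstract does not specify. The grid-based summary, the class `SourceSummary`, the function `rank_sources`, the cell size, and the scoring rule are all assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SourceSummary:
    """Hypothetical summary: keyword counts per grid cell for one source."""
    name: str
    cell_size: float = 1.0  # assumed grid resolution, in coordinate units
    cells: dict = field(default_factory=dict)  # (col, row) -> Counter of terms

    def add_record(self, x, y, text):
        # Map the record's location to a grid cell and count its terms there.
        key = (int(x // self.cell_size), int(y // self.cell_size))
        self.cells.setdefault(key, Counter()).update(text.lower().split())

    def score(self, keywords, region):
        """Sum keyword counts over cells that overlap the query region."""
        min_x, min_y, max_x, max_y = region
        c0, c1 = int(min_x // self.cell_size), int(max_x // self.cell_size)
        r0, r1 = int(min_y // self.cell_size), int(max_y // self.cell_size)
        total = 0
        for (col, row), terms in self.cells.items():
            if c0 <= col <= c1 and r0 <= row <= r1:
                total += sum(terms[k.lower()] for k in keywords)
        return total


def rank_sources(summaries, keywords, region):
    """Return names of sources with a nonzero score, best first."""
    scored = [(s.score(keywords, region), s.name) for s in summaries]
    return [name for sc, name in sorted(scored, reverse=True) if sc > 0]


# Two made-up sources; the query asks for "road" data in a bounding box.
roads = SourceSummary("county_roads")
roads.add_record(-117.8, 33.6, "Highway 405 road segment")
roads.add_record(-117.7, 33.7, "local road")
parks = SourceSummary("state_parks")
parks.add_record(-120.1, 38.5, "park road trailhead")
parks.add_record(-117.8, 33.6, "regional park")
print(rank_sources([roads, parks], ["road"], (-118.0, 33.0, -117.0, 34.0)))
```

In this sketch a source is considered relevant only when its keyword matches fall inside the region of interest, which mirrors the relevance definition given in the abstract. The summary is coarse by design, so it approximates a source's contents without storing every record.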
Keywords/Search Tags:GIS, Sources, Data, Web, Over