Comparing subsets from digital spatial archives: Point set similarity

Posted on: 1998-01-22
Degree: Ph.D.
Type: Dissertation
University: University of Maine
Candidate: Flewelling, Douglas Mark
Full Text: PDF
GTID: 1460390014477353
Subject: Computer Science
Abstract/Summary:
This research focuses on the key question of measuring the spatial similarity between a small subset and a much larger superset. The new contribution of this work is a formal approach to measuring dataset similarity through a combination of the component similarities of the spatial qualities (density, dispersion, and pattern) and the relative importance of spatial objects in the datasets.

Data consumers face the difficult task of fulfilling their data requirements from the growing number of digital spatial data collections available online. New database system architectures, such as digital libraries and data warehouses, are being introduced to manage these large collections. However, cognitive and technological limits make very large datasets difficult to use. Sampling is one traditional approach to simplifying datasets, the assumption being that the sample (or subset) is representative of the entire dataset. The spatial data consumer may be interested in a specific set of spatial qualities in the dataset, and traditional random samples do not preserve these qualities in very small subsets.

This work defines a formal approach to measuring similarity between very large spatial datasets and their much smaller subsets. The model defines methods for building similarity measures over nominal, ordinal, interval, and ratio measurement scales. Similarity measurements from different scales are combined through a set of measures that express similarity as a distance from zero (equal) to one (completely different). Values generated from this similarity measure are sorted into a similarity index.

The spatial measures were tested against a database of places and nineteen synthetically generated subsets. Examination of the metadata generated for each subset indicated that, by combining multiple measures of a single spatial quality, it is possible to isolate and identify the method that was used to generate a dataset.

The model was examined with regard to its application to digital spatial libraries and data warehouses. Since digital libraries are open domains, the measure results usable for similarity assessment were restricted to interval and ratio measurement scales. Data warehouses have greater potential for domain closure and can use all scales of measurement in similarity assessment.
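The abstract does not give the dissertation's actual formulas, so the following Python sketch is only an illustration of the general idea: one distance per measurement scale, each normalized to the range zero (equal) to one (completely different), then combined and sorted into an index. Every function name, normalization choice, and the weighted-mean combination are assumptions made for this example, not the author's method.

```python
def nominal_distance(a, b):
    """Nominal scale (assumed rule): categories either match or they do not."""
    return 0.0 if a == b else 1.0


def ordinal_distance(rank_a, rank_b, n_levels):
    """Ordinal scale (assumed rule): rank gap divided by the largest possible gap."""
    if n_levels < 2:
        return 0.0
    return abs(rank_a - rank_b) / (n_levels - 1)


def interval_distance(a, b, lo, hi):
    """Interval scale (assumed rule): absolute difference over the known value range."""
    span = hi - lo
    if span == 0:
        return 0.0
    return min(abs(a - b) / span, 1.0)


def ratio_distance(a, b):
    """Ratio scale (assumed rule, non-negative values): one minus smaller/larger."""
    larger = max(a, b)
    if larger == 0:
        return 0.0
    return 1.0 - min(a, b) / larger


def combined_distance(distances, weights):
    """Hypothetical combination: weighted mean of per-quality distances."""
    total = sum(weights)
    return sum(d * w for d, w in zip(distances, weights)) / total


def similarity_index(subset_distances):
    """Sort subsets from most similar (distance near 0) to least similar (near 1)."""
    return sorted(subset_distances.items(), key=lambda item: item[1])


# Illustrative values only: compare a subset's density, dispersion, and pattern
# against those of the superset, then rank two hypothetical subsets.
density = ratio_distance(5.0, 10.0)                # 0.5
dispersion = interval_distance(10.0, 20.0, 0.0, 40.0)  # 0.25
pattern = nominal_distance("clustered", "clustered")   # 0.0
subset_a = combined_distance([density, dispersion, pattern], [1.0, 1.0, 1.0])
index = similarity_index({"subset_a": subset_a, "subset_b": 0.1})
```

A weighted mean keeps the combined value inside the same zero-to-one range as its inputs, which is why it was chosen for this sketch; the weights stand in for the relative importance a data consumer might assign to each spatial quality.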
Keywords/Search Tags: Similarity, Spatial, Subset, Data warehouses, Scales