Font Size: a A A

Indexing and mining multimedia databases

Posted on:1999-10-16Degree:Ph.DType:Thesis
University:University of Maryland, College ParkCandidate:Korn, Philip Russell (Flip)Full Text:PDF
GTID:2468390014470988Subject:Computer Science
Abstract/Summary:
In this thesis we examine methods for two related problems: (a) indexing multimedia databases for fast searching, and (b) mining data from a multimedia database to extract patterns, correlations, rules and outliers.; We first investigate the problem of retrieving similar shapes from a large database; in particular, we focus on medical tumor shapes ('Find tumors that are similar to a given pattern.'). We use a natural similarity function for shape matching based on concepts from mathematical morphology and show how it can be lower-bounded by a set of shape features for safely pruning candidates, thus giving fast and correct output. These features can be organized in a spatial access method, leading to fast indexing for range queries and nearest neighbor queries. In addition to the lower-bounding, our second contribution is the design of a fast algorithm for nearest neighbor search, achieving significant speedup while provably guaranteeing correctness. Our experiments demonstrate that roughly 90% of the candidates can be pruned using these techniques, resulting in up to 27 times better performance compared to sequential scan.; The second problem is data mining. We propose methods for efficiently finding 'interesting things' from large data sets in the form of summaries, outliers, and rules, based on powerful tools from statistics. Our first goal was to compress a very large data set in a manner that efficiently supports ad hoc queries, as is often desired for data mining. Compressed data is notoriously difficult to access randomly. We show how this can be done in three passes, provided that a small amount of lossiness can be tolerated. The proposed method, called SVDD, achieves an astounding 0.5% reconstruction error with a space requirement under 2% (i.e., a 50:1 compression ratio).; A good data compression method naturally leads to data mining. All compression methods operate by detecting patterns, or rules, which are exploited to reduce redundancy. In the last part of this thesis, we consider the problem of discovering, interpreting, and using such rules, given data organized in a matrix of, e.g., customers x products. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that one can measure the ability of the rules to recover values that are pretended to be unknown; this is in contrast to the well known association rules. Another contribution is a novel method to guess missing values from the rules. (Abstract shortened by UMI.)...
Keywords/Search Tags:Data, Mining, Indexing, Multimedia, Rules, Method, Fast
Related items