Font Size: a A A

Information retrieval and mining in high dimensional databases

Posted on:2001-05-22Degree:Ph.DType:Dissertation
University:New Jersey Institute of TechnologyCandidate:Wang, XiongFull Text:PDF
GTID:1468390014958433Subject:Computer Science
Abstract/Summary:
This dissertation is composed of two parts. In the first part, we present a framework for finding information (more precisely, active patterns) in three dimensional (3D) graphs. Each node in a graph is an undecomposable or atomic unit and has a label. Edges are links between the atomic units. Patterns are rigid substructures that may occur in a graph after allowing for an arbitrary number of whole-structure rotations and translations as well as a small number (specified by the user) of edit operations in the patterns or in the graph. The edit operations include relabeling a node, deleting a node and inserting a node. The proposed method is based on the geometric hashing technique, which hashes node-triplets of the graphs into a 3D table and compresses the label-triplets in the table. To demonstrate the utility of our algorithms, we discuss two applications of them in scientific data mining. Experimental results indicate the good performance of our algorithms and high recall and precision rates for both classification and clustering. We also extend our algorithms for processing a class of similarity queries in databases of 3D graphs.; In the second part of the dissertation, we present an index structure, called MetricMap, that takes a set of objects and a distance metric and then maps those objects to a k-dimensional pseudo-Euclidean space in such a way that the distances among objects are approximately preserved. Our approach employs sampling and the calculation of eigenvalues, and eigenvectors. The index structure is a useful tool for clustering and visualization in data intensive applications, because it replaces expensive distance calculations by sum-of-square calculations. This can make clustering in large databases with expensive distance metrics practical.; We compare the index structure with another data mining index structure, FastMap, proposed by Faloutsos and Lin, according to two criteria: relative error and clustering accuracy. The main qualitative conclusion is that these two index structures capture complementary information about distance metrics and therefore can be used together to great benefit. The net effect is that multi-day computations can be done in minutes.
Keywords/Search Tags:Information, Data, Mining, Index structure
Related items