Information retrieval and mining in high dimensional databases

Posted on:2001-05-22

Degree:Ph.D

Type:Dissertation

University:New Jersey Institute of Technology

Candidate:Wang, Xiong

GTID:1468390014958433

Subject:Computer Science

Abstract/Summary:

This dissertation is composed of two parts. In the first part, we present a framework for finding information (more precisely, active patterns) in three-dimensional (3D) graphs. Each node in a graph is an undecomposable, or atomic, unit and has a label. Edges are links between the atomic units. Patterns are rigid substructures that may occur in a graph after allowing for an arbitrary number of whole-structure rotations and translations as well as a small, user-specified number of edit operations in the patterns or in the graph. The edit operations are relabeling a node, deleting a node, and inserting a node. The proposed method is based on the geometric hashing technique, which hashes node-triplets of the graphs into a 3D table and compresses the label-triplets in the table. To demonstrate the utility of our algorithms, we discuss two applications in scientific data mining. Experimental results indicate good performance of our algorithms and high recall and precision rates for both classification and clustering. We also extend our algorithms to process a class of similarity queries in databases of 3D graphs.

In the second part of the dissertation, we present an index structure, called MetricMap, that takes a set of objects and a distance metric and maps those objects to a k-dimensional pseudo-Euclidean space in such a way that the distances among objects are approximately preserved. Our approach employs sampling and the calculation of eigenvalues and eigenvectors. The index structure is a useful tool for clustering and visualization in data-intensive applications, because it replaces expensive distance calculations with sum-of-squares calculations. This can make clustering practical in large databases with expensive distance metrics.

We compare the index structure with another data mining index structure, FastMap, proposed by Faloutsos and Lin, according to two criteria: relative error and clustering accuracy. The main qualitative conclusion is that these two index structures capture complementary information about distance metrics and therefore can be used together to great benefit. The net effect is that multi-day computations can be done in minutes.
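
As a rough illustration of the geometric hashing idea described in the first part of the abstract, the following Python sketch hashes every node-triplet of a labeled 3D graph into a table and lets a query pattern vote for the graphs that share its triplets. This is not the dissertation's algorithm: the abstract does not give the hash function, so the key used here (the three pairwise distances of a triplet, sorted and quantized, plus the sorted labels) is an assumption, as are the names `triplet_key`, `build_table`, `vote` and the `bin_width` parameter. Edit operations and the label-triplet compression mentioned in the abstract are left out.

```python
import math
from collections import defaultdict
from itertools import combinations

# A node is (label, (x, y, z)); a graph is a list of nodes.
# Edges are ignored in this sketch.

def triplet_key(a, b, c, bin_width=0.5):
    """Hypothetical hash key for a node-triplet.

    Pairwise distances do not change under rotation or translation,
    so sorting and quantizing them gives a 3D table index that is
    invariant to whole-structure rigid motions.
    """
    (la, pa), (lb, pb), (lc, pc) = a, b, c
    sides = sorted((math.dist(pa, pb), math.dist(pb, pc), math.dist(pa, pc)))
    bins = tuple(round(s / bin_width) for s in sides)
    labels = tuple(sorted((la, lb, lc)))  # assumption: label order is ignored
    return bins, labels

def build_table(graphs, bin_width=0.5):
    """Map each triplet key to the ids of the graphs containing it."""
    table = defaultdict(set)
    for gid, nodes in graphs.items():
        for a, b, c in combinations(nodes, 3):
            table[triplet_key(a, b, c, bin_width)].add(gid)
    return table

def vote(table, pattern, bin_width=0.5):
    """Count, per graph, how many triplets of the pattern it shares."""
    votes = defaultdict(int)
    for a, b, c in combinations(pattern, 3):
        for gid in table.get(triplet_key(a, b, c, bin_width), ()):
            votes[gid] += 1
    return dict(votes)

# Two toy graphs with the same geometry but different labels.
coords = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (0, 0, 12)]
graphs = {
    "g1": list(zip(["C", "N", "O", "C"], coords)),
    "g2": list(zip(["C", "C", "C", "C"], coords)),
}

def move(p):
    """Rotate 90 degrees about the z-axis, then translate."""
    x, y, z = p
    return (-y + 10, x - 4, z + 7)

# The query pattern is g1 after a rigid motion.
pattern = [(label, move(p)) for label, p in graphs["g1"]]

table = build_table(graphs)
votes = vote(table, pattern)
print(votes)
```

Because the key depends only on distances and labels, the rotated and translated copy of `g1` still matches all four of its triplets, while `g2` receives no votes because its labels differ. A full method would additionally verify candidate matches geometrically and tolerate the edit operations named in the abstract.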

Keywords/Search Tags:

Information, Data, Mining, Index structure

Related items

1	Research On Method Of Video Structure Mining Based On Content
2	Mining Algorithm Based On Frequent Sub-graph Of The Multi-layer Index Structure
3	Text Data Mining For Applied Research In Information Monitoring
4	Research And Application Of A Query Index For SDM In GIS
5	Data Mining Application In The Evaluation And Selection Of Suppliers
6	Research And Improvement Of Web Structure Mining Algorithm
7	High-dimensional Data Indexing Structure
8	Data Mining In Patients Fee Structure
9	Research On Data Retrieval Technology Based On Hybrid Index Structure In The DRC Of DOA
10	Research On Query Optimization And Mining Algorithm For Big Trajectory Data