Index Technology For Large Sparse Relational Datasets

Posted on:2009-12-14

Degree:Master

Type:Thesis

Country:China

Candidate:B D Li

Full Text:PDF

GTID:2178360278464423

Subject:Computer Science and Technology

Abstract/Summary:

In community web management systems (CWMS), storage structures inspired by large sparse tables (LST) are increasingly used to manage sparse datasets. An LST typically has thousands of attributes, many of them undefined for any given record, and low-dimensional structured similarity search over a combination of numerical and text attributes is a common operation. However, many properties of such wide tables and their associated Web 2.0 services make most multi-dimensional indexing structures unsuitable. Recent studies in this area have focused mainly on improving storage efficiency and on the efficient deployment of inverted indices; so far, no new index has been proposed specifically for LST. The inverted index is fast to scan, but because it captures little information about the content of attribute values, it is not efficient at reducing random accesses to the data file.

In this paper, we propose the iVA-file, an index based on filter-and-refine search that works with approximations of the attribute contents and keeps scanning efficiency within a bounded range. We introduce the nG-Signature to represent data strings approximately, and we improve the existing approximation vectors for numerical values. We also propose an efficient query processing strategy for the iVA-file that differs from the strategies used by existing scan-based indices. Extensive experiments on real datasets show that the iVA-file significantly outperforms existing proposals in query efficiency while maintaining an ideal update speed.
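The filter-and-refine idea described above can be illustrated with a minimal Python sketch. It is not the iVA-file or the nG-Signature as defined in the thesis: the bitmask encoding, the bigram length, the 64-bit width, the CRC32 hash, and the function names (signature, build_index, search) are all assumptions made for illustration. Substring matching also stands in for the similarity search discussed in the abstract, because it keeps the example short while guaranteeing that the filter step never drops a true match.

```python
import zlib

N = 2        # n-gram length (assumed for illustration)
BITS = 64    # signature width in bits (assumed for illustration)

def signature(text, n=N, bits=BITS):
    """Hash every n-gram of text to one bit of a fixed-width bitmask."""
    text = text.lower()
    sig = 0
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        sig |= 1 << (zlib.crc32(gram.encode("utf-8")) % bits)
    return sig

def build_index(rows, attr):
    """Keep a signature only for rows that define attr (the table is sparse)."""
    return {rid: signature(row[attr]) for rid, row in rows.items() if attr in row}

def search(rows, index, attr, query):
    """Filter by scanning compact signatures, then refine on the real values."""
    q = signature(query)
    # Filter: a value containing the query has all of the query's n-gram bits set.
    candidates = [rid for rid, sig in index.items() if sig & q == q]
    # Refine: fetch the actual value only for candidates (the costly random access).
    matches = [rid for rid in candidates
               if query.lower() in rows[rid][attr].lower()]
    return candidates, matches

# Hypothetical sparse rows: each record defines only some attributes.
rows = {
    1: {"title": "Canon digital camera", "price": 299.0},
    2: {"title": "Leather camera bag"},
    3: {"price": 15.5},
    4: {"title": "Garden hose"},
}
index = build_index(rows, "title")
candidates, matches = search(rows, index, "title", "camera")
print(matches)
```

In this sketch the candidate list may contain false positives caused by hash collisions, which the refine step discards, but it cannot miss a row whose value contains the query string. Row 3 never enters the index because it does not define the attribute.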

Keywords/Search Tags:

indexing, large sparse table, structured query

Related items

1	Research On Large-scale Structured And Semi-structured Biodata Query Method
2	Technologies On Collecting And Managing Table-structured Data
3	Image Indexing By Structured Sparse Spectral Hashing
4	Structured Indexing And Digital Copyright Protection Technology Research And Application
5	Research On Structured Sparse Decomposition Algorithm For Communication Signals
6	Reference directed indexing: Indexing scientific literature in the context of its use
7	Edge Detection Of Sparse-Structured Objects
8	Research On Large-Scale Graph Subgraph Query Method Based On Feature Index
9	Design And Implementation Of Mapreduce-based Structured Query Mechanism
10	The Study Of Indexing Of XML-based Document