Research On Indexing Heterogeneous Data Method Based On Dataspaces

Posted on:2014-10-28

Degree:Master

Type:Thesis

Country:China

Candidate:H W Wang

Full Text:PDF

GTID:2268330425966227

Subject:Computer software and theory

Abstract/Summary:

Personal and organizational information is growing rapidly, and the proportion of unstructured data continues to increase. The massive, distributed, heterogeneous, and coexisting data belonging to one subject constitutes a dataspace. Providing users with efficient, convenient, and diverse search and query services is the great challenge of dataspaces, and an efficient indexing method for the heterogeneous data in a dataspace is the basis for meeting it. Research on indexing methods for heterogeneous dataspace data is therefore of great significance.

The data management research community has done a great deal of work on indexing methods. In the past, such work was usually based on a single data format and a single query method: for example, unstructured data and keyword queries in search engines, or relational tables and SQL queries in relational databases. The data in a dataspace, however, comes from multiple sources and is heterogeneous; it may include structured, semi-structured, and unstructured formats. In addition, a dataspace with the pay-as-you-go feature needs to provide diverse search and query services, ranging from keyword queries to structured queries. Initially, information extraction is weak and strong semantic associations between data sources have not yet been established, so the system can only provide basic keyword search. As time goes on, the system gradually builds semantic associations between data items and becomes able to support richer queries. Therefore, unlike traditional indexing methods, our indexing method needs to index data in multiple formats in the dataspace and, at the same time, support keyword queries, structured queries, and other kinds of queries.

Through an analysis of existing data models and query methods, this thesis adopts iMeMex as the dataspace data model and defines three kinds of query: keyword query, predicate query, and path definition. We then propose a new indexing method for improving search and query efficiency, called the EIBH mixed index. It consists of an extended inverted index and two auxiliary indexes. By extending the keyword column and the list-node information of the inverted list, it indexes resource views so as to support the three kinds of query and improve query processing efficiency; the two auxiliary indexes address the inefficiency of index connections. Experimental results show that the index method is an effective and feasible solution to heterogeneous data indexing and query efficiency in dataspaces.
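The idea of one index serving keyword, predicate, and path lookups can be illustrated with a small Python sketch. This is not the EIBH index from the thesis, whose internal layout the abstract does not give. The class name `MixedIndexSketch`, the posting layout (each term maps to pairs of view id and field), and the parent map standing in for an auxiliary index are all assumptions made for illustration.

```python
from collections import defaultdict


class MixedIndexSketch:
    """Toy mixed index (hypothetical, not the thesis's EIBH structure)."""

    def __init__(self):
        # Extended inverted list (assumed layout): term -> {(view_id, field)}.
        self.postings = defaultdict(set)
        # Assumed auxiliary structure: view_id -> parent view_id (or None).
        self.parent = {}
        # view_id -> view name, used when checking path steps.
        self.name = {}

    def add_view(self, view_id, name, parent=None, attrs=None, content=""):
        self.name[view_id] = name
        self.parent[view_id] = parent
        fields = {"name": name, "content": content}
        fields.update(attrs or {})
        for field, value in fields.items():
            for term in str(value).lower().split():
                self.postings[term].add((view_id, field))

    def keyword(self, term):
        """Views containing the term in any field."""
        return {vid for vid, _ in self.postings.get(term.lower(), ())}

    def predicate(self, field, term):
        """Views containing the term in one specific field."""
        hits = self.postings.get(term.lower(), ())
        return {vid for vid, f in hits if f == field}

    def path(self, *names):
        """Views whose chain of direct ancestors ends with the given names."""
        result = set()
        for vid, view_name in self.name.items():
            if view_name != names[-1]:
                continue
            cur, ok = vid, True
            for expected in reversed(names):
                if cur is None or self.name[cur] != expected:
                    ok = False
                    break
                cur = self.parent[cur]
            if ok:
                result.add(vid)
        return result


idx = MixedIndexSketch()
idx.add_view(1, "projects")
idx.add_view(2, "report.txt", parent=1, attrs={"author": "wang"},
             content="dataspace indexing notes")
idx.add_view(3, "mail", attrs={"author": "li"},
             content="wang sent the dataspace report")

print(sorted(idx.keyword("wang")))                 # [2, 3]
print(sorted(idx.predicate("author", "wang")))     # [2]
print(sorted(idx.path("projects", "report.txt")))  # [2]
```

Recording the field next to each posting lets a predicate lookup filter the same list that a keyword lookup reads, so no second index has to be joined for that case. The parent map plays the analogous role for path steps in this sketch.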

Keywords/Search Tags:

Dataspace, Indexing, Mixed Index, Extended Inverted Lists

Related items

1	Research On Inverted Index Compression Method For Dataspace Based On Interpolative Code
2	Study And Implementation Of Image Indexing Techniques In Dataspace System
3	Research And Implementation Of Partial Indexing Mechanism In Personal Dataspace Management System
4	Research And Implementation Of Query And Indexing Mechanism In Personal Dataspace Management System
5	Research On Strategies Of Indexing In Dataspace
6	Research And Implementation Of Inverted Index For Large-scale Visual Search
7	Research On Key Technologies Of Integration And Query In Dataspace
8	Research On Unstructured Information Management Of Digital TV
9	The Research About Multidimensional Data Indexing Architecture
10	Research Of Index In Chinese Full-text Retrieval System