
Research On Indexing Heterogeneous Data Method Based On Dataspaces

Posted on: 2014-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: H W Wang
GTID: 2268330425966227
Subject: Computer software and theory
Abstract/Summary:
Personal and organizational information is growing rapidly, and the proportion of unstructured data continues to increase. The massive, distributed, heterogeneous, and coexisting data belonging to one subject constitute a dataspace. Providing users with efficient, convenient, and diverse search and query services is a major challenge for dataspaces, and an efficient indexing method for the heterogeneous data in a dataspace is the basis for meeting it. Research on indexing heterogeneous data in dataspaces is therefore of great significance.

The data management research community has done a great deal of work on indexing methods. In the past, such work usually assumed a single data format and a single query method: for example, unstructured data and keyword queries in search engines, or relational tables and SQL queries in relational databases. The data in a dataspace, however, come from multiple sources and are heterogeneous; they may include structured, semi-structured, and unstructured formats. In addition, a dataspace with the pay-as-you-go feature needs to provide users with diverse search and query services, ranging from keyword queries to structured queries. Initially, because information extraction is weak and no strong semantic associations have been established between the data sources, the system can offer only a basic keyword search service. Over time, the system gradually builds semantic associations between data items and becomes able to support richer queries. Therefore, unlike traditional indexing methods, our indexing method must be able to index data in multiple formats in the dataspace while supporting keyword queries, structured queries, and other kinds of queries.

Through an analysis of existing data models and query methods, this thesis adopts iMeMex as the dataspace data model and defines three kinds of queries: keyword queries, predicate queries, and path queries. We then propose a new index, called the EIBH mixed index, to improve search and query efficiency. It consists of an extended inverted index and two auxiliary indexes. The keyword column and the list-node information of the inverted list are extended to index resource views, which supports the three kinds of queries and improves query processing efficiency; the two auxiliary indexes address the inefficiency of index connection (join) operations. Experimental results show that the index is an effective and feasible solution for indexing and efficiently querying heterogeneous data in a dataspace.
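
To make the idea of an extended inverted list with auxiliary indexes more concrete, the following Python sketch shows one way the three kinds of queries could be answered from a single index. It is an illustration only, not the EIBH index itself: the abstract does not specify the list layout or what the two auxiliary indexes contain, so the `ResourceView` and `MixedIndex` classes, the choice of a name index and a parent index as auxiliaries, and the whitespace tokenization are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResourceView:
    """Hypothetical, simplified stand-in for an iMeMex-style resource view."""
    rid: int
    name: str
    attrs: dict = field(default_factory=dict)  # structured (attribute, value) pairs
    content: str = ""                          # unstructured text
    parent: Optional[int] = None               # id of the containing view, if any


class MixedIndex:
    """Illustrative mixed index; the structures below are assumptions."""

    def __init__(self):
        # Extended inverted list: term -> {(view id, field the term occurred in)}.
        self.postings = defaultdict(set)
        # Assumed auxiliary index 1: view name -> view ids.
        self.by_name = defaultdict(set)
        # Assumed auxiliary index 2: view id -> parent view id.
        self.parent_of = {}

    def add(self, view):
        for term in view.content.lower().split():
            self.postings[term].add((view.rid, "content"))
        for attr, value in view.attrs.items():
            for term in str(value).lower().split():
                self.postings[term].add((view.rid, attr))
        self.by_name[view.name].add(view.rid)
        self.parent_of[view.rid] = view.parent

    def keyword(self, word):
        """Keyword query: match the term in any field."""
        return {rid for rid, _ in self.postings.get(word.lower(), ())}

    def predicate(self, attr, value):
        """Predicate query: every term of `value` must occur in field `attr`."""
        hits = [
            {rid for rid, fld in self.postings.get(term, ()) if fld == attr}
            for term in str(value).lower().split()
        ]
        return set.intersection(*hits) if hits else set()

    def path(self, names):
        """Path query: views named names[-1] whose ancestors match names[:-1]."""
        result = set()
        for rid in self.by_name.get(names[-1], ()):
            node = rid
            for name in reversed(names[:-1]):
                node = self.parent_of.get(node)
                if node is None or node not in self.by_name.get(name, ()):
                    break
            else:
                result.add(rid)
        return result


# Small usage example with made-up data.
idx = MixedIndex()
idx.add(ResourceView(1, "projects"))
idx.add(ResourceView(2, "report.txt", {"author": "Wang"}, "dataspace index notes", parent=1))
idx.add(ResourceView(3, "mail-17", {"sender": "Wang"}, "index draft attached"))

print(idx.keyword("wang"))
print(idx.predicate("author", "Wang"))
print(idx.path(["projects", "report.txt"]))
```

In this sketch the field stored in each posting is what lets one list serve both keyword and predicate queries, and the parent lookup lets a path step be checked without joining two posting lists. Whether the thesis resolves index connections in the same way is not stated in the abstract.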
Keywords/Search Tags:Dataspace, Indexing, Mixed Index, Extended Inverted Lists