
A Partial Match-Based Indexing Approach For Heterogeneous Data

Posted on: 2015-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Liang
GTID: 2348330518970456
Subject: Computer system architecture
Abstract/Summary:
Since the 20th century, the volume of information has grown exponentially, which makes it a major challenge to find information in heterogeneous data quickly and efficiently. Indexes are indispensable for accessing information, and research on indexing structured data and on indexing unstructured data has each made significant progress. However, an index that covers only one data type no longer meets users' needs, so indexing methods that can handle both types need to be studied.

There are two broad kinds of data. Structured data is stored in large commercial relational databases. Unstructured data is represented by text documents, HTML web pages, email, and so on. Structured data is mainly searched with SQL-style queries, whereas unstructured data is searched by keyword. As the amount of data rises rapidly, more attention is being paid to indexes that can retrieve several data types. Unlike a single-type index, a heterogeneous data index supports retrieval across various types of data.

In this thesis, existing work on heterogeneous data models and query languages is analyzed and summarized in detail. Integrating these models, we propose a keyword-based data model that represents both structured and unstructured data well. We also propose a partial match-based indexing approach for heterogeneous data. The main idea of the approach is to pre-compute the results of certain queries and store them. Partial matching is taken into account throughout, both when building the index and when querying it. When building the index, we apply a pruning and sorting strategy based on keyword counts, which reduces construction time as much as possible. When querying, the index relies on keyword counts and adopts a stratified index method, which greatly reduces users' retrieval time.
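The abstract describes the index only at a high level, so the following is one possible reading of it, not the thesis's actual algorithm. It assumes that the keyword-based data model reduces every item (a relational row or a text document) to a set of keywords, that the "pre-computed queries" are keyword combinations mapped to the items containing all of them, that the stratified index keeps one layer per combination size, and that pruning works Apriori-style (a combination with no matching item is neither stored nor extended). The function names `row_to_keywords`, `text_to_keywords`, `build_index`, and `query`, and the `max_layer` parameter, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical keyword-based data model: every item, structured or
# unstructured, is reduced to an id plus a frozenset of lowercase keywords.

def row_to_keywords(row):
    """Structured data: tokenize every attribute value of a relational row."""
    return frozenset(tok for value in row.values() for tok in str(value).lower().split())

def text_to_keywords(text):
    """Unstructured data: naive whitespace tokenization (an assumption)."""
    return frozenset(text.lower().split())

def build_index(items, max_layer=3):
    """Build a stratified index of pre-computed keyword-combination queries.

    layers[k] maps a k-keyword tuple (in rank order) to the ids of all
    items that contain every keyword in the tuple.
    """
    postings = defaultdict(set)
    for item_id, keywords in items.items():
        for kw in keywords:
            postings[kw].add(item_id)
    # Sort keywords by descending item count (ties broken alphabetically);
    # this fixed order gives every combination one canonical tuple.
    vocab = sorted(postings, key=lambda kw: (-len(postings[kw]), kw))
    rank = {kw: i for i, kw in enumerate(vocab)}
    layers = {1: {(kw,): frozenset(postings[kw]) for kw in vocab}}
    for k in range(2, max_layer + 1):
        layer = {}
        for combo, ids in layers[k - 1].items():
            # Extend only with keywords ranked after the last one in combo.
            for kw in vocab[rank[combo[-1]] + 1:]:
                common = ids & postings[kw]
                if common:  # prune: empty results are never stored or extended
                    layer[combo + (kw,)] = frozenset(common)
        if not layer:
            break
        layers[k] = layer
    return layers, rank

def query(layers, rank, keywords, top_k=None):
    """Partial-match lookup: score each item by the largest number of query
    keywords it contains (capped at the deepest stored layer)."""
    q = sorted({kw.lower() for kw in keywords if kw.lower() in rank}, key=rank.get)
    scores = {}
    # Walk layers from deepest to shallowest so fuller matches are seen first.
    for k in range(min(len(q), max(layers)), 0, -1):
        for combo in combinations(q, k):
            for item_id in layers[k].get(combo, ()):
                scores.setdefault(item_id, k)  # keep the highest layer reached
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked if top_k is None else ranked[:top_k]

if __name__ == "__main__":
    items = {
        "row:1": row_to_keywords({"name": "Alice", "city": "Beijing", "dept": "DB"}),
        "row:2": row_to_keywords({"name": "Bob", "city": "Shanghai", "dept": "DB"}),
        "doc:1": text_to_keywords("Alice gave a talk on index structures in Beijing"),
    }
    layers, rank = build_index(items, max_layer=3)
    print(query(layers, rank, ["Alice", "Beijing", "DB"]))
    # [('row:1', 3), ('doc:1', 2), ('row:2', 1)]
```

Querying walks the layers from the deepest one down, so items matching more query keywords are found first and ranked higher, while items matching only some of them are still returned; under these assumptions, that is the partial-match behavior. Because results are looked up rather than computed at query time, the cost moves to index construction, which grows quickly with `max_layer`. This is why pruning empty combinations matters in such a design.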
The experimental results show that the proposed index method effectively addresses the problem of indexing heterogeneous data and performs well.
Keywords/Search Tags: Heterogeneous Data, Partial Match, Data Model, Stratified Index