Research On Dynamic Indexing Technologies In Full-text Retrieval System

Posted on: 2010-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Qu
Full Text: PDF
GTID: 2178360332457852
Subject: Computer Science and Technology
Abstract/Summary:
The inverted index is the key technology for efficient full-text search, but its construction is constrained by three competing concerns: space efficiency, dynamic update performance, and retrieval efficiency. This thesis studies dynamic full-text indexing from three angles, namely incremental updates with index merging, compressed storage, and query retrieval, in order to improve overall performance. The goal is to build a dynamic, real-time index that can cope with the vast amount of information on the Internet, and to achieve high-speed index queries, based on an inverted index file structure and an index merge algorithm.
Implementing dynamic full-text indexing poses two difficulties. First, the compression ratio must be improved, since a compressed inverted index helps improve query throughput. Second, compression must be balanced against dynamic behavior: the compression method has to raise the compression ratio while still making dynamic updates convenient.
Based on an analysis of these dynamic characteristics, this thesis finds that the document IDs and word positions in an inverted list can be encoded as d-gaps (differences between successive values) and then compressed with a variable-length method, while word frequencies can be compressed directly with variable-length coding to increase the compression ratio. Among methods that support dynamic updates, this hybrid coding therefore offers superior compression efficiency. Following an in-depth study of inverted index file structures, an efficient index file structure is proposed that allows multiple sub-indexes to coexist; merging them at selected times is optimized so that the index can be built incrementally and efficiently. Experiments show that applying this combined strategy to the reconstruction of the Huffman tree yields better time and space overhead.
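The d-gap and variable-length coding described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name a specific variable-length code, so a byte-aligned code (7 data bits per byte, high bit as a continuation flag) is assumed here, and the function names `to_dgaps`, `from_dgaps`, `vbyte_encode`, and `vbyte_decode` are hypothetical.

```python
def to_dgaps(doc_ids):
    """Turn a strictly increasing list of positive document IDs into d-gaps."""
    gaps = []
    prev = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)  # store the difference, not the absolute ID
        prev = doc_id
    return gaps


def from_dgaps(gaps):
    """Rebuild the absolute document IDs by summing the gaps."""
    ids = []
    total = 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def vbyte_encode(numbers):
    """Assumed byte-aligned code: low 7 bits first, high bit set if more bytes follow."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append((n & 0x7F) | 0x80)  # continuation byte
            n >>= 7
        out.append(n)  # final byte, high bit clear
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers = []
    n = 0
    shift = 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7  # more bytes belong to this number
        else:
            numbers.append(n)
            n = 0
            shift = 0
    return numbers


doc_ids = [3, 7, 130, 131, 1000]
encoded = vbyte_encode(to_dgaps(doc_ids))
decoded = from_dgaps(vbyte_decode(encoded))
```

In this sketch, appending a new document to a posting list only requires the last stored document ID and a few extra bytes at the end of the list, which is one reason gap-based byte codes are usually considered friendly to dynamic updates. Word frequencies, which are already small numbers, would be passed to `vbyte_encode` directly without the gap step.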
In index searching, there is a variety of retrieval models and query methods. Building on the results above, this thesis designs and implements an experimental prototype of a dynamic, efficient full-text indexing system. The prototype includes modules for common text data analysis, index building, index merging, and index querying, and it provides a basic platform for experiments and for research on the related algorithms.
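The build, merge, and query modules can be pictured with a small in-memory sketch of an index that keeps several sub-indexes and merges them occasionally. Everything here is an assumption for illustration: the class name `DynamicIndex`, the whitespace tokenizer, and in particular the merge trigger (merge everything once more than `max_sub_indexes` sub-indexes exist), since the abstract does not state the thesis's actual merge policy or file layout. Document IDs are assumed to be unique across batches.

```python
import heapq
from collections import defaultdict


class DynamicIndex:
    """Hypothetical index made of several sub-indexes (term -> sorted doc IDs)."""

    def __init__(self, max_sub_indexes=4):
        self.max_sub_indexes = max_sub_indexes  # assumed merge threshold
        self.sub_indexes = []

    def add_batch(self, docs):
        """Build a new sub-index from a batch {doc_id: text} without touching old ones."""
        sub = defaultdict(list)
        for doc_id in sorted(docs):
            for term in set(docs[doc_id].lower().split()):
                sub[term].append(doc_id)  # doc IDs arrive in increasing order
        self.sub_indexes.append(dict(sub))
        if len(self.sub_indexes) > self.max_sub_indexes:
            self.merge_all()

    def merge_all(self):
        """Merge every sub-index into one, keeping each posting list sorted."""
        per_term = defaultdict(list)
        for sub in self.sub_indexes:
            for term, postings in sub.items():
                per_term[term].append(postings)
        merged = {t: list(heapq.merge(*lists)) for t, lists in per_term.items()}
        self.sub_indexes = [merged]

    def search(self, term):
        """Query all sub-indexes and combine their sorted posting lists."""
        lists = [sub[term] for sub in self.sub_indexes if term in sub]
        return list(heapq.merge(*lists))


index = DynamicIndex(max_sub_indexes=2)
index.add_batch({1: "inverted index", 2: "index merge"})
index.add_batch({3: "dynamic index"})
hits = index.search("index")
```

The trade-off this sketch exposes is the one the abstract points to: new documents become searchable as soon as their sub-index is built, at the cost of queries having to consult every sub-index until the next merge.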
Keywords/Search Tags: Information Retrieval, Inverted File, Index Maintenance, Online Index, Dynamic Text Collections