Font Size: a A A

Research On Compressed Full-Text Indexes

Posted on:2015-02-22Degree:MasterType:Thesis
Country:ChinaCandidate:X H LiuFull Text:PDF
GTID:2268330425989112Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Full-text indexes are used to quickly query information in the mass text collection. The present full-text index methods are usually summed up in two categorys:one is based on inverted index, the other is based on suffix array. With the explosive growth of documents, the present methods show poor performance in space consumption, query speed, and flexibility. Therefore, it’s necessary to explore better full-text index.The goal of the research on compressed full-text indexes is to use text compression technology and full-text indexing technology to find an effective way to index text with smaller space, support flexible queries, and even restore the original text which means the index will completely replace the original text.Inverted index and compressed suffix array index are implemented and a two-level compressed self-index method is proposed. The method consists of two layers, the index layer and the presentation layer. The presentation layer is the second layer of the index, it builds word-codes with the input of the original text and then encodes the original text into two sequences (a word stem sequence and a word form sequence). The index layer is the first layer of the index, it receives the output from the presentation layer. In this layer, the word stem sequence is used to create compressed self-index which supports query function and supports extraction function with the word stem sequence.A hierarchical rearrange encoding method is proposed to compress the word form sequence. The encoding method is based on variable-length encoding compression algorithm, which uses smaller space and supports random access to the original sequence.The experimental results demonstrate that the proposed two-level compressed self-index method can index text effectively, and the performmence of the method is significantly better than the inverted index and compressed suffix array.
Keywords/Search Tags:Compressed Full-Text Indexes, Inverted Index, Compressed SuffixArray, Two-Level Compressed Self-Index
PDF Full Text Request
Related items