Font Size: a A A

A Research Of Full-Text Retrieval Based On Inverted Index

Posted on:2005-01-20Degree:MasterType:Thesis
Country:ChinaCandidate:X Y LiuFull Text:PDF
GTID:2168360152469203Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Inverted index is an important technique to improve full-text retrieval performance. The storage utilization, dynamic efficiency, creation efficiency, and retrieval efficiency are four critical problems of inverted index. This research discusses four aspects of inverted index including compression, incremental update, creation efficiency, and retrieval efficiency. Our intention is to take the four aspects into account and enhance their integrated performence.Compression of inverted files will result in an increase in query throughput, because the time to load and decompress a compressed inverted list is shorter than the time to load a never compressed inverted list. Previous studys almost focused on the compression ratio, and usually ignore the dynamic efficiency of inverted file. In order to combine the compression with dynamic efficiency, it is necessary to develop a compression method which supports dynamic update. This thesis analyses the dynamic character of inverted index, then draws a conclusion that the three sublists constructed inverted list have different dynamic characters. Based on this conclusion, we put forward a hybrid compression method.In order to support dynamic update, we mend the data structure of inverted index, and take a new update strategy of inverted index based on extendible hashing to make inverted index extendible. It not only reduces the move overhead and has little influence on inverted index retrieval. It supports document insertion and deletion, and provides high retrieval efficiency and space utilization. On the basis of this strategy, incremental update and real-time update are realized.The search time of terms library is one of the factors that determine creation efficiency and retrieval efficiency of inverted index. We use order preserving minimal perfect hash function to realize the search of terms library. It can not only upgrade the search speed, but also avoid sorting. It can be seen from the experiment that this method can obtain twice speed of binary search.
Keywords/Search Tags:information retrieval, full-text retrieval, full-text index, inverted index, inverted index compression, incremental update
PDF Full Text Request
Related items