Font Size: a A A

The Full-text Indexing Technology Index Merge Algorithm Research And Analysis

Posted on:2009-06-06Degree:MasterType:Thesis
Country:ChinaCandidate:Y Y CengFull Text:PDF
GTID:2208360245961257Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In recent year, the maintenance and update of on-line index become the hot issue in the research of full-text retrieval ,as the rapid development of Internet, data on the Internet is also exploding rapidly,however, no matter the huge data existed, new date continue to grow, outdated date need to be eliminated as well, which requires the frequent insertion and deletion of data, therefore, accordingly, index merger arithmetic is put on the central arena. In this paper, we mainly aim at the analysis of index merge algorithm in the full-text retrieval and provide improved algorithm in terms of time consumption.The fundamental of index is convert original date to flexible data structure to make search more effective, base on analysis of the basic structure which does impact efficiency of index update, include how to construct index, structures of index and inverted file together with how to construct inverted index, we also mention the advantages and disadvantages of static index, and introduce increment index and the importance of dynamic update at the end of this session. 2.In the thesis, we research 3 index maintenance strategies, In-place Index update, Merge-Based Index update and Re-Build Index update, which focus on cost consumption in the different strategies, accordingly, we analyze the merge-based index maintenance strategies, included immediate merge algorithm, logarithmic merge, geometric partitioning and dynamic balancing tree merger algorithm together with their advantages and disadvantages and we present improved solutions. Geometric partitioning is our focus which have been studied in detail in this paper, we present new geometric merge together with garbage collection to reduce time consumption, the limits method is introduced to delete outdated documents in new algorithm.At the end of this paper, we make use of an open source full-text retrieval system to test immediate merge, logarithmic merge, geometric partitioning merge and validate the improved algorithm, the time consumption of new algorithm are lower than old approach from experiment data, which means new algorithm is more effective.
Keywords/Search Tags:full-text retrieval, inverted index, index merge, re-merge
PDF Full Text Request
Related items