Research On On-Line Indexing For Full-Text Retrieval

Posted on:2011-09-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:2178330338989605

Subject:Computer Science and Technology

Abstract/Summary:

As the volume of information on the Internet grows, more and more information becomes available, yet it remains difficult to obtain the newest information people need precisely and promptly. Index construction and maintenance are important components of a search engine: they build indices over large volumes of Web information and update those indices in real time, so that users can query for timely, precise, and comprehensive information. This thesis focuses on how to construct and manage indices in an on-line environment, and on how to balance indexing performance against search performance.

We approach the topic from inverted indexing, the main technology in full-text information retrieval, and introduce some key techniques for building and managing indices. Based on this research, our main contributions are as follows.

1. We study index construction based on inverted files, together with several indexing algorithms, in depth. Guided by the requirements and context of on-line indexing, we design and implement an inverted index structure that supports constructing and updating indices efficiently.

2. Drawing on a study of the characteristics of on-line index updating, we propose an index management algorithm called GPDID (Geometric Partition for Deleting Indexed Documents). Unlike traditional index construction and update algorithms, it introduces a threshold value for garbage collection. Extensive experiments show that our method improves indexing performance when documents are deleted while maintaining high search performance.

3. We propose an efficient index construction and management method based on a dynamic Huffman-like tree. It dynamically adjusts the order of sub-index merge operations during index construction using a k-way method, and offers better query processing performance than previous methods. Extensive experiments show that the algorithm performs well in index construction and query processing, and provides an equivalent level of index maintenance performance when document insertions and deletions occur in parallel.

Based on the above research, we design and implement an experimental prototype of a full-text retrieval system. The system comprises a document parsing module, an indexing module, a search sub-system, and index storage, and can serve as a basic platform for related research and experiments in information extraction.
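
The idea of a garbage-collection threshold for deleted documents can be pictured with a minimal Python sketch. It is not the GPDID algorithm itself, whose details the abstract does not give. It assumes a toy in-memory inverted index in which deletions are first recorded as tombstones and postings are physically removed only once the deleted fraction reaches a threshold. The class name `TinyInvertedIndex`, the parameter `gc_threshold`, and the choice of "fraction of deleted documents" as the trigger are all hypothetical.

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Toy inverted index with lazy deletion (illustrative, not GPDID)."""

    def __init__(self, gc_threshold=0.3):
        # gc_threshold is a hypothetical parameter: the fraction of
        # tombstoned documents that triggers garbage collection.
        self.gc_threshold = gc_threshold
        self.postings = defaultdict(list)  # term -> list of doc ids
        self.docs = set()                  # ids that still have postings
        self.deleted = set()               # tombstones awaiting collection

    def add(self, doc_id, text):
        if doc_id in self.docs:
            raise ValueError(f"document {doc_id} is already indexed")
        self.docs.add(doc_id)
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)

    def delete(self, doc_id):
        # Ignore unknown or already-deleted documents.
        if doc_id not in self.docs or doc_id in self.deleted:
            return
        # Mark only; postings are removed later, in bulk.
        self.deleted.add(doc_id)
        if len(self.deleted) / len(self.docs) >= self.gc_threshold:
            self.collect_garbage()

    def collect_garbage(self):
        # Rewrite every posting list without the tombstoned documents.
        for term in list(self.postings):
            live = [d for d in self.postings[term] if d not in self.deleted]
            if live:
                self.postings[term] = live
            else:
                del self.postings[term]
        self.docs -= self.deleted
        self.deleted.clear()

    def search(self, term):
        # Queries filter out tombstones that have not been collected yet.
        hits = self.postings.get(term.lower(), [])
        return [d for d in hits if d not in self.deleted]


index = TinyInvertedIndex(gc_threshold=0.5)
index.add(1, "online index update")
index.add(2, "index merge")
index.add(3, "query performance")
index.delete(1)                 # 1 of 3 deleted: below threshold, tombstone only
print(index.search("index"))    # [2]
index.delete(2)                 # 2 of 3 deleted: threshold reached, postings rewritten
print(sorted(index.postings))   # ['performance', 'query']
```

In this sketch a lower threshold keeps posting lists compact, which favors query speed, at the cost of more frequent rewrites during updates; a higher threshold does the opposite. That is one way to read the trade-off between indexing and search performance that the abstract describes.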

Keywords/Search Tags:

information retrieval, on-line index, garbage collection, index performance, query performance

Related items

1	Research On On-line Indexing For Full-text Retrieval
2	Based On Spatial Data Storage And Retrieval Of Geospatial Research
3	Thor: A universal XML index for efficient XPath query processing
4	Research On The Key Techniques For XML Index And Query
5	The Study Of Text Index Construction For Large-Scale Dynamic Collection
6	Research And Implementation Of An Open High-Performance Platform Of Full-Text Retrieval
7	Olap Query Performance Study
8	A Study On Compression Algorithm Performance Based Inverted Index
9	A Research On XML Querying Based On Index
10	Performance Tuning And Optimization Of DB2 Database Index