The Effects of Index Storage on Ranked Information Retrieval

Posted on:2013-10-01

Degree:M.S.E.E

Type:Thesis

University:West Virginia University

Candidate:Mantheiy, James E., Jr

GTID:2458390008477846

Subject:Computer Science

Abstract/Summary:

Information retrieval is the process of recalling and ordering all documents relevant to a user's search query. Examples of information retrieval systems are Google, Bing, and Yahoo search. To search effectively, these systems use an inverted index, which maps content such as words back to the documents that contain it. An inverted index is conventionally implemented in one of two ways: in memory or as a file. This investigation examines a third option, implementing the inverted index as a table in a database, and compares it with the other two. It also examines which combination of index implementation and retrieval algorithm performs best, considering TF-IDF, Best Match 25 (BM25), and a unigram language model with Jelinek-Mercer smoothing. To answer these questions, a system was designed and developed to index and search three collections that differ in content, size, and complexity. The results show that an inverted index implemented in a database is a viable option for information retrieval. They also show that Best Match 25 and the unigram language model consistently outperform TF-IDF. In conclusion, if the collection cannot be indexed in memory, a database-implemented index is a sufficient second option.
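
As a rough illustration of the database option described above, the following sketch stores an inverted index as a table and ranks documents with Best Match 25. It is not the thesis's implementation: the use of SQLite, the table and column names (`postings`, `doc_len`), the whitespace tokenizer, the function names, and the parameter defaults `k1=1.2` and `b=0.75` are all assumptions made for this example.

```python
import math
import sqlite3
from collections import Counter, defaultdict


def build_index(docs):
    """Store (term, doc_id, tf) postings in a SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE postings (term TEXT, doc_id INTEGER, tf INTEGER)")
    conn.execute("CREATE INDEX idx_term ON postings (term)")
    conn.execute("CREATE TABLE doc_len (doc_id INTEGER PRIMARY KEY, length INTEGER)")
    for doc_id, text in enumerate(docs):
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        conn.execute("INSERT INTO doc_len VALUES (?, ?)", (doc_id, len(tokens)))
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(term, doc_id, tf) for term, tf in Counter(tokens).items()],
        )
    return conn


def bm25_search(conn, query, k1=1.2, b=0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    n_docs, avg_len = conn.execute(
        "SELECT COUNT(*), AVG(length) FROM doc_len"
    ).fetchone()
    lengths = dict(conn.execute("SELECT doc_id, length FROM doc_len"))
    scores = defaultdict(float)
    for term in set(query.lower().split()):
        postings = conn.execute(
            "SELECT doc_id, tf FROM postings WHERE term = ?", (term,)
        ).fetchall()
        if not postings:
            continue  # term does not occur in the collection
        df = len(postings)  # number of documents containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings:
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `bm25_search(build_index(docs), "quick fox")` on a small list of strings returns the matching document positions in ranked order. Replacing `":memory:"` with a file path would persist the table on disk, which is the case of interest when the collection is too large to index in memory.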

Keywords/Search Tags:

Index, Retrieval, Information, Search

Related items

1	Encrypted Search: Enabling Standard Information Retrieval Techniques for Several New Secure Index Types While Preserving Confidentiality Against an Adversary With Access to Query Histories and Secure Index Contents
2	Research On Index Technology In XML Search Engine
3	The Design And Implementation Of An Information Retrieval System
4	Design And Implementation Of News Search Engine Based On MySQL
5	Research On Fast Text Retrieval Methods And Optimization Of Engineering Realization
6	Research On Adaptive Index Method For Real-time Search On Microblogs
7	Efficient TopK Processing In Web Search Systems
8	The Research And Design Of Online Multilingual Information Retrieval And Management System
9	Research Of An Information Retrieval Algorithm Based On The Relevance Of Mobile Search Users
10	Research And Implementation Of Spatial Text Similarity Search