The Effects of Index Storage on Ranked Information Retrieval

Posted on: 2013-10-01
Degree: M.S.E.E
Type: Thesis
University: West Virginia University
Candidate: Mantheiy, James E., Jr
Full Text: PDF
GTID: 2458390008477846
Subject: Computer Science
Abstract/Summary:
Information retrieval is the process of recalling and ordering all relevant documents based on a user's search query. Examples of information retrieval systems are Google, Bing, and Yahoo search. To search effectively, these systems use an inverted index that maps content, such as words, to the documents in which it appears. It is widely believed that there are two options for implementing an inverted index: in memory or as a file. This investigation examines a third option, implementing the inverted index as a table in a database, and compares it to the other two. It also examines which combination of inverted index implementation and retrieval algorithm performs best, considering TF-IDF, Best Match 25 (BM25), and a unigram language model with Jelinek-Mercer smoothing. This is determined by designing and developing a system that indexes and searches three collections that differ in data, size, and complexity. The results show that an inverted index implemented in a database is a viable option for information retrieval. It is also noteworthy that Best Match 25 or the unigram language model consistently outperforms TF-IDF. In conclusion, if the collection cannot be indexed in memory, a database-implemented index is a sufficient second option.
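To make the database-backed approach concrete, the following is a minimal sketch of an inverted index stored as a SQLite table and queried with BM25 scoring. It is not the system built in the thesis: the table layout (`postings`, `doc_len`), the function names, the toy documents, the whitespace tokenizer, and the parameter values `k1=1.2` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
import sqlite3
from collections import Counter

# Toy collection (hypothetical data, not taken from the thesis).
DOCS = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "the quick dog jumps over the lazy fox",
}


def build_index(docs):
    """Store postings (term, doc_id, tf) and document lengths in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE postings (term TEXT, doc_id INTEGER, tf INTEGER)")
    conn.execute("CREATE TABLE doc_len (doc_id INTEGER PRIMARY KEY, length INTEGER)")
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        conn.execute("INSERT INTO doc_len VALUES (?, ?)", (doc_id, len(tokens)))
        conn.executemany(
            "INSERT INTO postings VALUES (?, ?, ?)",
            [(term, doc_id, tf) for term, tf in Counter(tokens).items()],
        )
    # An index on the term column lets each lookup avoid a full table scan.
    conn.execute("CREATE INDEX idx_term ON postings (term)")
    conn.commit()
    return conn


def bm25_search(conn, query, k1=1.2, b=0.75):
    """Rank documents for a query with BM25, reading postings from the database."""
    n_docs, avg_len = conn.execute(
        "SELECT COUNT(*), AVG(length) FROM doc_len"
    ).fetchone()
    lengths = dict(conn.execute("SELECT doc_id, length FROM doc_len"))
    scores = Counter()
    for term in set(query.lower().split()):
        rows = conn.execute(
            "SELECT doc_id, tf FROM postings WHERE term = ?", (term,)
        ).fetchall()
        if not rows:
            continue  # term does not occur in the collection
        df = len(rows)  # document frequency of the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in rows:
            # Length normalization: longer documents are penalized.
            norm = k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    conn = build_index(DOCS)
    for doc_id, score in bm25_search(conn, "quick fox"):
        print(doc_id, round(score, 3))
```

In this sketch, swapping the SQLite tables for a Python dictionary would give the in-memory variant, and the scoring loop would stay the same. That separation between index storage and ranking function is what makes the kind of comparison described in the abstract possible.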
Keywords/Search Tags:Index, Retrieval, Information, Search