A comparison of root and stemming techniques for the retrieval of Arabic documents

Posted on: 2003-04-27
Degree: Ph.D.
Type: Thesis
University: McGill University (Canada)
Candidate: Moukdad, Haidar A
GTID: 2465390011982518
Subject: Information Science
Abstract/Summary:
Using information retrieval systems to gain access to documents in languages other than English is becoming an increasingly significant problem. Rules, theories, algorithms, and retrieval methods designed and developed for English and other morphologically similar languages may or may not apply in the linguistic environments of other languages. The problem is particularly acute in languages that differ radically from English on account of morphological rules. This thesis compares the effects of two indexing and retrieval techniques (stemming and root retrieval) on information retrieval in Arabic through an exploratory study of the handling of Arabic words by an English search engine. It also investigates how best to adapt existing English-language information retrieval systems for use with Arabic-language texts, and specifically to process words and their morphological variations. Search experiments, using 2000 Arabic documents and 40 Arabic search terms (nouns), were conducted with a Web search engine developed for English, AltaVista, to compare the performances of stemming and root retrieval and to investigate the possibility of adapting this engine for use with Arabic text. The results of the experiments show that more effective retrieval can be accomplished through stemming, and that it is possible to adapt the engine for use with Arabic without the need to develop root-retrieval features.
Keywords/Search Tags: Retrieval, Arabic, Root, Stemming, English, Languages, Engine