Effectiveness of document processing techniques for Arabic information retrieval

Posted on: 2004-10-27 | Degree: Ph.D. | Type: Dissertation
University: University of Pittsburgh | Candidate: Abu El-Khair, Ibrahim Hassan | Full Text: PDF
GTID: 1468390011476326 | Subject: Library science
Abstract/Summary:
This study investigates the effectiveness of alternative text processing techniques for Arabic information retrieval. The techniques studied are term weighting schemes, stemming, and stop word elimination. Three weighting schemes were examined for their effect on retrieval effectiveness: inverse document frequency weighting, probabilistic weighting, and statistical language modeling. These weighting schemes were combined with three stemming algorithms for Arabic text and three purpose-built stoplists, pairing statistical approaches with linguistic ones in order to reach optimal performance. The experiments used the LDC (Linguistic Data Consortium) Arabic Newswire data set.

Results indicated that the Best Match (BM25) weighting scheme used in the Okapi retrieval system had the best overall performance of the three weighting schemes studied. The Light-10 stemmer in the Lemur Toolkit was the best-performing stemmer and improved retrieval results significantly. Stoplists slightly improved retrieval effectiveness, especially when used with the BM25 weight. The general stoplist performed better overall than the other two lists.
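
For readers unfamiliar with the Best Match weighting named in the abstract, the following is a minimal sketch of BM25 scoring over pre-tokenized documents. It is not the dissertation's implementation, nor the Okapi or Lemur code: the function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, and the smoothed IDF variant are all assumptions chosen for illustration, and the abstract does not report which settings were used. Stemming and stoplist filtering are assumed to have been applied to the tokens beforehand.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document in `docs`.

    `docs` is a list of token lists (assumed already stemmed and
    stoplist-filtered). `k1` and `b` are common defaults, not values
    taken from the dissertation.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # Smoothed IDF that stays non-negative (one of several variants).
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term frequency saturated by k1 and normalized by document length.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection; tokens stand in for stemmed Arabic terms.
docs = [
    ["arabic", "retrieval", "stemming"],
    ["stoplist", "arabic"],
    ["weather", "report"],
]
scores = bm25_scores(["arabic"], docs)
```

In this toy collection the second document scores higher than the first for the query term because it is shorter than average, which is the length normalization controlled by `b`; the third document does not contain the term and scores zero.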
Keywords/Search Tags:Retrieval, Arabic, Effectiveness, Techniques, Used, Weighting schemes