Design and implementation of automatic word and phrase indexing for information retrieval with Arabic documents

Posted on: 1996-01-22
Degree: Ph.D.
Type: Dissertation
University: Illinois Institute of Technology
Candidate: Hmeidi, Ismael Ibrahim
Full Text: PDF
GTID: 1468390014487228
Subject: Computer Science
Abstract/Summary:
Investigation of methods of automatic information retrieval for Arabic is essential to the growth of learning in the Arab world. It is the simplest and most cost-effective way to make the resources of large reference libraries available to the increasing numbers of students and researchers in the Arab world.

We have put together a corpus of 242 abstracts of Arabic documents using the proceedings of the Saudi Arabian National Conferences as a source. All these abstracts involve computer science and information systems. We also designed and built an automatic retrieval system from scratch to handle Arabic data. The system is designed to support two goals: first, to test an automatic word indexing system based on three indexing methods (full words, stems, and roots); second, to test an automatic phrase indexing process using the same three indexing methods. The system was implemented in the C language using the GCC compiler and runs on IBM PCs and compatible microcomputers.

We have implemented both automatic and manual indexing techniques for this corpus, with and without phrases. A long series of experiments using measures of recall and precision has demonstrated that automatic indexing is at least as effective as manual indexing, and more effective in some cases. Since automatic indexing is both cheaper and faster, our results suggest that we can achieve wider coverage of the literature with less money and produce results as good as those of manual indexing.

We have also compared the results using words, stems, and roots as index terms, and confirmed the results obtained by Al-Kharashi and Abu-Salem with smaller corpora: root indexing is more effective than word indexing.

Our results with phrase indexing are puzzling and suggest a need for further research: use of phrases improves the results with automatic indexing but not with manual indexing.
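
The experiments above are evaluated with recall and precision. As a minimal sketch of the standard set-based definitions only (not the dissertation's own code, which was written in C), the following Python computes both measures for a single query. The function name `precision_recall` and the document IDs are hypothetical and chosen purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs of the documents the system returned.
    relevant:  IDs of the documents judged relevant to the query.
    Both arguments are treated as sets, so duplicates are ignored.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved

    # Precision: fraction of retrieved documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents retrieved, 3 judged relevant, 2 in common.
p, r = precision_recall([3, 17, 42, 108], [17, 42, 200])
print(f"precision={p:.2f} recall={r:.2f}")
```

Under this definition, an indexing method (full words, stems, or roots) that retrieves more of the relevant abstracts raises recall, while one that retrieves fewer irrelevant abstracts raises precision.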
Keywords/Search Tags:Automatic, Indexing, Arabic, Retrieval, Information, Results, Word