Micro-AIRS: A microcomputer-based Arabic information retrieval system comparing words, stems, and roots as index terms | Posted on:1992-04-16 | Degree:Ph.D | Type:Dissertation | University:Illinois Institute of Technology | Candidate:Al-Kharashi, Ibrahim A | GTID:1478390014499779 | Subject:Computer Science | Abstract/Summary: |

Experimentation with retrieval systems in Arabic language environments has been very limited. Arabization of available information retrieval systems has dealt mostly with the internal representation of Arabic data and the translation of menus and system messages into Arabic. The problems of working with the Arabic language have not been confronted directly.

Stemming algorithms have been widely used to enhance the retrieval behavior of information retrieval systems. In English-based systems, stemming algorithms remove suffixes to reduce the storage needed for the keyword list and to increase recall by conflating word variants. In the Arabic language, both prefixes and suffixes are added to roots and stems to form related words, and the number of affixes used in Arabic exceeds that used in English. Surface affix removal processes produce word stems, while deep affix removal processes produce word roots.

This research studies the effect of using the words, stems, and roots of Arabic words as index terms on the effectiveness of the retrieval of Arabic bibliographic records. To run the experiment for these three retrieval methods, we used 355 Arabic bibliographic records covering computer and information science, and 29 queries. The test was conducted on an IBM/AT-compatible microcomputer using the Microcomputer-based Arabic Information Retrieval System, Micro-AIRS.

The effectiveness of the system using the word, stem, and root retrieval methods is presented using the recall and precision measures along with two nonparametric statistical tests.
The system evaluation results show the superiority of the root retrieval method over the word retrieval method, and over the stem retrieval method at high recall levels. They also show the superiority of the stem retrieval method over the word retrieval method at all recall levels. The experiments with ranking methods using the Dice, cosine, and Jaccard similarity coefficients show that all three coefficients produce exactly the same results when applied to binary-weighted word counts. | Keywords/Search Tags: | Retrieval, Arabic, Word, System, Stems, Root |
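
The ranking observation in the abstract can be illustrated with a minimal sketch. With binary weights, a query and a record are simply sets of index terms, and the Jaccard score is a monotone function of the Dice score (J = D / (2 - D)), so those two always order records identically. The cosine coefficient also gives the same ordering on the toy sample below, though this sketch does not show that it must do so for every collection. The record identifiers, the English placeholder terms, and the function names are all hypothetical; none of them come from the dissertation or from Micro-AIRS.

```python
from math import sqrt

# Binary weighting: each record and the query are plain sets of index terms.

def dice(a, b):
    """Dice coefficient: 2|A∩B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def jaccard(a, b):
    """Jaccard coefficient: |A∩B| / |A∪B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def cosine(a, b):
    """Cosine coefficient for binary vectors: |A∩B| / sqrt(|A| * |B|)."""
    return len(a & b) / sqrt(len(a) * len(b)) if a and b else 0.0

def rank(query, docs, coeff):
    """Return record ids by descending score, ties broken by id."""
    return sorted(docs, key=lambda d: (-coeff(query, docs[d]), d))

# Hypothetical toy collection (placeholder terms, not the test records).
query = {"retrieval", "arabic", "stem"}
docs = {
    "d1": {"retrieval", "arabic", "system"},
    "d2": {"arabic", "root", "stem", "index", "term"},
    "d3": {"microcomputer", "system"},
    "d4": {"retrieval", "arabic", "stem", "root"},
}

for name, coeff in (("dice", dice), ("cosine", cosine), ("jaccard", jaccard)):
    print(name, rank(query, docs, coeff))
```

The scores themselves differ between the three coefficients; only the resulting order of records is being compared here.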