The effectiveness and efficiency of clustering in Arabic information retrieval systems

Posted on: 1999-09-21 | Degree: Ph.D. | Type: Dissertation
University: Illinois Institute of Technology | Candidate: Akkawi, Kayed Odeh
GTID: 1468390014969319 | Subject: Computer Science

Abstract/Summary:

This dissertation explores three approaches to clustering documents: complete-link clustering, group-average clustering, and single-link clustering. A series of information retrieval experiments was carried out on two corpora: a collection of 242 abstracts of papers in computer science and a newspaper corpus of 187 articles of varying length. Each clustering method was tested three times: once using full words as index terms, once using stems, and once using roots.

With the complete-link clustering method, using full words as index terms resulted in significantly better retrieval performance than using roots, and using roots produced significantly better results than using stems.

With the group-average clustering method, using full words as index terms gave significantly better results than using roots. Using roots also gave significantly better results than using stems, except at the recall level of 1.0.

With the single-link clustering method, using full words as index terms produced significantly better results than roots at the lower recall levels (up to 0.4) and significantly better results than stems at the lower recall levels (up to 0.6). At the higher recall levels, however, roots and stems performed significantly better than full words.

Keywords/Search Tags: Clustering, Stems, Full words, Using, Roots, Index terms, Recall levels, Retrieval
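The three clustering methods named in the abstract differ only in how they measure the distance between two clusters. The following is a minimal sketch of that difference, not the dissertation's implementation: the function names `linkage_distance` and `agglomerate`, the toy matrix `DIST`, and the four documents it describes are all hypothetical. Group-average is taken here as the mean over all cross-cluster pairs, which is an assumption, since the abstract does not give its exact definition.

```python
from itertools import combinations, product


def linkage_distance(a, b, dist, method):
    """Distance between clusters a and b (lists of document indices)."""
    pairs = [dist[i][j] for i, j in product(a, b)]
    if method == "single":    # closest pair of documents
        return min(pairs)
    if method == "complete":  # farthest pair of documents
        return max(pairs)
    if method == "average":   # mean over all cross-cluster pairs (assumed definition)
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown method: {method}")


def agglomerate(dist, method, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(
                clusters[p[0]], clusters[p[1]], dist, method
            ),
        )
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical symmetric distance matrix for four documents.
DIST = [
    [0.0, 0.1, 0.5, 0.9],
    [0.1, 0.0, 0.4, 0.8],
    [0.5, 0.4, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]
```

Calling `agglomerate(DIST, "single", 2)` groups documents 0 and 1 together and documents 2 and 3 together. The distance between those two groups then depends on the criterion: the closest pair for single-link, the farthest pair for complete-link, and the mean of all four cross pairs for group-average. In a retrieval setting the matrix would be derived from index-term vectors built from full words, stems, or roots.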