Parallel information retrieval and visualization on large, unstructured document collections using web link information

Posted on: 2001-07-30
Degree: Ph.D.
Type: Thesis
University: George Mason University
Candidate: Alford, Kenneth Lowell
GTID: 2468390014458181
Subject: Computer Science
Abstract/Summary:
This thesis deals with information retrieval scalability, incorporation of web links, and visualization. I prove that an information retrieval (IR) system based on a relational database management system is scalable over large document collections, up to the limit of the physical storage devices available. Scalability testing was performed on web-based document collections up to 50 times as large as previously tested collections. Testing was performed on a Sun E10K computer system using 1 to 24 processors and seven document collections ranging in size from 2 gigabytes to 100 gigabytes of unprocessed web-based text data. Hardware scalability was evaluated through runtime speed-up; speed-up improvements were consistent across all seven document collections tested. Software scalability was demonstrated by measuring scale-up performance, which was positive for three through 24 processors and across all seven document collections.

Incorporating web links during information retrieval is the second emphasis of this thesis. I prove that web link information can be added to content-only information retrieval to achieve equal or improved retrieval precision. Using standardized TREC-7 query topics, I achieved a slight improvement in average precision over content-only information retrieval. Using TREC-8 query topics, I achieved the same average precision as content-only document retrieval. I further prove that hundreds of indirectly relevant documents (documents that link directly to a relevant document) can be retrieved without any reduction in retrieval precision.

Finally, I prove that IR web link information can be visually displayed to give users an additional tool for finding relevant documents. I developed the Web Link Visualization Tool to graphically view relationships between web-linked documents, view original document text, and record judgments regarding the relevance of individual documents. The Web Link Visualization Tool is a front-end visualization program for an existing information retrieval system. It enables users to find, view, and evaluate directly and indirectly relevant documents based on similarities between document query vectors and query term vectors for both originally retrieved and web-linked documents.
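
The speed-up and scale-up measures mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional parallel-database definitions (speed-up as the one-processor runtime divided by the p-processor runtime on the same collection; scale-up as the one-processor runtime on a base collection divided by the p-processor runtime on a collection p times larger). The abstract does not state the exact formulas used in the thesis, so the function names and the sample timings below are hypothetical and are not results from the thesis.

```python
def speed_up(t_one_proc: float, t_p_procs: float) -> float:
    """Speed-up on a fixed collection: T(1 processor) / T(p processors).

    A value close to p indicates near-linear speed-up.
    """
    if t_one_proc <= 0 or t_p_procs <= 0:
        raise ValueError("runtimes must be positive")
    return t_one_proc / t_p_procs


def scale_up(t_base_one_proc: float, t_scaled_p_procs: float) -> float:
    """Scale-up: T(1 processor, base collection) / T(p processors, p-times-larger collection).

    A value close to 1.0 means runtime stays flat as data and processors grow together.
    """
    if t_base_one_proc <= 0 or t_scaled_p_procs <= 0:
        raise ValueError("runtimes must be positive")
    return t_base_one_proc / t_scaled_p_procs


def efficiency(t_one_proc: float, t_p_procs: float, processors: int) -> float:
    """Parallel efficiency: speed-up divided by the number of processors."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speed_up(t_one_proc, t_p_procs) / processors


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only to exercise the functions.
    print(speed_up(120.0, 30.0))        # same collection, 1 vs. 4 processors
    print(efficiency(120.0, 30.0, 4))   # fraction of ideal linear speed-up
    print(scale_up(120.0, 150.0))       # 1x data on 1 processor vs. 4x data on 4
```

Under these assumed definitions, speed-up holds the collection fixed and varies the processor count, while scale-up grows the collection and the processor count together.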
Keywords/Search Tags:Web, Information retrieval, Link, Document, Visualization, Using, Large, Relevant