
Internet resource discovery: Topical clustering and visualization using latent semantic indexing

Posted on: 1997-11-21
Degree: Ph.D.
Type: Dissertation
University: University of Southern California
Candidate: Li, Shih-Hao
Full Text: PDF
GTID: 1468390014983291
Subject: Computer Science
Abstract/Summary:
As the number of servers grows rapidly, it becomes difficult to search for information on the Internet. Broadcasting requests to all servers would overwhelm the underlying networks, and most of those requests would be sent to irrelevant servers.

To determine relevant servers for user queries, we propose the client-directory-server model. In this model, a user sends a query to the "directory", which ranks servers based on their relevance to the query. Users can search for information by exact keywords or by embedded concepts.

To search for information by exact keywords, we propose a new Boolean similarity measure that ranks servers with respect to Boolean queries. In contrast with other known methods, our method reduces time and space complexity from exponential to polynomial in the number of Boolean terms. To search for information by conceptual meaning, we integrate latent semantic indexing and hierarchic agglomerative clustering methods. We cluster objects based on their conceptual meanings and arrange them in a hierarchic structure to reduce search time. In addition, we develop a new visualization scheme that displays the relationships between query terms and documents in a two-dimensional space.

In this research, we describe our proposed methods and a prototype user interface, Vintage. We conduct experiments on the USC Homer database and four standard document collections, CACM, CISI, CRAN, and MED, for which queries and relevance judgments are available. We compare our performance with existing methods and obtain better results in precision, recall, and space and time complexity.
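
The concept-based search mentioned above relies on latent semantic indexing (LSI). The abstract does not give the dissertation's actual formulation, so the following is only a minimal sketch of textbook LSI under stated assumptions: a tiny made-up term-by-document matrix, a truncated SVD with k = 2, the standard query fold-in, and cosine ranking. The function names (`lsi_fit`, `fold_in_query`, `rank_documents`) and the toy data are hypothetical and are not taken from the dissertation or from Vintage.

```python
import numpy as np

def lsi_fit(term_doc, k=2):
    """Truncated SVD of a term-by-document matrix (rows are terms)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular triplets: term vectors, singular values, document vectors.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(query_vec, U_k, s_k):
    """Project a raw query term vector into the k-dimensional concept space."""
    return (query_vec @ U_k) / s_k

def rank_documents(query_k, docs_k):
    """Return document indices by decreasing cosine similarity, plus the similarities."""
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k)
    sims = (docs_k @ query_k) / np.maximum(norms, 1e-12)
    return np.argsort(-sims), sims

# Hypothetical toy collection: rows are terms, columns are four documents.
terms = ["server", "network", "query", "heart", "patient"]
A = np.array([
    [2, 1, 0, 0],   # server
    [1, 2, 0, 0],   # network
    [1, 1, 0, 0],   # query
    [0, 0, 2, 1],   # heart
    [0, 0, 1, 2],   # patient
], dtype=float)

U_k, s_k, docs_k = lsi_fit(A, k=2)

# A query containing only the term "network".
q = np.zeros(len(terms))
q[terms.index("network")] = 1.0

order, sims = rank_documents(fold_in_query(q, U_k, s_k), docs_k)
```

In this toy example the two networking documents (columns 0 and 1) rank ahead of the two medical ones. With k = 2, the rows of `docs_k` and `U_k` are already two-dimensional coordinates that could be drawn on a scatter plot; that is one simple way to place terms and documents in a plane, not necessarily the visualization scheme the dissertation develops.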
Keywords/Search Tags: Search information, Servers