Internet resource discovery: Topical clustering and visualization using latent semantic indexing

Posted on:1997-11-21

Degree:Ph.D

Type:Dissertation

University:University of Southern California

Candidate:Li, Shih-Hao

GTID:1468390014983291

Subject:Computer Science

Abstract/Summary:

As the number of servers grows rapidly, it becomes difficult to search for information on the Internet. Broadcasting requests to all servers would overwhelm the underlying networks, and most of those requests would be sent to irrelevant servers.

To determine the relevant servers for a user query, we propose the client-directory-server model. In this model, a user sends a query to the "directory", which ranks servers by their relevance to the query. Users can search for information either by exact keywords or by embedded concepts.

To search by exact keywords, we propose a new Boolean similarity measure that ranks servers with respect to Boolean queries. In contrast with other known methods, our method reduces time and space complexity from exponential to polynomial in the number of Boolean terms. To search by conceptual meaning, we integrate latent semantic indexing and hierarchic agglomerative clustering methods. We cluster objects based on their conceptual meanings and arrange them in a hierarchic structure to reduce search time. In addition, we develop a new visualization scheme that displays the relationships between query terms and documents in a two-dimensional space.

In this research, we describe our proposed methods and a prototype user interface, Vintage. We conduct experiments on the USC Homer database and four standard document collections (CACM, CISI, CRAN, and MED) for which queries and relevance judgments are available. We compare our performance with existing methods and obtain better results in precision, recall, and space and time complexity.
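
The concept-based search described in the abstract combines latent semantic indexing with hierarchic agglomerative clustering. The following is a minimal sketch of that general combination, not the dissertation's implementation: the toy corpus, the function names, the raw term-count weighting, the choice of k = 2, and the average-linkage cosine clustering are all assumptions made for illustration, since the abstract does not specify them. With k = 2, the document coordinates are also points in a two-dimensional space, which is the kind of layout a visualization could plot.

```python
import numpy as np

# Hypothetical toy corpus (not from the dissertation): two topical groups.
DOCS = [
    "network server query routing",
    "server directory query ranking",
    "network routing protocol server",
    "patient blood pressure treatment",
    "clinical treatment patient study",
    "blood study clinical trial",
]

def term_document_matrix(docs):
    """Return (vocab, A) where A[i, j] counts term i in document j."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0
    return vocab, A

def lsi(A, k=2):
    """Truncated SVD, A ~ U_k diag(s_k) V_k^T; rows of V_k are document coordinates."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(query, vocab, U_k, s_k):
    """Project a query into the reduced space: q_hat = q^T U_k diag(1 / s_k)."""
    q = np.zeros(len(vocab))
    for w in query.split():
        if w in vocab:
            q[vocab.index(w)] += 1.0
    return (q @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def agglomerative(points, n_clusters):
    """Naive average-linkage agglomerative clustering on cosine distance."""
    clusters = [[i] for i in range(len(points))]

    def dist(c1, c2):
        return np.mean([1.0 - cosine(points[i], points[j]) for i in c1 for j in c2])

    while len(clusters) > n_clusters:
        pairs = ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]

vocab, A = term_document_matrix(DOCS)
U_k, s_k, doc_coords = lsi(A, k=2)

# Rank documents by similarity to a query in the latent space.
q_hat = fold_in_query("server query", vocab, U_k, s_k)
ranking = sorted(range(len(DOCS)), key=lambda j: cosine(q_hat, doc_coords[j]), reverse=True)

# Group documents by latent-space proximity.
clusters = agglomerative(doc_coords, n_clusters=2)
```

In this sketch the query is matched against documents in the reduced space rather than by exact keyword overlap, so a document can rank highly without containing every query term. A production system would more likely use a library routine such as SciPy's hierarchical clustering instead of the quadratic loop above.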

Keywords/Search Tags:

Search information, Servers

Related items

1	Research On Semantic Search Scheme Based On Content-aware Over Encrypted Cloud Data
2	Enterprise Multi-platform Design And Implementation Of Server Monitoring
3	Social Events And Reasonable Regulation Problems Caused By Domestic Search Engines In Information Dissemination
4	Design of a workflow management system for BLAST servers using Jini and JavaSpaces
5	Servers/Web sites security (French text)
6	Digital Library Information Search Key Technologies
7	Research On Web Information Search Behavior Of Children
8	Resource management in distributed continuous media servers
9	Web Tender Information Search And Management System Design
10	Research On The Development Of Search Engine From The Perspective Of Communication