Font Size: a A A

Information organization algorithms and applications

Posted on:2002-05-15Degree:Ph.DType:Dissertation
University:Dartmouth CollegeCandidate:Pelekhov, EkaterinaFull Text:PDF
GTID:1468390011496454Subject:Computer Science
Abstract/Summary:
We consider a problem of information overload: the conflict between vast amounts of data available in electronic form and inadequate tools for finding relevant information. To alleviate this problem we look at the information organization: organizing electronic documents as a hierarchy of topics and subtopics.; We developed a clustering algorithm, called star algorithm, for organizing both static and dynamic document collections. This algorithm computes clusters induced by natural topic structure of the space, and does not impose the constraint to use a fixed number of clusters. As a result, the algorithm guarantees a lower bound on the topic similarity, defined in terms of the standard vector space model, between the documents in each cluster. The star clustering algorithm is based on covering graphs with dense subgraphs that are star-shaped. We developed efficient variants of the star algorithms for dynamic and hierarchical organization. We demonstrate the effectiveness and the efficiency of the star algorithm against two popular clustering algorithms, the single-link algorithm and the average-link algorithm.; We use this imposed document organization as a front-end to a search engine, organizing documents retrieved in response to a query, and allowing user to quickly browse and identify relevant and irrelevant categories. We developed a method for visualizing the topic structure of the retrieved document set.; In dynamic collections, such as news wires, we use information organization to identify and track interesting stories. We developed an algorithm for document filtering based on document membership in the star clustering. We evaluate this algorithm against filtering around a topic centroid, and filtering based on a single cluster produced by the star, the single-link and the average-link clustering algorithms. We introduce the persistent query system, an interactive document filtering system which organizes documents retrieved in response to a query, and allows the user to formulate and maintain a persistent query as a subset of relevant clusters. The new documents added to the, relevant clusters are pushed to the user. We implement the persistent queries in the mobile-agent document retrieval environment, the Serval system.
Keywords/Search Tags:Algorithm, Information, Document, Clusters, /italic
Related items