
Extractors for digital library objects

Posted on: 1998-05-29
Degree: Ph.D.
Type: Dissertation
University: Rutgers, The State University of New Jersey - Newark
Candidate: Holowczak, Richard Dean
GTID: 1468390014978396
Subject: Computer Science
Abstract/Summary:
The promise of Digital Libraries (DLs) is to make large collections of distributed, multimedia documents broadly available to the public in digital form. Information in a digital library must be indexed to provide users with effective ways of finding information. Human indexers either cannot keep up with the influx of digital information or are inconsistent in the indexing task. Therefore, we need a uniform, automated mechanism to build meaningful and expressive indexes with which both expert and novice users can interact. We call such mechanisms "extractors." We must also cater to a wide range of users' specialties and abilities by providing extensible and customizable systems.

Providing such an expressive representation of documents is a three-step process. First, we model a domain as a hierarchy of classes, where each class contains a set of concepts. Second, given this hierarchy, we apply a trainable information extraction system to each class. The output of this step is a set of concept definitions that describe the linguistic context in which concepts are found in documents. Third, these definitions are applied to novel documents to perform the classification task. The output of the classification step is a set of instantiated definitions that provide a conceptual index of each novel document. This index forms the foundation of a user-extensible information retrieval system.

In experiments, our classification systems exhibit high recall and precision, and our information retrieval systems provide an intuitive relevance feedback mechanism that allows users to arrive at a precise result set of documents. Our primary contributions are a semi-automated, incremental modeling methodology for building extractors for a given domain, and a methodology for constructing classification systems with a variety of potential applications, including information retrieval.
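The three-step pipeline in the abstract (domain model, trained concept definitions, classification into a conceptual index) can be illustrated with a minimal Python sketch. It rests on loudly stated assumptions: the class hierarchy is flattened to two hypothetical leaf classes, and a "concept definition" is reduced to the single word that precedes a concept in training text. This toy rule stands in for the dissertation's trainable information extraction system, which it does not reproduce. `DOMAIN`, `ConceptDefinition`, `train`, `classify`, `build_index`, `search` and the sample sentences are all illustrative inventions.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical domain model: the class hierarchy is flattened to two leaf
# classes, each holding a set of concepts (step 1 of the pipeline).
DOMAIN = {
    "Disease": {"diabetes", "influenza"},
    "Treatment": {"insulin", "vaccination"},
}

@dataclass(frozen=True)
class ConceptDefinition:
    """Hypothetical learned rule: `concept` of class `cls` follows `trigger`."""
    cls: str
    concept: str
    trigger: str  # the single word seen immediately before the concept

def tokenize(text):
    """Lowercase whitespace tokens with surrounding . , ; : characters stripped."""
    return [w.strip(".,;:").lower() for w in text.split()]

def train(domain, training_docs):
    """Step 2 (simplified): record the preceding word for each concept mention."""
    definitions = set()
    for doc in training_docs:
        tokens = tokenize(doc)
        for i in range(1, len(tokens)):
            for cls, concepts in domain.items():
                if tokens[i] in concepts:
                    definitions.add(ConceptDefinition(cls, tokens[i], tokens[i - 1]))
    return definitions

def classify(definitions, doc):
    """Step 3: return the definitions whose (trigger, concept) pair occurs in doc."""
    tokens = tokenize(doc)
    bigrams = set(zip(tokens, tokens[1:]))
    return {d for d in definitions if (d.trigger, d.concept) in bigrams}

def build_index(definitions, docs):
    """Conceptual index mapping (class, concept) to the ids of matching docs."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for d in classify(definitions, doc):
            index[(d.cls, d.concept)].add(doc_id)
    return index

def search(index, cls, concept):
    """Look up documents indexed under a concept; sorted for stable output."""
    return sorted(index.get((cls, concept), set()))

TRAINING_DOCS = [
    "Patients with diabetes received insulin daily.",
    "Symptoms of influenza improved after vaccination.",
]
NOVEL_DOCS = {
    "d1": "A man with diabetes was examined.",
    "d2": "The nurse received insulin shipments.",
    "d3": "Diabetes is common.",
}

if __name__ == "__main__":
    definitions = train(DOMAIN, TRAINING_DOCS)
    index = build_index(definitions, NOVEL_DOCS)
    print(search(index, "Disease", "diabetes"))   # ['d1']
    print(search(index, "Treatment", "insulin"))  # ['d2']
```

Document `d3` mentions diabetes but not in any learned context, so it is not indexed. This shows, in miniature, why definitions record linguistic context rather than bare keywords. A real extractor would generalize well beyond exact preceding words.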
Keywords/Search Tags: Digital, Information, Documents, Extractors, Systems, Classification