Extractors for digital library objects

Posted on:1998-05-29

Degree:Ph.D

Type:Dissertation

University:Rutgers The State University of New Jersey - Newark

Candidate:Holowczak, Richard Dean

GTID:1468390014978396

Subject:Computer Science

Abstract/Summary:

The promise of Digital Libraries (DLs) is to make large collections of distributed, multimedia documents broadly available to the public in digital form. Information in a digital library must be indexed to provide users with effective ways of finding information. Human indexers either can't keep up with the influx of digital information or are inconsistent in the indexing task. Therefore, we need a uniform and automated mechanism to build meaningful and expressive indexes with which expert and novice users can interact. We call such mechanisms "extractors." We must also cater to a wide range of users' specialties and abilities by providing extensible and customizable systems.

Providing such an expressive representation of documents is a three-step process. First, we model a domain as a hierarchy of classes, where each class contains a set of concepts. Given this hierarchy, we apply a trainable information extraction system for each class. The output from this step is a set of concept definitions that describe the linguistic context in which concepts are found in documents. These definitions are applied to novel documents to perform the classification task. The output of the classification step is a set of instantiated definitions that provide us with a conceptual index of the novel document. This index forms the foundation of a user-extensible information retrieval system.

In experiments, our classification systems exhibit high recall and precision, and our information retrieval systems provide an intuitive relevance feedback mechanism that allows users to achieve a precise result set of documents. Our primary contributions are a semi-automated, incremental modeling methodology used to build extractors for a given domain, and a methodology for constructing classification systems with a variety of potential applications including information retrieval.
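The pipeline described in the abstract (class hierarchy, per-class concept definitions, classification of a novel document into a conceptual index) might be pictured roughly as follows. This is only an illustrative sketch, not the dissertation's implementation: the names `ConceptDefinition`, `DomainClass`, and `classify` are hypothetical, and a hand-written regular expression stands in for a definition that the trainable extraction system would actually learn.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class ConceptDefinition:
    """Hypothetical stand-in for a learned concept definition: a concept
    name plus a regex approximating the linguistic context it occurs in."""
    concept: str
    pattern: str

    def instantiate(self, text: str) -> list[str]:
        # Return every span of the text that matches this definition.
        return [m.group(0) for m in re.finditer(self.pattern, text, re.IGNORECASE)]


@dataclass
class DomainClass:
    """One node of the class hierarchy; each class owns its definitions."""
    name: str
    definitions: list[ConceptDefinition] = field(default_factory=list)
    children: list[DomainClass] = field(default_factory=list)

    def walk(self):
        # Yield this class and all of its descendants, depth first.
        yield self
        for child in self.children:
            yield from child.walk()


def classify(root: DomainClass, text: str) -> dict[str, dict[str, list[str]]]:
    """Apply every class's definitions to a novel document.

    Returns {class name: {concept: matched spans}}, keeping only classes
    with at least one instantiated definition. The mapping plays the role
    of the conceptual index in this sketch.
    """
    index: dict[str, dict[str, list[str]]] = {}
    for cls in root.walk():
        hits = {}
        for definition in cls.definitions:
            spans = definition.instantiate(text)
            if spans:
                hits[definition.concept] = spans
        if hits:
            index[cls.name] = hits
    return index


# Assumed toy domain with two classes; the patterns are invented examples.
root = DomainClass("document", children=[
    DomainClass("legal", [ConceptDefinition("court", r"\bcourt of \w+")]),
    DomainClass("medical", [ConceptDefinition("dosage", r"\b\d+\s?mg\b")]),
])

print(classify(root, "The patient received 20 mg daily."))
# {'medical': {'dosage': ['20 mg']}}
```

A retrieval layer could then query the returned mapping by class or concept; how the dissertation's system stores and queries its index is not stated in the abstract.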

Keywords/Search Tags:

Digital, Information, Documents, Extractors, Systems, Classification

Related items

1	Schemas de classification et reperage des documents administratifs electroniques dans un contexte de gestion decentralisee des ressources informationnelles
2	Randomness extractors for independent sources and applications
3	Digital documents in the workplace: An empirical investigation of document reuse and information technology infrastructure
4	Understanding Clinician Information Demands and Synthesis of Clinical Documents in Electronic Health Record Systems
5	A Research On Automatic WEB Documents Extraction And Classification
6	Automating Derivative Classification in Multi-Level Secure Documents
7	Research Of Digital Signature Technology Based On Chaotic System On Network
8	Deterministic extractors
9	A Research On BLM Documents Information Management Of Building Project Based On PBS
10	Automatic Classification Of Various Types Of Documents Based On Wikipedia