Leveraging knowledge of document structure and named entities for information extraction

Posted on:2006-08-06

Degree:M.S

Type:Thesis

University:Case Western Reserve University

Candidate:Duncan, Frank Bissett


GTID:2458390008451287

Subject:Computer Science

Abstract/Summary:

We present an end-to-end process for extracting key information from the online and offline documents that make up the body of literature of a given domain. The ultimate goal of such a process is to identify the leading authorities in the field, such as authors, publications, and institutions. The process comprises several stages: defining the domain, identifying and acquiring the data sources, and processing the documents to extract the information. Perhaps the most critical stage is extracting the information from the texts and mapping it to an analytical data model. We demonstrate how this can be facilitated by examining the structure of the document, whether expressed explicitly in HTML or implicitly in the conventional structure of scholarly literature. We also demonstrate the necessity of a database for identifying named entities, and of a set of heuristics for using this database effectively.
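The abstract combines two ideas: markup tells you where candidate fields are, and a database plus heuristics resolves each candidate string to a known entity. The sketch below is a minimal illustration of that combination under our own assumptions, not the thesis's implementation. The entity table `ENTITY_DB`, the abbreviation list, the set of tags treated as fields, and the two heuristics (abbreviation expansion, and "Last, First" reordering with middle names dropped) are all hypothetical choices made for the example.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Hypothetical entity database: normalized key -> (canonical name, entity type).
# A real system would populate this from a catalog of authors and institutions.
ENTITY_DB = {
    "case western reserve university": ("Case Western Reserve University", "institution"),
    "jane smith": ("Jane Q. Smith", "author"),
}

# Hypothetical abbreviation table used by the normalization heuristic.
ABBREVIATIONS = {"univ": "university", "dept": "department", "inst": "institute"}


def normalize(name):
    """Fold accents, lowercase, drop punctuation, and expand known abbreviations."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    words = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def lookup_entity(candidate):
    """Resolve a candidate string against ENTITY_DB, or return None."""
    key = normalize(candidate)
    if key in ENTITY_DB:
        return ENTITY_DB[key]
    # Hypothetical heuristic for bibliographic names: "Last, First Middle" ->
    # "first middle last", then retry with middle names/initials dropped.
    if "," in candidate:
        last, _, first = candidate.partition(",")
        words = normalize(f"{first} {last}").split()
        variants = [" ".join(words)]
        if len(words) >= 2:
            variants.append(f"{words[0]} {words[-1]}")
        for variant in variants:
            if variant in ENTITY_DB:
                return ENTITY_DB[variant]
    return None


class FieldCollector(HTMLParser):
    """Collect the text of elements whose tag marks them as candidate fields."""

    def __init__(self, tags=("title", "h1", "h2", "li")):
        super().__init__()
        self.tags = set(tags)
        self.current = None  # tag of the field currently being collected, if any
        self.fields = []     # list of [tag, accumulated text]

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.current = tag
            self.fields.append([tag, ""])

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.fields[-1][1] += data


def extract_entities(html):
    """Return (tag, text, match) for each structural field found in the page."""
    parser = FieldCollector()
    parser.feed(html)
    parser.close()
    results = []
    for tag, text in parser.fields:
        text = " ".join(text.split())  # collapse whitespace inside the field
        results.append((tag, text, lookup_entity(text)))
    return results


# Hypothetical sample page used only for illustration.
PAGE = """<html><head><title>Faculty</title></head>
<body><ul><li>Smith, Jane Q.</li><li>Case Western Reserve Univ.</li></ul></body></html>"""

if __name__ == "__main__":
    for row in extract_entities(PAGE):
        print(row)
```

On the sample page, the `<title>` field finds no match, "Smith, Jane Q." resolves only after the reordering and middle-initial heuristics, and "Case Western Reserve Univ." resolves through abbreviation expansion. An exact-match lookup alone would miss both entities, which is the point the abstract makes about needing heuristics on top of the database.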

Keywords/Search Tags:

Information, Structure, Process

Related items

1	Leveraging knowledge of document structure and named entities for information extraction
2	Based On A Common Framework Of The Research And Application Of Integrated Capp System
3	Study Of The Product-Oriented Process Information System
4	Information Management System For Mold Manufacturing Process Research And Development
5	A Difference Detection Algorithm For Process Models Based On Tree Structure
6	The Design And Realization Of Vaccine Process Quality Management Information System Based On B/S Structure
7	Three-tier Architecture-based Web Search And Information Processing Systems
8	Research On The Connotation Structure Model And Complex Characteristics Of Personal Information Security System
9	Supporting Structure And Behavior Fusion Process Model Indexing And Retrieval
10	Process Structure Remodeling Oriented Database Log Analysis System Design And Implementation