Open information extraction for the Web

Posted on:2010-04-07

Degree:Ph.D

Type:Thesis

University:University of Washington

Candidate:Banko, Michele

GTID:2448390002979891

Subject:Computer Science

Abstract/Summary:

The World Wide Web contains a significant amount of information expressed using natural language. While unstructured text is often difficult for machines to understand, the field of Information Extraction (IE) offers a way to map textual content into a structured knowledge base. The ability to amass vast quantities of information from Web pages has the potential to increase the power with which a modern search engine can answer complex queries.

IE has traditionally focused on acquiring knowledge about particular relationships within a small collection of domain-specific text. Typically, a target relation is provided to the system as input along with extraction patterns or examples that have been specified by hand. Shifting to a new relation requires a person to create new patterns or examples. This manual labor scales linearly with the number of relations of interest.

The task of extracting information from the Web presents several challenges for existing IE systems. The Web is large and heterogeneous; the number of potentially interesting relations is massive and their identity often unknown. To enable large-scale knowledge acquisition from the Web, this thesis presents Open Information Extraction, a novel extraction paradigm that automatically discovers thousands of relations from unstructured text and readily scales to the size and diversity of the Web.

Keywords/Search Tags:

Web, Information, Text

Related items

1	The Research And Implementation Of Text Classification Based On Meta-Information And Optimization
2	The Research And Implementation Of Text Classification Based On Meta-information And Optimization
3	Design Of Information Hiding Algorithms Based On Text
4	Research On The Key Techniques Of Web Information Intelligent Acquisition
5	Researching Text Classification Using Semantic And Sequence Information
6	Information Filtering Systems Based On Web Text Content And Design
7	The Research On Several Key Techniques In Text Information Processing
8	Research On The Technology Of Video Text Information Extraction
9	Study On Finding Dimensions For Text Based On Weighted Heterogeneous Information Networks
10	Research On Video Text Information Extraction Based On Features Integration