Robust knowledge extraction over large text collections

Posted on:2006-06-11

Degree:Ph.D

Type:Dissertation

University:Drexel University

Candidate:Song, Min

GTID:1458390008953874

Subject:Information Science

Abstract/Summary:

Automatic knowledge extraction over large text collections is a challenging task because of several constraints: the need for large annotated training sets, the requirement for extensive manual processing of data, and the huge number of domain-specific terms. To address these constraints, this study proposes and develops a complete solution for extracting knowledge from large text collections with minimal human intervention. As a testbed, a novel, robust, high-quality knowledge extraction system called RIKE has been developed. Three research questions are examined to evaluate RIKE: (1) How accurately does RIKE retrieve promising documents for information extraction from huge text collections such as MEDLINE or TREC? (2) Does an ontology enhance the accuracy of RIKE in retrieving promising documents? (3) How well does RIKE extract target entities from a huge medical text collection, MEDLINE?

The major contributions of this study are that (1) automatic unsupervised query generation for effective retrieval from text databases is proposed and evaluated, (2) Mixture Hidden Markov Models for automatic instance extraction are proposed and tested, (3) three ontology-driven query expansion algorithms are proposed and evaluated, and (4) object-oriented methodologies for knowledge extraction systems are adopted. Extensive experiments show RIKE to be a robust, high-quality knowledge extraction technique. DocSpotter outperforms other leading techniques for retrieving promising documents for extraction by 15.5% to 35.34% in precision at 20 (P@20). HiMMIE improves extraction accuracy by 9.43% to 24.67% in F-measure.
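The abstract reports its results using two standard evaluation measures: precision over the top 20 retrieved documents, and the F-measure for extraction. The sketch below shows how these measures are commonly computed. It is an illustration only: the abstract does not describe the dissertation's evaluation protocol, and the function names, document IDs, and entity strings are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    Assumption: the denominator is always k, even when fewer than k
    documents were retrieved (the usual convention for P@k).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def f_measure(extracted: Set[str], gold: Set[str], beta: float = 1.0) -> float:
    """F-measure of an extracted entity set against a gold-standard set.

    beta = 1.0 weights precision and recall equally (the F1 score).
    """
    true_positives = len(extracted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical ranking of four documents, two of which are relevant.
    ranking = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3"}
    print(precision_at_k(ranking, relevant, k=4))

    # Hypothetical extracted entities versus a gold-standard set.
    print(f_measure({"a", "b", "c"}, {"a", "b", "d"}))
```

Under these definitions, an improvement "in P@20" means more relevant documents among the first 20 results, and an improvement "in F-measure" means a better balance of extraction precision and recall.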

Keywords/Search Tags:

Extraction, Text collections, Large text, RIKE, Robust

Related items

1	Extracting relations from large text collections
2	Self-organising text collections with adaptive resonance theory neural networks
3	Learning-Based Text Extraction In Natural Background
4	Design And Implementation Of Text Information Extraction On Smart Phone
5	Discovering latent topical phrases in document collections and networks with text components: Leveraging text mining and information network analysis for human oriented applications
6	Information extraction to enable faceted search over large text document collections
7	Research On Video Text Information Extraction Based On Features Integration
8	High-performance, open-domain question answering from large text collections
9	Research On Keyword Extraction Technology Oriented To Conversational Text
10	Text Extraction In Video