
Multilingual acquisition of structured information via novel relationship extraction models over diverse knowledge sources

Posted on: 2010-04-15
Degree: Ph.D
Type: Dissertation
University: The Johns Hopkins University
Candidate: Garera, Nikesh Lucky
Full Text: PDF
GTID: 1448390002477664
Subject: Computer Science
Abstract/Summary:
This dissertation presents original techniques for a class of problems that can be collectively referred to as relationship extraction. This machine learning task involves extracting tuples from free text, where exemplar instantiations of the tuples help model the target relationship. A wide range of relationships is explored, including semantic relationships between words, their translation equivalents in different languages, and encyclopedic facts about named entities.

This dissertation explores new relationship extraction models that exploit novel knowledge sources across a diverse set of relationship types in multiple languages. It ties together the extraction of diverse relationships in the classic seed-based, minimally supervised framework. However, this framework has previously failed to capture information beyond local context, such as transitively derived information, domain constraints and knowledge, correlations among relationships, and additional novel knowledge sources. Furthermore, the traditional seed-based learning framework fails to extract non-overt relationships, such as an author's gender or age, when they are not explicitly stated. In contrast, some of these non-overt relationships can be inferred with an accuracy exceeding 95% via novel document-wide, discourse-feature-based, and interlocutor-sensitive models. This dissertation presents new relationship extraction methods that embed a wide range of such knowledge sources in the minimally supervised learning framework.

Collectively, these methods outperform previously published algorithms on a diverse set of natural language data sources and genres, including newswire text, biographical articles, raw webpages, conversational speech transcripts, and email, and on a large set of languages, including Albanian, Arabic, Bulgarian, Czech, Farsi, German, Hindi, Hungarian, Russian, Slovak, Spanish, and Swedish.
Keywords/Search Tags:Relationship extraction, Knowledge sources, Novel, Diverse, Information, Models