
Well-foundedness and reliability in statistical natural language parsing

Posted on: 2001-09-06
Degree: Ph.D.
Type: Thesis
University: The University of Rochester
Candidate: Seagull, Amon B
Full Text: PDF
GTID: 2468390014958249
Subject: Computer Science
Abstract/Summary:
Statistical techniques have revolutionized all areas of Natural Language Processing, and syntactic parsing is no exception. The availability of large syntactically annotated corpora (principally through the Penn Treebank project) has precipitated parsing's shift from the task of constructing interpretations to the task of constructing a labeled bracketing.

These corpus-based techniques are robust and scalable, two desiderata lacking in early, knowledge-based approaches to parsing. The early approaches are typified by parsers that could operate only in a narrow domain, but that produced semantically interpretable parses. In contrast, the corpus-based approaches produce underspecified labeled bracketings that are not sufficiently detailed for applications in Natural Language Understanding.

In this dissertation we describe a parser that uses hand-written, linguistically informed knowledge sources (grammar, lexicon, ontology) to enrich the labeled bracketing in the Penn Treebank. The enriched corpus is then used as the data source for statistical parsing in our well-founded framework. Furthermore, parsing in this framework supports a fully lexicalized parsing model and allows for the natural integration of word sense disambiguation with syntactic disambiguation. We show that jointly modeling word sense ambiguity and syntactic ambiguity results in improved syntactic disambiguation. We also describe our treatment of coordinated structures (a topic generally ignored in statistical parsing), and our novel method for using an ontology to settle on backed-off estimators via hypothesis testing.
Keywords/Search Tags: Parsing, Natural language, Statistical, Syntactic