Well-foundedness and reliability in statistical natural language parsing

Posted on: 2001-09-06

Degree: Ph.D.

Type: Thesis

University: The University of Rochester

Candidate: Seagull, Amon B

GTID: 2468390014958249

Subject: Computer Science

Abstract/Summary:

Statistical techniques have revolutionized all areas of Natural Language Processing, and syntactic parsing is no exception. The availability of large syntactically annotated corpora (principally through the Penn Treebank project) has precipitated parsing's shift from the task of constructing interpretations to the task of constructing a labeled bracketing.

These corpus-based techniques are robust and scalable, two desiderata lacking in early, knowledge-based approaches to parsing. The early approaches are typified by parsers that could operate only in a narrow domain, but that produced semantically interpretable parses. In contrast, the corpus-based approaches produce underspecified labeled bracketings that are not sufficiently detailed for applications in Natural Language Understanding.

In this dissertation we describe a parser that uses hand-written, linguistically informed knowledge sources (grammar, lexicon, ontology) to enrich the labeled bracketing in the Penn Treebank. The enriched corpus is then used as the data source for statistical parsing in our well-founded framework. Furthermore, parsing in this framework supports a fully-lexicalized parsing model, and allows for the natural integration of word sense disambiguation with syntactic disambiguation. We show that jointly modeling word sense ambiguity and syntactic ambiguity results in improved syntactic disambiguation. We also describe our treatment of coordinated structures (a topic generally ignored in statistical parsing), and our novel method for using an ontology to settle on backed-off estimators via hypothesis testing.

Keywords/Search Tags:

Parsing, Natural language, Statistical, Syntactic

Related items

1	Research On Natural Language Syntactic Parsing Based On Deep Learning
2	Research On Chinese Syntactic Parsing Based On SEARN Algorithm
3	Research On Joint Syntactic And Semantic Parsing For Chinese
4	A Study On The Computation Of Chinese Chunks
5	Research On Chinese Parsing Based On Semantic Analysis And Its Implementation
6	Learning for semantic parsing and natural language generation using statistical machine translation techniques
7	Combining labeled and unlabeled data in statistical natural language parsing
8	Any domain parsing: Automatic domain adaptation for natural language parsing
9	Natural language parsing as statistical pattern recognition
10	Research On History-based Chinese Hierarchical Parsing