Towards efficient statistical parsing using lexicalized grammatical information

Posted on: 2003-10-21
Degree: Ph.D.
Type: Dissertation
University: University of Delaware
Candidate: Chen, John
Full Text: PDF
GTID: 1468390011980270
Subject: Computer Science
Abstract/Summary:
Many natural language understanding systems require efficient and accurate parsing disambiguation to be effective. State-of-the-art parsers owe their high performance in large part to statistical modeling of lexical features. Although lexicalized tree adjoining grammar (TAG) is a lexicalized grammatical formalism for natural language, its use in statistical parsing has remained relatively unexplored. In this work, I aim to develop statistical models for TAG parsing that are both efficient and accurate. First, I explore the issue of linear-time TAG parsing disambiguation (supertagging). Previously, only local structural information was found to be effective for supertag disambiguation. I show that long-distance information as well as lexical information can also be useful for accurate supertagging. Furthermore, I develop frameworks that use these features to significantly increase the accuracy of supertagging. Second, in order to provide a robust resource for statistical processing models of TAG, I develop and evaluate procedures to extract TAGs from widely available treebanks. I then develop other procedures to organize these extracted TAGs as well as to link them to other TAGs. Third, I explore smoothing approaches for TAG, which are essential because of the data sparseness problem inherent in statistical processing models of TAG. One main approach uses the idea of distributional similarity in smoothing, while another uses the large-scale organization of TAG. Both show promise for smoothing statistical processing models of TAG.
Keywords/Search Tags: Statistical, TAG, Parsing, Efficient, Information, Lexicalized, Smoothing