
Using Latent Information for Natural Language Processing Tasks

Posted on: 2014-05-10
Degree: Ph.D
Type: Dissertation
University: University of Rochester
Candidate: Chung, Tagyoung
Full Text: PDF
GTID: 1458390005483196
Subject: Computer Science
Abstract/Summary:
In a broad sense, latent information in natural language processing refers to any information that is not directly observable in raw data. Such latent information abounds in natural language processing tasks. Learning it may be the goal of a task in itself, or it may be learned and then exploited to improve a related task. For example, in unsupervised word alignment from parallel corpora, learning the latent information (the alignment) is the task itself; learning latent annotations for a context-free grammar falls into the second category, since the annotations lead to better parsing accuracy. Depending on the data available, latent information may be learned in a supervised or an unsupervised manner.

This dissertation presents three types of latent information that are learned and used to improve various natural language processing tasks, focusing mainly on different stages of machine translation. First, we discuss unsupervised learning of tokenization from parallel corpora, using the alignment between the sentences of a bilingual pair as latent information. Second, we examine the use of empty categories to improve parsing and machine translation; here, the empty categories are latent information learned from raw text and applied to the respective tasks. Finally, we look at learning latent annotations for synchronous context-free grammar, which yields more accurate and faster string-to-tree machine translation.
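To make the word-alignment example concrete: the classic illustration of learning latent information in an unsupervised manner is IBM Model 1, where the alignment between source and target words is the latent variable and EM estimates translation probabilities from sentence pairs alone. The sketch below is a minimal, generic implementation of that textbook algorithm, not the specific models developed in the dissertation; the function name and corpus format are illustrative assumptions.

```python
from collections import defaultdict

def ibm_model1(parallel_corpus, iterations=10):
    """EM training of IBM Model 1 lexical translation probabilities t(f|e).

    parallel_corpus: list of (source_words, target_words) sentence pairs.
    The word alignment is the latent variable: it is never observed,
    only inferred as a posterior distribution in the E-step.
    """
    # Collect the source vocabulary; a special NULL token lets target
    # words align to no source word at all.
    src_vocab = {w for src, _ in parallel_corpus for w in src} | {"NULL"}

    # Uniform initialization of t(f|e).
    t = defaultdict(lambda: 1.0 / len(src_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for src, tgt in parallel_corpus:
            src = ["NULL"] + src
            for f in tgt:
                # E-step: posterior over which source word generated f.
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: re-estimate t(f|e) from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t
```

On a toy corpus such as `[(["das", "Haus"], ["the", "house"]), (["das", "Buch"], ["the", "book"]), (["ein", "Buch"], ["a", "book"])]`, a few EM iterations concentrate probability mass so that `t[("house", "Haus")]` dominates: the latent alignments are never labeled, yet the co-occurrence statistics recover them.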
Keywords/Search Tags: Latent information, Natural language processing tasks, Context-free grammar, Machine translation