
Using Latent Information for Natural Language Processing Tasks

Posted on: 2014-05-10
Degree: Ph.D
Type: Dissertation
University: University of Rochester
Candidate: Chung, Tagyoung
Full Text: PDF
GTID: 1458390005483196
Subject: Computer Science
Abstract/Summary:
In a broad sense, latent information in natural language processing refers to any information that is not directly observable in raw data. Such latent information abounds in natural language processing tasks. Learning it may be the goal of a task in itself, or it may be learned and then exploited to improve a related task. For example, in unsupervised word alignment from parallel corpora, learning the latent information (the alignment) is the task itself; learning latent annotations for a context-free grammar falls into the second category, since the annotations lead to better parsing accuracy. Depending on the data available, latent information may be learned in a supervised or an unsupervised manner.

This dissertation presents three types of latent information that are learned and used to improve various natural language processing tasks, focusing mainly on different stages of machine translation. First, we discuss unsupervised learning of tokenization from parallel corpora, using the alignment between the sentences of a bilingual pair as latent information. Second, we examine the use of empty categories to improve parsing and machine translation; here, the empty categories are latent information learned from raw text and applied to the respective tasks. Finally, we look at learning latent annotations for synchronous context-free grammar, which yields more accurate and faster string-to-tree machine translation.
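To make the word-alignment example concrete: the classic illustration of learning latent information in an unsupervised manner is IBM Model 1, where the alignment between source and target words is the latent variable and EM estimates translation probabilities from sentence pairs alone. The sketch below is a minimal, generic implementation of that textbook algorithm, not the specific models developed in the dissertation; the function name and corpus format are illustrative assumptions.

```python
from collections import defaultdict

def ibm_model1(parallel_corpus, iterations=10):
    """EM training of IBM Model 1 lexical translation probabilities t(f|e).

    parallel_corpus: list of (source_words, target_words) sentence pairs.
    The word alignment is the latent variable: it is never observed,
    only inferred as a posterior distribution in the E-step.
    """
    # Collect the source vocabulary; a special NULL token lets target
    # words align to no source word at all.
    src_vocab = {w for src, _ in parallel_corpus for w in src} | {"NULL"}

    # Uniform initialization of t(f|e).
    t = defaultdict(lambda: 1.0 / len(src_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for src, tgt in parallel_corpus:
            src = ["NULL"] + src
            for f in tgt:
                # E-step: posterior over which source word generated f.
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: re-estimate t(f|e) from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t
```

On a toy corpus such as `[(["das", "Haus"], ["the", "house"]), (["das", "Buch"], ["the", "book"]), (["ein", "Buch"], ["a", "book"])]`, a few EM iterations concentrate probability mass so that `t[("house", "Haus")]` dominates: the latent alignments are never labeled, yet the co-occurrence statistics recover them.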
Keywords/Search Tags: Latent information, Natural language processing tasks, Context-free grammar, Machine translation