Font Size: a A A

Building probabilistic models for natural language

Posted on:1997-07-10Degree:Ph.DType:Thesis
University:Harvard UniversityCandidate:Chen, Stanley FFull Text:PDF
GTID:2468390014983386Subject:Computer Science
Abstract/Summary:
Building models of language is a central task in natural language processing. Traditionally, language has been modeled with manually-constructed grammars that describe which strings are grammatical and which are not; however, with the recent availability of massive amounts of on-line text, statistically-trained models are an attractive alternative. These models are generally probabilistic, yielding a score reflecting sentence frequency instead of a binary grammaticality judgement. Probabilistic models of language are a fundamental tool in speech recognition for resolving acoustically ambiguous utterances. For example, we prefer the transcription forbear to four bear as the former string is far more frequent in English text. Probabilistic models also have application in optical character recognition, handwriting recognition, spelling correction, part-of-speech tagging, and machine translation.; In this thesis, we investigate three problems involving the probabilistic modeling of language: smoothing n-gram models, statistical grammar induction, and bilingual sentence alignment. These three problems employ models at three different levels of language; they involve word-based, constituent-based, and sentence-based models, respectively. We describe techniques for improving the modeling of language at each of these levels, and surpass the performance of existing algorithms for each problem. We approach the three problems using three different frameworks. We relate each of these frameworks to the Bayesian paradigm, and show why each framework used was appropriate for the given problem. Finally, we show how our research addresses two central issues in probabilistic modeling: the sparse data problem and the problem of inducing hidden structure.
Keywords/Search Tags:Models, Language, Probabilistic, Problem
Related items