Generalized domain adaptation for sequence labeling in natural language processing

Posted on:2017-10-05

Degree:Ph.D

Type:Dissertation

University:Temple University

Candidate:Xiao, Min

GTID:1468390011998701

Subject:Computer Science

Abstract/Summary:

Sequence labeling tasks, such as part-of-speech tagging, syntactic chunking, and dependency parsing, have been widely studied in natural language processing. Most systems for these tasks are trained via supervised learning on large amounts of labeled data. However, manually collecting labeled training data is time-consuming and expensive. To alleviate this label scarcity, domain adaptation has recently been proposed: a statistical machine learning model is trained for a target domain that lacks sufficient labeled training data by exploiting freely available labeled training data from a different but related source domain. The natural language processing community has seen domain adaptation succeed in a variety of sequence labeling tasks.

Although labeled training data in the source domain are available and free, they are not the same as, and can differ substantially from, the test data in the target domain. Simply applying standard supervised learning algorithms without accounting for domain differences may therefore fail to perform well in the target domain. In this dissertation, we develop several novel representation learning approaches to domain adaptation for sequence labeling in natural language processing. These techniques induce latent, generalizable features that bridge the divergence between domains and thereby enable cross-domain prediction.

We first tackle a semi-supervised domain adaptation scenario, in which the target domain has a small amount of labeled training data, and propose a distributed representation learning approach based on a probabilistic neural language model. We then relax the assumption that labeled training data are available in the target domain and study an unsupervised domain adaptation scenario, in which the target domain has only unlabeled training data; for this setting we propose a task-informative representation learning approach based on dynamic dependency networks. In both works, the different domains contain sentences of different genres. We then extend and generalize domain adaptation to a more challenging scenario in which the different domains contain sentences in different languages, and propose two cross-lingual representation learning approaches: one based on deep neural networks with auxiliary bilingual word pairs, and the other based on annotation projection with auxiliary parallel sentences. All four learning scenarios are evaluated extensively on different sequence labeling tasks, and the empirical results demonstrate the effectiveness of these generalized domain adaptation techniques for sequence labeling in natural language processing.
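As a rough illustration of the general annotation-projection idea mentioned in the abstract, the sketch below copies part-of-speech tags from a labeled source-language sentence onto its parallel target-language translation through word alignments. This is a minimal sketch under stated assumptions, not the dissertation's method: the function name `project_tags`, the hand-written alignment pairs, and the tag set are all hypothetical, and a real system would obtain alignments from a word aligner and typically handle conflicting or missing alignments more carefully before training a target-language tagger on the projected labels.

```python
from typing import List, Optional, Tuple


def project_tags(
    source_tags: List[str],
    target_length: int,
    alignments: List[Tuple[int, int]],
) -> List[Optional[str]]:
    """Hypothetical helper: project source-token tags onto target tokens.

    `alignments` holds (source_index, target_index) pairs, assumed to come
    from some word aligner. Unaligned target tokens keep None. If several
    source tokens align to the same target token, the first pair in the
    list wins (a simplifying assumption for this sketch).
    """
    projected: List[Optional[str]] = [None] * target_length
    for src_i, tgt_j in alignments:
        if projected[tgt_j] is None:
            projected[tgt_j] = source_tags[src_i]
    return projected


if __name__ == "__main__":
    # Hypothetical English -> French pair with a hand-written alignment;
    # note the adjective/noun reordering between the two languages.
    source_tags = ["DET", "ADJ", "NOUN"]  # the white house
    target = ["la", "maison", "blanche"]
    links = [(0, 0), (1, 2), (2, 1)]
    print(list(zip(target, project_tags(source_tags, len(target), links))))
    # prints [('la', 'DET'), ('maison', 'NOUN'), ('blanche', 'ADJ')]
```

The projected tags follow the alignment rather than word order, which is why "blanche" receives the adjective tag even though it appears in a different position than "white". Target tokens left as None would usually be treated as unlabeled when training on the projected data.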

Keywords/Search Tags:

Natural language processing, Sequence labeling, Domain, Labeled training data, Representation learning

Related items

1	Researches On Sequence Labeling Models In Natural Language Processing
2	Research On Joint Learning Of Sequence Labeling In Natural Language Processing
3	Research On Text Causality Extraction Based On Deep Learning And Sequence Labeling
4	Research Of Sequence Labeling Technics Based On Graph Models
5	Sequence Labeling: Supervised Learning And Applications
6	Research On Text Representation Model And Application In Text Classification And Natural Language Inference
7	Learning probabilistic lexicalized grammars for natural language processing
8	Modeling And Learning Of Representations For Natural Language Sentence-level Structures
9	Research On Sequence Labeling Model Of Natural Language Processing Based On Deep Learning
10	Maximizing resources for corpus-based natural language processing