Neural Domain Adaptive Chinese Word Segmentation Algorithm

Posted on:2019-02-21

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Bao

Full Text:PDF

GTID:2348330542998690

Subject:Information and Communication Engineering

Abstract/Summary:

Unlike English, Chinese text is written continuously, without word delimiters such as spaces, so computers must first segment it into words. Chinese word segmentation (CWS) is therefore one of the basic tasks of Chinese natural language processing: the performance of a segmentation system strongly affects the performance of downstream tasks and plays an important role in automatic Chinese natural language processing. Over the past decades, many large annotated CWS datasets have been built and segmentation algorithms have improved steadily. From traditional feature-based models to neural networks, segmentation systems have reached F1 scores above 0.95. However, because manually annotated data come mainly from newswire, researchers have found that models trained on these corpora suffer performance degradation in other domains. This is the well-known domain adaptation problem. This thesis studies neural Chinese word segmentation and its domain adaptation. The main contributions are as follows:

(1) For neural Chinese word segmentation, we propose a model that combines convolutional and recurrent neural networks. A convolutional neural network with multiple convolution kernels extracts hidden multi-scale features from the sentence. The convolutional network is combined with a recurrent network, and k-max pooling is added to reduce the complexity of the whole model. Experiments on three public datasets show that the combined network achieves better performance than previous work.

(2) For semi-supervised domain adaptation of Chinese word segmentation, we explore the differences between Chinese corpora from different domains and propose three semi-supervised domain adaptation strategies based on a character language model. Specifically, after counting unigrams and bigrams in Chinese corpora from different domains, we find that the differences between the corpora lie mainly in how characters combine. We therefore use a character-level language model to capture this relationship and propose three specific domain adaptation strategies. In experiments on public datasets, we compare our methods with previous semi-supervised domain adaptation methods; using only unlabeled target-domain data, our method achieves performance comparable to the previous dictionary-based method.

(3) For fully-supervised domain adaptation of Chinese word segmentation, and unlike traditional regularization methods, we propose a dynamic regularization strategy based on neural networks. Specifically, the source-domain segmentation model constrains the training of the target-domain model, and this constraint adjusts the training of the target-domain model according to the probability distributions that the source-domain model assigns to different training samples. In experiments on public datasets, our method outperforms previous fully-supervised methods for Chinese word segmentation, and it matches the performance of previous models while using less annotated data.
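
The combination of multi-width convolution kernels with k-max pooling in contribution (1) can be sketched in plain Python. This is a minimal illustration, not the thesis implementation: it assumes one scalar feature per character, uses the standard order-preserving definition of k-max pooling, and the function names (`conv1d`, `k_max_pooling`, `multi_scale_features`) are hypothetical. The abstract does not say which axis the pooling runs over or how the pooled features are passed to the recurrent layer.

```python
from typing import List, Sequence


def conv1d(seq: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1-D convolution (cross-correlation, as in most neural network libraries)."""
    width = len(kernel)
    return [
        sum(seq[i + j] * kernel[j] for j in range(width))
        for i in range(len(seq) - width + 1)
    ]


def k_max_pooling(values: Sequence[float], k: int) -> List[float]:
    """Keep the k largest values while preserving their original order."""
    if k <= 0:
        raise ValueError("k must be positive")
    if k >= len(values):
        return list(values)
    # Rank positions by value (ties go to the earlier position) and keep the top k.
    top = sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]
    # Re-sort the kept positions so the output follows the input order.
    return [values[i] for i in sorted(top)]


def multi_scale_features(seq: Sequence[float],
                         kernels: Sequence[Sequence[float]],
                         k: int) -> List[List[float]]:
    """Convolve with kernels of different widths, then k-max pool each feature map."""
    return [k_max_pooling(conv1d(seq, kernel), k) for kernel in kernels]


if __name__ == "__main__":
    # Hypothetical per-character features and two kernels of widths 2 and 3.
    chars = [1.0, 2.0, 3.0, 4.0]
    print(multi_scale_features(chars, [[1.0, 1.0], [1.0, 0.0, -1.0]], k=2))
    # prints [[5.0, 7.0], [-2.0, -2.0]]
```

Each kernel width covers a different span of characters, so the pooled vectors summarize features at several scales, and fixing k bounds the size of each pooled feature map regardless of sentence length.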

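Contribution (3) can be read as a per-sample weighted regularizer that pulls the target-domain model toward a frozen source-domain model. The sketch below is one possible instance under stated assumptions, not the thesis's formulation: it assumes BMES character tagging, a KL-divergence penalty between the two models' tag distributions, and a hypothetical dynamic weight equal to the source model's probability of the gold tag. The names `char_loss`, `sentence_loss` and `lam` are illustrative.

```python
import math
from typing import List, Sequence

# BMES (Begin/Middle/End/Single) is a common CWS tag set; assumed here for illustration.
TAGS = ["B", "M", "E", "S"]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) between two tag distributions for one character."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def char_loss(p_src: Sequence[float], p_tgt: Sequence[float],
              gold: int, lam: float = 1.0) -> float:
    """Cross-entropy on the gold tag plus a source-model constraint.

    Hypothetical dynamic weight: the frozen source model's probability of the gold tag.
    Where the source model already agrees with the target-domain annotation, the target
    model is kept close to it; where the source model disagrees (e.g. on domain-specific
    words), the constraint is relaxed and the target labels dominate.
    """
    weight = p_src[gold]
    return -math.log(p_tgt[gold]) + lam * weight * kl_divergence(p_src, p_tgt)


def sentence_loss(src_probs: List[Sequence[float]], tgt_probs: List[Sequence[float]],
                  gold_tags: List[str], lam: float = 1.0) -> float:
    """Average the per-character loss over one training sentence."""
    losses = [char_loss(ps, pt, TAGS.index(tag), lam)
              for ps, pt, tag in zip(src_probs, tgt_probs, gold_tags)]
    return sum(losses) / len(losses)
```

With `lam = 0` the loss reduces to ordinary training on target-domain data; larger values keep the target model closer to the source model on characters the source model already handles well.
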
Keywords/Search Tags:

Chinese word segmentation, neural network, domain adaptation

Related items

1	Research On Chinese Word Segmentation For Domain Literature
2	Neural Domain Adaptive Chinese Word Segmentation Algorithm
3	Research On Domain Adaptation For Chinese Word Segmentation Based On Parameter Transfer Learning
4	Research On Domain Adaptation Method For Chinese Segmentation Based On Instance Transfer Learning
5	Research On Domain Adaptation Of Chinese Word Segmentation With Multi-source Features And Data
6	Neural Networks Incorporating Multiple Target Domain Information For Cross-domain Chinese Word Segmentation
7	An Incremental-styled Learning Chinese Word Segmentation System Based On Perceptron Algorithm Design And Implementation
8	Research On Chinese Word Segmentation Based On Neural Network
9	Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach
10	Chinese Word Auto-segmentation Design And Algorithm Realization For Chinese Network Information Retrieval