Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach

Posted on:2011-02-16

Degree:Master

Type:Thesis

Country:China

Candidate:L Zhu

GTID:2348330482457346

Subject:Computer application technology

Abstract/Summary:

The ultimate goal of research on natural language processing (NLP) is to parse and understand language, but we are still far from achieving it. For this reason, much NLP research has focused on intermediate tasks that make sense of some of the structure inherent in language without requiring complete understanding. One such task is part-of-speech (POS) tagging, or simply tagging: labeling each word in a sentence with its appropriate part of speech. For Chinese, word segmentation and POS tagging are preprocessing steps that play an important role in many NLP tasks.

At present, the prevailing strategy, both in China and abroad, is to perform POS tagging after word segmentation. This strategy has two disadvantages. First, segmentation errors propagate to the tagger. Second, part-of-speech information that could raise segmentation accuracy goes unused. To achieve better performance, the task of integrating Chinese word segmentation with POS tagging has therefore been proposed.

Traditional statistical methods generally assume that the training corpus and the test corpus share the same probability distribution. In practice the two often come from different fields, and the difference in distribution between the source domain and the target domain often causes classification performance to drop significantly. Because annotated corpora are scarce, we would like a classifier trained on an existing annotated corpus to adapt well to another domain. This problem is called domain adaptation.

The main contributions are:

(1) To address the shortcomings of current text preprocessing, we propose an integrated learning model for word segmentation and POS tagging. We use a conditional random fields tool, and the integrated model performs better than word segmentation and POS tagging carried out separately.

(2) We propose an improved evaluation method. It solves the word alignment problem to some extent and provides a more comprehensive comparative evaluation of POS tagging systems and integrated methods.

(3) Because annotated corpora cover only a limited range of fields and accuracy outside those fields is poor, we tried two methods to address the problem. For the task of multi-source domain adaptation, we independently designed and implemented a selective voting algorithm. By relying on an optimal allocation of resources, it achieves better performance.
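The abstract does not say which label scheme or scoring procedure the thesis uses, so the following is only a minimal sketch of one common way to set up the integrated task, under stated assumptions. Each character receives a joint label that combines a word-boundary marker (B for the first character of a word, I for any later one) with the POS tag of the word, and a sequence labeler such as a conditional random field would predict these labels. Scoring compares (start, end, tag) character spans, which sidesteps the word alignment problem that arises when the gold and predicted segmentations differ. The function names, the B/I scheme, and the tags PN, VV and NR are illustrative choices, not details taken from the thesis.

```python
from typing import List, Set, Tuple

TaggedWord = Tuple[str, str]  # (word, POS tag)


def encode(sentence: List[TaggedWord]) -> List[Tuple[str, str]]:
    """Turn (word, tag) pairs into per-character joint labels such as B-VV / I-VV."""
    labeled = []
    for word, tag in sentence:
        for i, ch in enumerate(word):
            prefix = "B" if i == 0 else "I"  # B marks the start of a word
            labeled.append((ch, f"{prefix}-{tag}"))
    return labeled


def decode(chars: List[str], labels: List[str]) -> List[TaggedWord]:
    """Rebuild (word, tag) pairs from per-character joint labels."""
    words: List[TaggedWord] = []
    for ch, label in zip(chars, labels):
        prefix, tag = label.split("-", 1)
        if prefix == "B" or not words:
            words.append((ch, tag))  # start a new word
        else:
            prev_word, prev_tag = words[-1]
            words[-1] = (prev_word + ch, prev_tag)  # extend the current word
    return words


def to_spans(sentence: List[TaggedWord]) -> Set[Tuple[int, int, str]]:
    """Map each word to a (start, end, tag) span over character offsets."""
    spans = set()
    start = 0
    for word, tag in sentence:
        end = start + len(word)
        spans.add((start, end, tag))
        start = end
    return spans


def span_prf(gold: List[TaggedWord], pred: List[TaggedWord]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over spans; a span counts only if boundaries and tag both match."""
    gold_spans, pred_spans = to_spans(gold), to_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("我", "PN"), ("喜欢", "VV"), ("北京", "NR")]
    # A hypothetical system output that wrongly splits the second word.
    pred = [("我", "PN"), ("喜", "VV"), ("欢", "VV"), ("北京", "NR")]
    print(encode(gold))
    print(span_prf(gold, pred))
```

Because spans are defined over character offsets, a predicted word is compared with the gold word at the same position even when the two segmentations contain different numbers of words. A missed boundary therefore lowers both precision and recall rather than shifting every later word out of alignment.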

Keywords/Search Tags:

word segmentation, Part-Of-Speech tagging, domain adaptation, conditional random fields

Related items

1	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging
2	The Effect Of Part Of Speech On Chinese Word Segmentation
3	Application Research On Chinese Named Entity Recognition Based On Domain Ontology
4	Research On Parallel Corpora-based Unsupervised Part-of-speech Tagging For Chinese
5	Research On Chinese Word Segmentation And Part-of-speech Tagging Based On Deep Learning Methods
6	Research Of Chinese Word Segmentation With Conditional Random Fields
7	Research On Laodian Participle And Part-of-speech Tagging Method
8	Research Of Chinese Word Segmentation Based On Mechanical Matching And Character Tagging
9	Research And System Implementation Of Chinese Word Segmentation In Specialized Fields Based On Conditional Random Fields
10	Research And Application Of Chinese Word Segmentation Based On Conditional Random Fields