
Research On The Learning Of Integrating Chinese Word Segmentation With Part-of-Speech Tagging And Domain Adaption Approach

Posted on: 2011-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhu
GTID: 2348330482457346
Subject: Computer application technology
Abstract/Summary:
The ultimate goal of research in Natural Language Processing (NLP) is to parse and understand language, but we are still far from achieving it. For this reason, much NLP research has focused on intermediate tasks that capture some of the structure inherent in language without requiring complete understanding. One such task is part-of-speech (POS) tagging, or simply tagging: labeling each word in a sentence with its appropriate part of speech. For Chinese, word segmentation and POS tagging together form a pre-processing step that plays an important role in many natural language processing tasks.

At present, both in China and abroad, POS tagging is generally performed after word segmentation. This strategy has two disadvantages: first, segmentation errors propagate to the tagging step; second, the segmentation task cannot draw on part-of-speech information, which could raise its accuracy. To achieve better performance, the task of integrating Chinese word segmentation with POS tagging has therefore been proposed.

Traditional statistical methods generally assume that the training corpus and the test corpus follow the same probability distribution. In practice the two often come from different domains, and the difference in distribution between the source domain and the target domain often causes classification performance to drop significantly. Because annotated corpora are scarce, we would like a classifier trained on an existing annotated corpus to adapt well to another domain. This problem is called domain adaptation.

The main contributions include:
(1) To address the shortcomings of current text pre-processing, we propose a learning model that integrates word segmentation and part-of-speech tagging. The model uses a conditional random fields tool, and it performs better than word segmentation and POS tagging carried out separately.
(2) We propose an improved evaluation method. It solves the word alignment problem to some extent and provides a more comprehensive comparative evaluation of POS tagging systems and integrated methods.
(3) To address the limited domain coverage of annotated corpora and the resulting poor accuracy, we tried two methods. For the task of multi-source domain adaptation, we independently designed and implemented a selective voting algorithm. Relying on the optimal allocation of resources, we achieved better performance.
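The abstract does not spell out how the integrated model represents its output or how the improved evaluation aligns words. As a rough illustration only, the sketch below uses one common formulation that may differ from the thesis: each character receives a combined label such as `B-VV` (word-initial character of a verb) or `I-VV` (word-internal character), so a single sequence labeler such as a CRF could predict boundaries and tags together. Scoring then compares character spans, which sidesteps the fact that two segmentations of the same sentence can contain different numbers of words. The function names, the `B`/`I` scheme, and the tag set in the example are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Word = Tuple[str, str]  # (surface form, POS tag)


def encode(words: List[Word]) -> List[Tuple[str, str]]:
    """Turn (word, pos) pairs into per-character (char, joint label) pairs.

    Assumed scheme: 'B-<pos>' for the first character of a word,
    'I-<pos>' for every following character of the same word.
    """
    labeled = []
    for word, pos in words:
        for i, ch in enumerate(word):
            prefix = "B" if i == 0 else "I"
            labeled.append((ch, f"{prefix}-{pos}"))
    return labeled


def decode(labeled: List[Tuple[str, str]]) -> List[Word]:
    """Rebuild (word, pos) pairs from per-character joint labels."""
    words: List[List[str]] = []
    for ch, label in labeled:
        prefix, pos = label.split("-", 1)
        if prefix == "B" or not words:
            words.append([ch, pos])  # start a new word
        else:
            words[-1][0] += ch       # extend the current word
    return [(w, p) for w, p in words]


def spans(words: List[Word]) -> Set[Tuple[int, int, str]]:
    """Represent each word as a (start, end, pos) character span."""
    out = set()
    start = 0
    for word, pos in words:
        end = start + len(word)
        out.add((start, end, pos))
        start = end
    return out


def joint_f1(gold: List[Word], pred: List[Word]) -> float:
    """F1 over spans: a word counts only if boundaries and tag both match."""
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [("我", "PN"), ("喜欢", "VV"), ("北京", "NR")]
    pred = [("我", "PN"), ("喜", "VV"), ("欢", "VV"), ("北京", "NR")]
    print(encode(gold))
    print(joint_f1(gold, pred))
```

In this example the predicted segmentation splits one gold word in two, so only two of the four predicted spans match the three gold spans. Comparing spans rather than word positions keeps the remaining words comparable even though the two word sequences have different lengths.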
Keywords/Search Tags:word segmentation, Part-Of-Speech tagging, domain adaptation, conditional random fields