Chinese New Word Discovery And Its Part-Of-Speech Tagging

Posted on:2009-11-03

Degree:Master

Type:Thesis

Country:China

Candidate:H Yang

Full Text:PDF

GTID:2208360272989619

Subject:Computer application technology

Abstract/Summary:

With the rapid development of society and the economy, the Chinese language continues to grow, and new words keep emerging, which makes Chinese word segmentation more challenging. Unrecognized new words tend to be split into long runs of single characters in the segmented sentence, which markedly lowers segmentation precision. New word discovery has therefore become a bottleneck in Chinese word segmentation and an important research topic. Part of speech (POS) is an important attribute of a word and the main bridge between words and syntax. POS tagging is expected to provide high-quality intermediate results for later stages of natural language processing (NLP), but the emergence of new words reduces POS tagging performance to a certain extent.

Many researchers are working on the new word discovery problem and have proposed a variety of approaches. However, these approaches either restrict new words to a particular domain or rely only on word-frequency features. In this paper, we first review previous work and then propose an SVM-based hybrid method for new word discovery, which tries to combine the advantages of the statistics-based method and the rule-based method to improve both new word discovery and POS tagging. In the statistics module, new word discovery is defined as a binary classification problem. We combine features from previous work, which focus on the internal structure of a word, with newly proposed context information and with constraints that capture the relationships among new word candidates. Some rules are also introduced to improve performance. Finally, we assign POS tags to the discovered new words.

This paper designs and constructs a system that implements new word discovery and POS tagging. Some key techniques are also described:

1. For new word discovery, a support vector machine (SVM) is introduced to solve the classification problem. SVMs have been applied successfully in pattern recognition and classification; an SVM finds an optimal separating hyperplane between data points of different classes in a high-dimensional space. Within the SVM framework, some rules are introduced to compensate for the shortcomings of the statistics-based method and improve performance. The SVM-based hybrid method for new word discovery and its processing flow are briefly described in this paper.

2. For POS tagging of new words, we also define the task as a classification problem and handle it with an SVM that takes into account the inner structure of the word and external concatenation information. We transform the multi-class classification problem into a binary classification problem by constructing a new mapping function.

Experiments on one month of 1998 news from the People's Daily show that new word discovery reaches a precision of up to 60.81%, a recall of 68.94%, and an F-measure of 64.62%. The precision of POS tagging is up to 90%.
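The reported new word discovery figures can be cross-checked with a few lines of Python. This is a minimal sketch that assumes the F-measure in the abstract is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1).

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 60.81
reported_recall = 68.94

print(round(f_measure(reported_precision, reported_recall), 2))
```

Under that assumption the result rounds to 64.62, which agrees with the reported F-measure.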

Keywords/Search Tags:

New Word Discovery, Part-of-Speech (POS) tagging, Natural Language Processing (NLP), Support Vector Machine (SVM)

Related items

1	Study On Disambiguation Algorithm For Chinese Word Segmentation
2	Knowledge Discovery Of Gene Ontology Based On Part-of-speech Tagging And Classification Algorithm
3	The Research Of Applying Conditional Random Fields To Chinese Word Segmentation And Part-Of-Speech Tagging
4	Research On Chinese Part-of-speech Tagging Based On Semi Hidden Markov Model
5	The Study And Analysis Of Oracle Bone Inscriptions Based On Statistical Natural Language Processing
6	Word Segmentation And POS Tagging In Chinese
7	Research On Parallel Corpora-based Unsupervised Part-of-speech Tagging For Chinese
8	Study Of Kazak Part-of-Speech Tagging Based Upon HMM
9	Research On Part-of-Speech Tagging Algorithms Of Mathematical Corpus Based On Deep Learning
10	Research On Text Document Information Hiding