Chinese New Word Discovery and Its Part-of-Speech Tagging

Posted on: 2009-11-03
Degree: Master
Type: Thesis
Country: China
Candidate: H Yang
Full Text: PDF
GTID: 2208360272989619
Subject: Computer application technology
Abstract/Summary:
With the rapid development of society and the economy, the Chinese language has been continually enriched. New words keep emerging, which poses further challenges for Chinese word segmentation. Unrecognized new words leave long sequences of single characters in the segmented sentence, which markedly lowers segmentation precision. New word discovery has therefore become a bottleneck in Chinese word segmentation and an important research topic. Part-of-speech (POS) is an important attribute of a word and the main bridge between words and syntax. POS tagging should therefore provide high-quality intermediate results for the later stages of natural language processing (NLP), but the emergence of new words reduces POS tagging performance to a certain extent.

Many researchers are currently working on new word discovery and have proposed a variety of approaches. However, in existing work the new words are limited to a particular domain, or the features are limited to the frequency of the new words. In this paper, we first review previous work and then propose an SVM-based hybrid method for new word discovery, which tries to combine the advantages of statistics-based and rule-based methods in order to improve both new word discovery and POS tagging. In the statistical module, new word discovery is defined as a binary classification problem. We use the features from previous work, which focus on the internal structure of the word, and add context information as well as constraints that capture the relationships among new word candidates. Some rules are then introduced to further improve performance. Finally, POS tags are assigned to the new words.

This paper designs and builds a system that implements new word discovery and POS tagging. Some key techniques are also described:

1. For new word discovery, a support vector machine (SVM) is introduced to solve the classification problem. SVMs have been applied successfully in pattern recognition and classification, and can find an optimal separating hyperplane between data points of different classes in a high-dimensional space. Within the SVM framework, some rules are introduced to compensate for the shortcomings of the statistics-based method and improve performance. The SVM-based hybrid method for new word discovery and its processing flow are briefly described in this paper.
2. For POS tagging of new words, we also define the task as a classification problem and handle it with an SVM, taking into account the internal structure of the word and external concatenation information. We then transform the multi-class classification problem into a binary classification problem by constructing a new mapping function.

In experiments conducted on one month of 1998 news from the People's Daily, new word discovery achieves a precision of up to 60.81%, a recall of 68.94%, and an F-measure of 64.62%. The precision of POS tagging is up to 90%.
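The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the balanced F-measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the thesis.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall (beta = 1);
    the abstract does not say which variant was used.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
precision = 60.81
recall = 68.94

print(round(f_measure(precision, recall), 2))  # 64.62
```

Under this assumption the result agrees with the 64.62 given in the abstract, which suggests the three figures are mutually consistent.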
Keywords/Search Tags: New Word Discovery, Part-of-Speech (POS) tagging, Natural Language Processing (NLP), Support Vector Machine (SVM)