The use of morphological knowledge in Chinese natural language processing

Posted on: 2009-01-25
Degree: Ph.D.
Type: Dissertation
University: University of Colorado at Boulder
Candidate: Tseng, Hui-Hsin
Full Text: PDF
GTID: 1448390002991065
Subject: Language
Abstract/Summary:
Chinese words are typically formed by four morphological processes: compounding, affixation, idiomization, and reduplication. In this dissertation, I demonstrate how knowledge of these morphological processes can be used to improve performance on four natural language processing tasks: (i) Chinese word segmentation, (ii) semantic classification of unknown Chinese words, (iii) Chinese part-of-speech tagging, and (iv) Chinese-to-English machine translation. For Chinese word segmentation, I show that adding morphological features to a conditional random field sequence model results in excellent performance, as evaluated in the Sighan 2005 Bakeoff. For assigning semantic thesaurus categories to unknown Chinese words, I assign each unknown word the semantic category of its nearest morphological neighbor, achieving over 70% accuracy even without contextual information. For part-of-speech disambiguation, a task made harder by the high part-of-speech ambiguity of Chinese words, I apply a variety of new morphological unknown-word features, achieving state-of-the-art performance in Mandarin tagging, including improving unknown-word tagging accuracy on unseen varieties in Chinese Treebank 5.0 from 61% to 80%. Finally, for Chinese-to-English translation, I focus on the difficulty of choosing the correct morphological form for the English translation of Chinese verbs, and explore a set of morpho-syntactic features to be used in MT reranking and decoding.
Keywords/Search Tags:Chinese, Morphological
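
As a rough illustration of the nearest-morphological-neighbor idea described in the abstract, the following minimal sketch assigns an unknown word the category of the most similar known word. The similarity measure (shared final characters first, then shared characters overall), the toy lexicon, and the category labels are all assumptions made for illustration; the abstract does not specify the measure or the thesaurus actually used in the dissertation.

```python
# Hypothetical toy lexicon: known word -> semantic category.
# Entries and labels are illustrative only, not taken from the dissertation.
LEXICON = {
    "科学家": "person",
    "音乐家": "person",
    "图书馆": "place",
    "博物馆": "place",
    "计算机": "artifact",
    "洗衣机": "artifact",
}


def shared_suffix_len(a, b):
    """Count how many trailing characters two words have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def similarity(a, b):
    """Assumed similarity: compare suffix overlap first, then character overlap."""
    return (shared_suffix_len(a, b), len(set(a) & set(b)))


def classify(word, lexicon):
    """Return the category of the most similar known word, or None if nothing overlaps."""
    best = max(lexicon, key=lambda known: similarity(word, known))
    if similarity(word, best) == (0, 0):
        return None
    return lexicon[best]


if __name__ == "__main__":
    for unknown in ["美术馆", "画家", "苹果"]:
        print(unknown, classify(unknown, LEXICON))
```

Ranking on the final character first reflects the common pattern in which the last morpheme of a Chinese compound names its semantic class (for example, 馆 for buildings); that design choice is an assumption of this sketch, and no context outside the word itself is used.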