
Defining and automatically identifying words in Chinese

Posted on: 2003-06-15
Degree: Ph.D
Type: Thesis
University: University of Delaware
Candidate: Xue, Nianwen
Full Text: PDF
GTID: 2465390011482416
Subject: Language
Abstract/Summary:
There are two important aspects of Chinese word formation that need to be addressed in a theory of Chinese morphology. The first is that the formation of complex words is highly regular and recursive, which seems to indicate that word formation is syntactic in nature. The second is that Chinese words demonstrate lexical integrity effects: components of words cannot be moved out of the word, cannot be deleted, are opaque to external reference, and cannot take phrasal modifiers. This seems to indicate that words are formed in the lexicon. There is thus a dilemma as to where words are formed in Chinese.
Work in the lexicalist framework either posits different notions of word (Dai 1992) or devises complicated word formation rules in the lexicon to account for this dilemma (Packard 2000). I take a radically different approach in this dissertation and argue that complex words in Chinese are formed in syntax, in the spirit of the Distributed Morphology Hypothesis (Halle and Marantz 1993, 1994, and others). In Chapter 2, I examine the wordhood tests that have been proposed in the Chinese linguistics literature and conclude that some of the tests follow from the general X-bar theoretic framework and others follow from locality conditions such as the Lexical Integrity Hypothesis (LIH). In Chapter 3, I show how the LIH effects can be derived in a straightforward manner if words are formed in syntax. In Chapter 4, I examine complex verbs and show that their formation provides further evidence for our theoretical position. In Chapter 5, I describe an automatic word segmenter that implements our theoretical assumptions with the transformation-based error-driven algorithm (Brill 1993). Our working hypothesis is that if our theoretical assumptions are correct, we should see better results than with “lexicalist” implementations.
The results show that our implementation provides a significant improvement over a lexicalist implementation that uses the maximum matching algorithm, both in overall accuracy and in dealing with new words. We take this to be a validation of our theoretical assumptions.
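For readers unfamiliar with the lexicalist baseline, maximum matching can be sketched as a greedy dictionary lookup. The abstract does not give the details of the implementation used in the thesis, so the following is only a minimal sketch under stated assumptions: a forward (left-to-right) variant, a fixed maximum word length, and a hypothetical toy lexicon (`max_match`, `TOY_LEXICON`, and `max_len` are illustrative names, not taken from the thesis).

```python
def max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching (illustrative sketch).

    At each position, take the longest lexicon entry that matches;
    if nothing matches, emit a single character as a word.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {"研究", "研究生", "生命", "起源"}

print(max_match("研究生命起源", TOY_LEXICON))
```

With this toy lexicon the greedy choice of 研究生 at the start blocks the intended reading 研究 / 生命 / 起源, and any string missing from the lexicon falls apart into single characters. These are the kinds of dictionary-bound errors, especially on new words, that a comparison against such a baseline would be expected to expose.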
Keywords/Search Tags: Word, Chinese, Theoretical assumptions