The Effect Of Part Of Speech On Chinese Word Segmentation

Posted on: 2011-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: J N Liu
Full Text: PDF
GTID: 2178330332460995
Subject: Computer application technology
Abstract/Summary:
Word segmentation is a basic component of natural language processing (NLP). Many research topics in NLP build on word segmentation, so its precision has a great influence on all later processing steps. Chinese word segmentation has long been an important and difficult problem in Chinese lexical analysis because of the complexity and variety of the language. As China's international standing has risen, more and more NLP researchers work on Chinese and devote further effort to Chinese word segmentation.

First, we give an overview of Chinese word segmentation by analyzing its present state and development trends. Then we introduce the composition of part-of-speech (POS) systems and POS tagging, covering the following aspects: the approaches to POS tagging, the segmentation and tagging criteria, and the form of POS tag sets. Next, the Chinese segmentation models are described in detail. The focus of this part is on graph-based sequence models: the Hidden Markov Model, the Maximum Entropy Hidden Markov Model, and the Conditional Random Fields (CRFs) model. Finally, we compare the results of different word segmentation models through extensive experiments on Chinese word segmentation. On this basis, the effect of the POS tagging system is discussed. In addition, character-based and word-based segmentation methods are compared and analyzed.

The main conclusions drawn from the experiments are:
(1) We compare segmentation systems with and without POS information on the same corpora. The experimental results show that the system using POS outperforms the system without POS.
(2) We compare different POS tagging systems using a word segmentation tool that employs POS. We found that the smaller the POS grading, the more effective the segmentation.
(3) We compare character-based and word-based segmentation with a CRFs model. The character-based method is better at segmenting out-of-vocabulary (OOV) words, while the word-based method performs better on in-vocabulary (IV) words.
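
To make the terms in conclusion (3) concrete, the following is a minimal sketch, not the thesis's own code. It assumes the common B/M/E/S character tagging scheme (begin, middle, end, single) as the target labels of a character-based tagger, and the usual definition of IV and OOV recall: the share of gold-standard words, split by whether they appear in the training vocabulary, that the system segments with exactly the right boundaries. The abstract does not state which tag set or scoring procedure was used, so the function names, the tag set, and the toy sentence are all illustrative assumptions.

```python
def words_to_tags(words):
    """Convert a segmented sentence into per-character B/M/E/S tags (assumed scheme)."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from a character sequence and its B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words


def word_spans(words):
    """Return the set of (start, end) character offsets covered by each word."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def iv_oov_recall(gold, pred, vocab):
    """Recall on in-vocabulary and out-of-vocabulary gold words, scored separately."""
    pred_spans = word_spans(pred)
    hit = {"iv": 0, "oov": 0}
    total = {"iv": 0, "oov": 0}
    start = 0
    for w in gold:
        kind = "iv" if w in vocab else "oov"
        total[kind] += 1
        if (start, start + len(w)) in pred_spans:
            hit[kind] += 1
        start += len(w)
    return {k: (hit[k] / total[k] if total[k] else None) for k in total}


# Toy example (hypothetical data): the vocabulary stands in for the training lexicon.
gold = ["我", "喜欢", "自然", "语言", "处理"]
pred = ["我", "喜欢", "自然语言", "处理"]
vocab = {"我", "喜欢", "自然", "语言"}

gold_tags = words_to_tags(gold)
recall = iv_oov_recall(gold, pred, vocab)
print(gold_tags)
print(recall)
```

In this toy case the system merges two known words into one, which lowers IV recall, while the unseen word is still segmented correctly. Reporting the two figures separately is what allows a comparison such as the one in conclusion (3).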
Keywords/Search Tags:Chinese Word Segmentation, Hidden Markov Model, Conditional Random Fields, part of speech (POS)