
Chinese Word Segmentation System Design And Implementation

Posted on: 2011-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: X H Zhang
Full Text: PDF
GTID: 2208360308467016
Subject: Communication and Information System
Abstract/Summary:
Chinese word segmentation is one of the basic components of natural language processing and machine learning. Chinese language processing works at three levels: characters, words, and sentences. The word is the smallest meaningful language unit and the basis of sentence processing, so sentences can be processed well only if words are handled properly. The most important task at the word level is Chinese word segmentation. In English, retrieval is convenient because words are separated by spaces. Chinese has no separator between words, so word-based retrieval requires a dedicated technique for finding word boundaries. This technique is called Chinese word segmentation.

As Chinese information processing has developed, Chinese word segmentation has advanced considerably and many algorithms have appeared. According to their characteristics, the existing algorithms can be divided into four classes: string-matching-based, understanding-based, statistics-based, and semantics-based. Each method has its own advantages and disadvantages, and no single method achieves satisfactory results on its own. Combining two or three of the methods, so that their strengths complement one another, gives much better segmentation results.

Building on previous work, this thesis designs a general system for Chinese word segmentation. It presents a rough segmentation model based on the statistical N-shortest-paths method. A fast Chinese personal name recognition step is applied as pre-processing, and its output is then refined by a Chinese personal name recognition approach based on role tagging. Experiments show that the system performs well.
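The idea behind N-shortest-paths rough segmentation can be illustrated with a small sketch. This is not the thesis's implementation: the toy unigram counts, the add-one smoothing, the function names, and the `max_word_len` parameter are all assumptions made for illustration. Each candidate word is an edge in a graph over character positions, the edge cost is the negative log of a smoothed unigram probability, and the N cheapest paths from the start to the end of the sentence are kept as candidate segmentations for later stages (such as name recognition) to choose from.

```python
import heapq
import math

# Hypothetical toy unigram counts; a real system would estimate these from a corpus.
TOY_COUNTS = {
    "研究": 5, "研究生": 3, "生命": 4, "起源": 4, "的": 10,
    "研": 1, "究": 1, "生": 1, "命": 1, "起": 1, "源": 1,
}


def word_cost(word, counts, total):
    """Negative log of an add-one smoothed unigram probability (an assumed cost model)."""
    return -math.log((counts.get(word, 0) + 1) / (total + len(counts) + 1))


def n_shortest_segmentations(sentence, counts, n=2, max_word_len=4):
    """Return up to n (cost, words) pairs, cheapest first.

    best[i] holds the n cheapest segmentations of sentence[:i]. Because the
    word graph is a DAG ordered by position, the n cheapest paths to a node
    can be built from the n cheapest paths to each of its predecessors.
    """
    total = sum(counts.values())
    length = len(sentence)
    best = [[] for _ in range(length + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, length + 1):
        candidates = []
        for start in range(max(0, end - max_word_len), end):
            word = sentence[start:end]
            # Single characters are always allowed so that every sentence has
            # at least one path; longer words must be in the dictionary.
            if len(word) > 1 and word not in counts:
                continue
            cost = word_cost(word, counts, total)
            for prev_cost, prev_words in best[start]:
                candidates.append((prev_cost + cost, prev_words + [word]))
        best[end] = heapq.nsmallest(n, candidates)
    return best[length]


if __name__ == "__main__":
    for cost, words in n_shortest_segmentations("研究生命的起源", TOY_COUNTS, n=2):
        print(round(cost, 3), "/".join(words))
```

Keeping several candidates instead of only the single best path is what lets a later stage recover from an early wrong split: in the ambiguous example above, both the 研究/生命 reading and the 研究生/命 reading survive the rough pass.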
Keywords/Search Tags: automatic Chinese word segmentation, name identification, Hidden Markov Model