Design And Implementation Of Chinese Word Segmentation System Of Self-Adaptive Ambiguous Segmentation

Posted on: 2006-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: T Wen
Full Text: PDF
GTID: 2178360155967459
Subject: Computer Science and Technology
Abstract/Summary:
Automatic Chinese word segmentation is a fundamental task in Chinese information processing, and it has become one of the field's bottlenecks. Disambiguation is the key factor in segmentation precision. Although many researchers have studied the problem, current results still do not meet the demands of applications.

This thesis studies two aspects of Chinese word segmentation in depth: segmentation speed and disambiguation. For speed, the words in the dictionary are sorted and indexed by their first character, and the regularity of this ordering is then exploited to speed up word lookup. The thesis also improves the rough segmentation model based on the N-shortest-paths method: by deleting segmentation results that contain uncovered ambiguity, it reduces the number of rough results. When unknown words are not taken into account, recall can reach 100%. Finally, through an analysis of existing segmentation algorithms, the thesis points out that their flaw is the incompleteness of the information drawn from Chinese corpora, and it introduces an algorithm based on multigram information. The system collects the ambiguous sentences that were segmented incorrectly; then, through manual intervention, it adjusts the multigram information automatically. This makes the multigram information more mature and improves the accuracy of Chinese word segmentation.

Lastly, the thesis makes a comprehensive comparison between this system and ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) in speed and accuracy. It closes by discussing the system's advantages and shortcomings and giving some prospects for future work.
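The two speed-related ideas in the abstract, a dictionary indexed by first character and rough segmentation by N shortest paths, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`build_index`, `contains`, `word_edges`, `n_shortest_segmentations`), the unit cost per word, the `max_len` limit, and the toy dictionary are all assumptions made for illustration, and the sketch returns at most `n` candidates rather than every path tied at the N shortest lengths.

```python
import bisect
import heapq
from collections import defaultdict


def build_index(words):
    """Group dictionary words by first character; keep each group sorted."""
    index = defaultdict(list)
    for w in words:
        index[w[0]].append(w)
    for group in index.values():
        group.sort()
    return index


def contains(index, word):
    """Binary-search only the group that shares the word's first character."""
    group = index.get(word[0], [])
    i = bisect.bisect_left(group, word)
    return i < len(group) and group[i] == word


def word_edges(index, text, max_len=4):
    """Map each start position to the end positions of candidate words."""
    edges = defaultdict(list)
    for i in range(len(text)):
        edges[i].append(i + 1)  # a single character is always a candidate
        for j in range(i + 2, min(len(text), i + max_len) + 1):
            if contains(index, text[i:j]):
                edges[i].append(j)
    return edges


def n_shortest_segmentations(index, text, n=2):
    """Return up to n segmentations with the fewest words (each word costs 1)."""
    edges = word_edges(index, text)
    heap = [(0, 0, [])]  # (word count so far, position, words so far)
    results = []
    while heap and len(results) < n:
        cost, pos, words = heapq.heappop(heap)
        if pos == len(text):
            results.append(words)  # complete paths pop in order of cost
            continue
        for end in edges[pos]:
            heapq.heappush(heap, (cost + 1, end, words + [text[pos:end]]))
    return results


if __name__ == "__main__":
    idx = build_index(["研究", "研究生", "生命", "起源"])  # toy dictionary
    for seg in n_shortest_segmentations(idx, "研究生命起源", n=2):
        print("/".join(seg))
```

With this toy dictionary, the two rough candidates for 研究生命起源 are 研究/生命/起源 and 研究生/命/起源, both three words long. Choosing between such candidates is the disambiguation step, which the thesis handles with multigram information and which this sketch does not attempt.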
Keywords/Search Tags: self-adaptive, word segmentation, ambiguity, multigram