The Research And Implementation Of The Chinese Word Segmentation System Combining Omni-Segmentation With Statistics

Posted on: 2010-04-28  Degree: Master  Type: Thesis
Country: China  Candidate: L Wang
GTID: 2198330332488605  Subject: Computer software and theory
Abstract/Summary:
Chinese word segmentation, a core component of Chinese information processing, is the basis of text mining, machine translation and information retrieval. Because the problem is complex, segmentation algorithms have become one of the most important research topics in Chinese information processing.

This thesis first analyses the basic segmentation methods and the problems that Chinese word segmentation faces. It then studies corpus-based statistical methods and their key techniques, and on that basis presents an improved omni-segmentation algorithm combined with a corpus-based statistical method. The algorithm works in three stages: the improved omni-segmentation algorithm constructs a directed acyclic graph from a statistical dictionary; a statistics-based screening algorithm then obtains a set of candidate results; finally, after unknown word recognition, the best segmentation path is selected. A Chinese word segmentation system based on this algorithm is designed and implemented.

The experimental results show that combining the improved omni-segmentation algorithm with the corpus-based statistical method improves the system's ability to handle segmentation problems, and that the approach is feasible and practically usable to a certain degree.
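The abstract does not give the details of the improved omni-segmentation, the screening step, or the unknown word recognition, so the following is only a minimal sketch of the general idea under stated assumptions. It builds a word DAG by matching every dictionary word at every position (plain omni-segmentation), then picks the path with the highest sum of log unigram probabilities. The toy dictionary `FREQ`, its counts, and the helper names `build_dag` and `best_path` are hypothetical; the unigram maximum-probability path is a commonly used scoring method swapped in for the thesis's screening algorithm, and this sketch performs no unknown word recognition.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy statistical dictionary: word -> corpus frequency.
# A real system would derive these counts from a segmented training corpus.
FREQ: Dict[str, int] = {
    "研究": 50, "研究生": 20, "生命": 40, "命": 5, "起源": 30,
    "研": 1, "究": 1, "生": 10, "起": 8, "源": 4,
}
TOTAL = sum(FREQ.values())


def build_dag(sentence: str, freq: Dict[str, int], max_len: int = 4) -> Dict[int, List[int]]:
    """Omni-segmentation: for each start index i, list every end index j
    such that sentence[i:j+1] is a dictionary word. A single-character
    edge is always kept so every position stays reachable."""
    dag: Dict[int, List[int]] = {}
    n = len(sentence)
    for i in range(n):
        ends = [j for j in range(i, min(n, i + max_len))
                if sentence[i:j + 1] in freq]
        if i not in ends:
            ends.insert(0, i)  # fallback for an out-of-vocabulary character
        dag[i] = ends
    return dag


def best_path(sentence: str, dag: Dict[int, List[int]],
              freq: Dict[str, int], total: int) -> List[str]:
    """Select the segmentation that maximizes the sum of log unigram
    probabilities, using dynamic programming from right to left."""
    n = len(sentence)
    # route[i] = (best log-probability of sentence[i:], end index of its first word)
    route: Dict[int, Tuple[float, int]] = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(freq.get(sentence[i:j + 1], 1) / total) + route[j + 1][0], j)
            for j in dag[i]
        )
    # Walk the stored choices forward to recover the words.
    words: List[str] = []
    i = 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


if __name__ == "__main__":
    s = "研究生命起源"
    dag = build_dag(s, FREQ)
    print(dag)
    print(" / ".join(best_path(s, dag, FREQ, TOTAL)))  # 研究 / 生命 / 起源
```

With these made-up counts the classic ambiguous string 研究生命起源 has candidate edges such as 研究生 and 生命 in the DAG, and the statistical scoring prefers 研究 / 生命 / 起源 over 研究生 / 命 / 起源. A full system would replace the single best path with a candidate set and apply unknown word recognition before the final choice, as the abstract describes.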
Keywords: Chinese Word Segmentation, Omni-Segmentation, Statistical Word Segmentation, Statistics-based Dictionary