Open Chinese Word Segmentation Based on Statistics

Posted on: 2003-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: H C Guan
Full Text: PDF
GTID: 2208360065455536
Subject: Computer application technology
Abstract/Summary:
Automatic Chinese word segmentation is an important part of Chinese information processing. Statistical methods suffer from sparse training data, and further growth of the corpus is limited by the heavy workload of manual tagging. For statistics-based automatic Chinese word segmentation, this thesis introduces an open learning mechanism that uses both supervised and unsupervised learning. The word segmentation model incorporates credibility revision and partial trigram information. The thesis then discusses several issues that arise during system implementation, such as the segmentation algorithm and the human-computer interface. The parameters and thresholds of the model are determined experimentally. Test results show that, with the open learning model, segmentation accuracy reaches 99.07% in the closed test and 98.08% in the open test, and that the system shows good adaptability and disambiguation ability.
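As a rough illustration of what statistics-based segmentation means in general, the sketch below picks the split of a sentence that maximizes the product of word probabilities, using dynamic programming. It is not the thesis's model: it uses only unigram probabilities and omits credibility revision, partial trigram information, and open learning. The function name `segment`, the parameters `max_len` and `unk_logp`, and the toy probabilities are all assumptions made for this example.

```python
import math

def segment(text, word_probs, max_len=4, unk_logp=-20.0):
    """Return the split of `text` with the highest unigram log-probability.

    word_probs: hypothetical dict mapping known words to probabilities.
    max_len:    longest candidate word, in characters (assumed value).
    unk_logp:   assumed penalty for an unknown single character.
    """
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i]: best score for text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last word
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            if word in word_probs:
                logp = math.log(word_probs[word])
            elif len(word) == 1:
                logp = unk_logp      # fall back to one unknown character
            else:
                continue             # unknown multi-character strings are skipped
            score = best[j] + logp
            if score > best[i]:
                best[i] = score
                back[i] = j
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]
```

With toy probabilities such as 研究 = 0.1, 研究生 = 0.05, 生命 = 0.1, 命 = 0.01 and 起源 = 0.05, the ambiguous string 研究生命起源 is split as 研究 / 生命 / 起源 rather than 研究生 / 命 / 起源, because the first split has the higher total probability. A model with trigram context, as the abstract describes, would score each word given its neighbors instead of in isolation.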
Keywords/Search Tags: Natural Language Processing, Chinese Segmentation, Corpus, Grammar Model, Open Learning