Research on Chinese Word Segmentation in BERSE

Posted on: 2007-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: L D Huang
Full Text: PDF
GTID: 2178360185977004
Subject: Education Technology
Abstract/Summary:
Automatic word segmentation is the basis of natural language processing (NLP). Every Chinese information processing system that operates on Chinese words must depend on word segmentation. The main difficulties of Chinese word segmentation are the handling of crossing ambiguity and of unknown words. Building on the BERSE (Basic Education Resource Research Engine) project, I discuss methods of Chinese word segmentation in this thesis.

To resolve the crossing-ambiguity problem, I identified the distinguishing features of high-frequency characteristic words and present a method for handling crossing-ambiguity strings. As a supplement, the implemented system uses character bigrams for ambiguity resolution.

To identify Chinese names, I adopted an approach that combines statistics and rules: using statistical measures, I analyzed the characteristics of names and present a method for classifying them. In the system design, Chinese names are found among the fragments left by word segmentation. Experiments show that an accuracy of about 90% is achieved.

I also tried to use high-frequency words as a way of handling unknown words. The word segmentation system counts the number of successful matches and the number of documents in which a string occurs in order to find high-frequency strings, and then uses weights to add these unknown words to the vocabulary. In this way, the main vocabulary can be extended automatically.

Finally, the thesis describes the framework, processing flow, and interface design of the Chinese word segmentation system.
Keywords/Search Tags: Chinese information processing, Chinese word segmentation, crossing ambiguity, unknown word
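
The abstract mentions character bigrams as a supporting signal for resolving crossing ambiguity but does not give the procedure. The following is only a minimal sketch of one common way such a signal can be used, not the thesis's actual method: count character bigrams that occur inside words of a segmented corpus, then prefer the candidate segmentation whose word boundaries split the least cohesive character pairs. The toy corpus, the function names (`train_bigram_counts`, `boundary_cost`, `resolve`), and the scoring rule are all assumptions made for illustration.

```python
from collections import Counter


def char_bigrams(text):
    """Return all adjacent character pairs in a string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train_bigram_counts(segmented_corpus):
    """Count character bigrams that occur inside words of a segmented corpus.

    segmented_corpus is a list of sentences, each a list of non-empty words.
    (Hypothetical training step; the thesis does not describe its own.)
    """
    counts = Counter()
    for sentence in segmented_corpus:
        for word in sentence:
            counts.update(char_bigrams(word))
    return counts


def boundary_cost(segmentation, counts):
    """Sum the counts of the character pairs that this segmentation splits."""
    cost = 0
    for left, right in zip(segmentation, segmentation[1:]):
        cost += counts[left[-1] + right[0]]
    return cost


def resolve(candidates, counts):
    """Pick the candidate that cuts through the least cohesive bigrams."""
    return min(candidates, key=lambda seg: boundary_cost(seg, counts))


# Toy data, invented for this example only.
corpus = [
    ["研究", "生命"],
    ["生命", "科学"],
    ["研究生", "入学"],
]
counts = train_bigram_counts(corpus)

# The string 研究生命 has a crossing ambiguity: 研究/生命 versus 研究生/命.
candidates = [["研究", "生命"], ["研究生", "命"]]
best = resolve(candidates, counts)
print(best)
```

With this toy corpus the split 研究生/命 cuts the pair 生命, which was seen inside words twice, while 研究/生命 cuts 究生, seen once, so the second reading is rejected. A real system would need a large corpus and probably smoothed or normalized statistics rather than raw counts.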