Research Of Automatic Chinese Segmentation And Name Recognition

Posted on: 2012-07-13
Degree: Master
Type: Thesis
Country: China
Candidate: C Z Jiang
Full Text: PDF
GTID: 2178330335461604
Subject: Computer software and theory
Abstract/Summary:
With the continued development of national information technology and the popularization of the Internet, natural language understanding has become an active research field. Automatic Chinese segmentation is the first step in natural language understanding; it is a complex task, and its quality determines that of all follow-up processing.

Chinese names are the most important class of unknown words, and their presence is one of the main factors that reduce segmentation accuracy. Chinese name recognition is therefore a key technology in automatic Chinese segmentation. Current results are still unsatisfactory, and recognition quality needs further improvement.

This thesis studies automatic Chinese segmentation models and the problem of Chinese name recognition. The main work is as follows:

(1) A new dictionary mechanism, the dynamic four-character bidirectional dictionary mechanism, is proposed. It effectively reduces the average number of dictionary lookups.

(2) To improve the precision of Chinese name recognition, a recognition model is constructed that combines HowNet with a Bayesian classifier. The basic idea is to locate and roughly recognize a Chinese name with the Bayesian classifier, and then to refine that name using HowNet. The model has a simple formulation and the ability to learn, and it avoids both the extensive use of rules and the difficulty of defining name boundaries.
Keywords/Search Tags: natural language understanding, automatic Chinese segmentation, four-character dictionary, Chinese name recognition, Bayesian classifier
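
The abstract describes a first stage in which a Bayesian classifier roughly locates candidate names. The record does not give the thesis's actual formula, features, or training data, so the following is only a minimal sketch of how such a stage might score a candidate, assuming a naive Bayes model over character positions. Every count, the prior, and the function names (`name_log_odds`, `is_likely_name`) are hypothetical and invented for illustration; the HowNet refinement stage is not modeled.

```python
import math

# Hypothetical toy counts, invented for illustration (NOT from the thesis).
# How often a character was seen as a surname, inside a given name,
# or in ordinary (non-name) text.
SURNAME_COUNTS = {"王": 90, "李": 85, "张": 80}
GIVEN_COUNTS = {"伟": 40, "芳": 35, "明": 30, "小": 20}
NON_NAME_COUNTS = {"王": 10, "李": 5, "张": 20, "伟": 15, "芳": 5,
                   "明": 60, "小": 80, "的": 200, "是": 150}

VOCAB = set(SURNAME_COUNTS) | set(GIVEN_COUNTS) | set(NON_NAME_COUNTS)


def _log_prob(counts, ch):
    """Log probability of a character with add-one (Laplace) smoothing."""
    total = sum(counts.values())
    return math.log((counts.get(ch, 0) + 1) / (total + len(VOCAB)))


def name_log_odds(candidate, prior_name=0.1):
    """Log-odds that `candidate` is a name rather than ordinary text.

    Assumes the first character is the surname, the rest is the given
    name, and characters are conditionally independent (naive Bayes).
    `prior_name` is an assumed prior, not a value from the thesis.
    """
    surname, given = candidate[0], candidate[1:]
    log_name = math.log(prior_name) + _log_prob(SURNAME_COUNTS, surname)
    log_other = math.log(1 - prior_name) + _log_prob(NON_NAME_COUNTS, surname)
    for ch in given:
        log_name += _log_prob(GIVEN_COUNTS, ch)
        log_other += _log_prob(NON_NAME_COUNTS, ch)
    return log_name - log_other


def is_likely_name(candidate):
    """Rough first-pass decision for 2- or 3-character candidates."""
    return len(candidate) in (2, 3) and name_log_odds(candidate) > 0


if __name__ == "__main__":
    for text in ["王伟", "王明", "的是"]:
        print(text, is_likely_name(text))
```

In a pipeline of the kind the abstract outlines, candidates accepted by a rough pass like this would then be passed to a second stage that adjusts their boundaries; here that stage is left out because the record gives no detail on how HowNet is used.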