Research on Automatic Recognition of Chinese Names

Posted on: 2006-10-21 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Hu | Full Text: PDF
GTID: 2208360152491668 | Subject: Computer application technology
Abstract/Summary:
Automatic recognition of Chinese names is one of the most difficult problems in word segmentation, and this bottleneck keeps segmentation precision low. This thesis presents an approach to Chinese name recognition based on a hidden Markov model (HMM), moving beyond the traditional recognition methods based on rules and statistics.

The thesis first analyzes the structure of Chinese names and the complex ways in which they appear in text. According to the different functions that tokens serve in the generation of a Chinese name, it introduces the concepts of roles and a pattern collection. Role tagging of the tokens produced by segmentation can be described with an HMM, so the optimal role sequence can be found with the Viterbi algorithm. Candidate names are then recognized by maximum pattern matching on the role sequence. The recognition process requires only the probability of each token taking a specific role and the transition probabilities between roles. The significance of the method is that this lexical knowledge can be extracted from a corpus entirely automatically.

In both closed and open tests on a large-scale realistic corpus, recall is above 90% and precision is satisfactory. The experiments show that the HMM-based algorithm proposed in this thesis is effective for Chinese name recognition.
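The role-tagging and pattern-matching steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the role labels (A for ordinary context, B for surname, C for given-name character), the patterns, and every probability below are hypothetical values chosen for the example, whereas the abstract states that the real values are extracted from a corpus.

```python
import math

# Hypothetical role set: A = ordinary context, B = surname, C = given-name character.
ROLES = ["A", "B", "C"]
# Hypothetical name patterns over roles, tried longest first (maximum matching).
PATTERNS = ["BCC", "BC"]

def viterbi(tokens, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most probable role sequence for tokens under an HMM."""
    def lp(p):
        # Log-probability with a small floor so unseen events do not give log(0).
        return math.log(max(p, floor))

    # best[i][r] = (log-score of the best path ending in role r at token i, previous role)
    best = [{r: (lp(start_p.get(r, 0)) + lp(emit_p[r].get(tokens[0], 0)), None)
             for r in ROLES}]
    for tok in tokens[1:]:
        prev, col = best[-1], {}
        for r in ROLES:
            score, back = max((prev[q][0] + lp(trans_p[q].get(r, 0)), q) for q in ROLES)
            col[r] = (score + lp(emit_p[r].get(tok, 0)), back)
        best.append(col)

    # Backtrack from the best final role.
    role = max(ROLES, key=lambda r: best[-1][r][0])
    path = [role]
    for col in reversed(best[1:]):
        role = col[role][1]
        path.append(role)
    path.reverse()
    return path

def extract_names(tokens, roles):
    """Scan the role string left to right, taking the longest matching pattern."""
    role_str, names, i = "".join(roles), [], 0
    while i < len(tokens):
        for pat in PATTERNS:
            if role_str.startswith(pat, i):
                names.append("".join(tokens[i:i + len(pat)]))
                i += len(pat)
                break
        else:
            i += 1
    return names

# Toy parameters (assumed for illustration, not taken from any corpus).
START_P = {"A": 0.8, "B": 0.15, "C": 0.05}
TRANS_P = {
    "A": {"A": 0.7, "B": 0.25, "C": 0.05},
    "B": {"A": 0.05, "B": 0.05, "C": 0.9},
    "C": {"A": 0.5, "B": 0.05, "C": 0.45},
}
EMIT_P = {
    "A": {"记者": 0.4, "报道": 0.4, "小": 0.1, "明": 0.05, "张": 0.05},
    "B": {"张": 0.9, "明": 0.05, "小": 0.05},
    "C": {"小": 0.4, "明": 0.5, "张": 0.1},
}

TOKENS = ["记者", "张", "小", "明", "报道"]  # segmented tokens of a toy sentence
roles = viterbi(TOKENS, START_P, TRANS_P, EMIT_P)
names = extract_names(TOKENS, roles)
print(roles, names)  # ['A', 'B', 'C', 'C', 'A'] ['张小明']
```

Log-probabilities are used so that long sentences do not underflow, and the floor value stands in for whatever smoothing a real system would apply.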
Keywords/Search Tags: word segmentation, Chinese name recognition, HMM, role