The Research Of Chinese Organization Name Recognition

Posted on: 2009-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: R Wan
Full Text: PDF
GTID: 2178360272970381
Subject: Computer application technology
Abstract/Summary:
Chinese organization name recognition is part of Chinese Named Entity Recognition, a basic research task in Chinese Natural Language Processing, and it is the most difficult part of Named Entity Recognition.

Chinese organization names are divided into simple organization names and complicated organization names, which differ in structure. A simple organization name consists of a single word, while a complicated organization name consists of more than one word. Different methods are proposed to recognize different types of complicated organization names.

A cascaded model of Chinese organization name recognition is proposed. Simple organization names are recognized at the first level with CRF (Conditional Random Fields), and these results support the recognition of complicated organization names at the second level; the results of the two levels are then combined. Two methods are proposed to recognize complicated organization names.

First, a method combining SVM (Support Vector Machine) and CRF is proposed. For each word that appears in the characteristic dictionary, we use SVM to decide whether it is the right boundary of a complicated organization name. If it is, we use CRF to tag from that word until a component that does not belong to an organization name is encountered.

Second, a method combining CRF and credibility is proposed. A characteristic credibility model and a former word credibility model are built to compute the credibility of every word, and the credibility information is then incorporated into CRF for tagging.

Finally, we analyze abbreviated organization names and multi-type organization names, and build a rule model to recognize them simply.

The results show that our methods are effective: precision, recall and F-measure reach 94.83%, 95.02% and 94.93% respectively on the PKU corpus, and 93.24%, 82.39% and 87.48% respectively on the MSRA corpus.
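The abstract does not say how the F-measure is computed. As a minimal sketch, assuming it is the usual balanced F-measure (the harmonic mean of precision and recall), the reported figures can be cross-checked as follows. The function name `f_measure` and the `reported` table are illustrative and do not come from the thesis.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the thesis uses this standard definition; the abstract
    itself does not give the formula.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall and F-measure (in percent) as quoted in the abstract.
reported = {
    "PKU": (94.83, 95.02, 94.93),
    "MSRA": (93.24, 82.39, 87.48),
}

for corpus, (p, r, f_reported) in reported.items():
    f_recomputed = f_measure(p, r)
    print(f"{corpus}: recomputed F = {f_recomputed:.2f}, reported F = {f_reported:.2f}")
```

Under that assumption the recomputed values agree with the reported ones to within about 0.01 percentage points. A gap of that size is what one would expect if the original scores were computed from raw counts rather than from the rounded precision and recall.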
Keywords/Search Tags: Natural Language Processing, Chinese organization name, cascaded model, Support Vector Machine, Conditional Random Fields