
Automatic Identification Of Chinese Organization Names Based On SVM And Maximum Entropy

Posted on: 2007-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: D L Yang
GTID: 2178360212957402
Subject: Computer software and theory
Abstract/Summary:
Chinese organization name recognition is part of named entity recognition, a basic research task in Chinese lexical analysis. Unknown Chinese organization names in a text reduce the accuracy of word segmentation and lexical analysis, so a segmentation system needs the ability to recognize organization names in order to improve that accuracy. This thesis proposes an automatic recognition method for Chinese organization names that combines SVM and Maximum Entropy. For each word that appears in the characteristic-word dictionary, we use SVM to decide whether it is the characteristic word of an organization name, that is, the word that closes the name (right-boundary decision). Then, working backward from the word immediately before the characteristic word, we tag each word until a component that does not belong to an organization name is encountered (front-part tagging). The same process is then repeated for the rest of the text. To make the right-boundary decision more efficient, a drive-mode recognition method is proposed: the right-boundary decision is made only for words in the text that are listed in the characteristic-word dictionary, and the front part of the organization name is tagged afterward. The right-boundary decision is a binary classification problem, and SVM solves the recognition of the characteristic word of an organization name effectively. Because the words that precede the characteristic word have a complex composition, Maximum Entropy is used to combine different kinds of textual information and to recognize these more complex front words of Chinese organization names.
Based on the characteristics of the front words and an analysis of the statistical results, we design the Maximum Entropy feature templates, build the feature set, and obtain the parameters, which yields the Maximum Entropy-based front-part tagging module. The results show that combining SVM and Maximum Entropy is effective for Chinese organization name recognition: in the open test, recall, precision, and F-measure are 91.05%, 93.59%, and 92.84%, respectively. Compared with existing work of this kind, the recognition system achieves better results. Furthermore, it can also recognize other named entities, such as person names and place names.
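The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the toy dictionary, and the two rule-based stand-ins for the SVM right-boundary classifier and the Maximum Entropy front-part tagger are all hypothetical, and the choice to discard a characteristic word with no front part is an assumption made here for illustration.

```python
from typing import Callable, List, Sequence, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def recognize_org_names(
    tokens: Sequence[str],
    trigger_words: Set[str],
    is_right_boundary: Callable[[Sequence[str], int], bool],
    is_org_component: Callable[[Sequence[str], int, int], bool],
) -> List[Span]:
    """Drive-mode recognition: only dictionary words trigger a decision."""
    spans: List[Span] = []
    for i, tok in enumerate(tokens):
        # Drive mode: skip every word not in the characteristic-word dictionary.
        if tok not in trigger_words:
            continue
        # Stage 1 (SVM in the thesis): does this word close an organization name?
        if not is_right_boundary(tokens, i):
            continue
        # Stage 2 (Maximum Entropy in the thesis): tag backward until a
        # word that is not part of the organization name is reached.
        start = i
        while start > 0 and is_org_component(tokens, start - 1, i):
            start -= 1
        # Assumption: a characteristic word with no front part is not a name.
        if start < i:
            spans.append((start, i + 1))
    return spans


# Hypothetical stand-ins for the trained classifiers (toy rules only).
TRIGGERS = {"大学", "公司"}
STOP_WORDS = {"他", "在", "的"}


def toy_right_boundary(tokens: Sequence[str], i: int) -> bool:
    return True  # a trained SVM would score the context features here


def toy_org_component(tokens: Sequence[str], j: int, boundary: int) -> bool:
    return tokens[j] not in STOP_WORDS  # a MaxEnt model would decide here


tokens = ["他", "在", "大连", "理工", "大学", "学习"]
spans = recognize_org_names(tokens, TRIGGERS, toy_right_boundary, toy_org_component)
print(spans)                                     # [(2, 5)]
print(["".join(tokens[s:e]) for s, e in spans])  # ['大连理工大学']
```

Passing the two classifiers in as callables keeps the control flow separate from the models, so the toy rules could be replaced by trained SVM and Maximum Entropy classifiers without changing the loop.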
Keywords/Search Tags:Chinese Organization Name, Drive mode, Maximum Entropy, Support Vector Machine (SVM)