
The Research Of Chinese Organization Name Recognition Based On Cascaded Conditional Random Fields

Posted on: 2011-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Hong
GTID: 2178330332461518
Subject: Computer application technology
Abstract/Summary:
Chinese Named Entity Recognition (NER) is a basic task in Chinese information processing and is also important in the fields of machine translation, information retrieval, and question answering. As a branch of Chinese NER, Chinese organization name recognition is significant for Chinese automatic word segmentation and syntactic analysis. It is also the most difficult part of NER, because organization names account for a large proportion of named entities.

Current methods for Chinese organization name recognition are usually statistics-based, rule-based, or a combination of the two. Methods based on the Conditional Random Fields (CRFs) model have obtained good results. Building on this previous work, this thesis uses the CRFs model to study the recognition of Chinese organization names. The core work can be summarized as follows:

(1) A brief introduction to the basic principles of CRFs and the Support Vector Machine (SVM) is given. CRFs is an undirected graphical conditional probability model, mainly used to label and segment sequence data. Using context features, a CRFs model can obtain the globally optimal labeling result. SVM is an excellent two-class classifier and handles high-dimensional data well.

(2) By analyzing the labeling results of a recognition system based on monolayer CRFs, we find that some incorrect labels can be corrected by correctly identifying the feature word of an organization name. Combining SVM with the monolayer CRFs improves the accuracy of identifying the right boundary of an organization name, and consequently the overall result improves.

(3) Because feature words can belong to multiple categories, complicated geographic names interfere with the identification of organization names. At the same time, a complicated geographic name nested in an organization name can provide useful information. A cascaded CRFs model is therefore proposed.

(4) A comprehensive comparison between monolayer CRFs and cascaded CRFs reveals several factors. Most incorrect labels have low marginal probability, so tokens with low marginal probability are revised by building potential organization names. The results show that the method based on CRFs marginal probability is effective.

The major contribution is that the multi-category problem of complex geographic names and organization names is solved to a certain extent by using the recognition results for complex geographic names produced at the first level of the cascaded (stacked) CRFs model, so that the higher level receives useful input information. Marginal probability information is also used to improve the results of organization name recognition. The experimental results show that the cascaded CRFs-based method is effective for Chinese organization name recognition.
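The observation that incorrect labels tend to have low marginal probability can be illustrated with a small sketch. The code below is not the thesis's implementation. It assumes a linear-chain CRF whose log-scores are already given (the `emissions` and `transitions` tables and the label set B-ORG / I-ORG / O are hypothetical, as is the threshold value), computes per-token marginal probabilities with the standard forward-backward algorithm, and lists the positions whose assigned label falls below the threshold. How the thesis then rebuilds potential organization names from those tokens is not shown here.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def token_marginals(emissions, transitions):
    """Forward-backward for a linear-chain CRF.

    emissions[t][y]   : log-score of label y at position t (hypothetical input)
    transitions[a][b] : log-score of moving from label a to label b
    Returns marginals[t][y] = P(y_t = y | x).
    """
    n, k = len(emissions), len(emissions[0])
    alpha = [[0.0] * k for _ in range(n)]
    beta = [[0.0] * k for _ in range(n)]  # beta at the last position stays 0

    alpha[0] = list(emissions[0])
    for t in range(1, n):
        for y in range(k):
            alpha[t][y] = emissions[t][y] + logsumexp(
                [alpha[t - 1][a] + transitions[a][y] for a in range(k)]
            )

    for t in range(n - 2, -1, -1):
        for y in range(k):
            beta[t][y] = logsumexp(
                [transitions[y][b] + emissions[t + 1][b] + beta[t + 1][b]
                 for b in range(k)]
            )

    log_z = logsumexp(alpha[n - 1])  # log partition function
    return [
        [math.exp(alpha[t][y] + beta[t][y] - log_z) for y in range(k)]
        for t in range(n)
    ]


def low_confidence_positions(marginals, labels, threshold):
    """Positions whose assigned label has marginal probability below threshold."""
    return [t for t, y in enumerate(labels) if marginals[t][y] < threshold]


# Hypothetical three-token example with labels 0 = B-ORG, 1 = I-ORG, 2 = O.
LABELS = ["B-ORG", "I-ORG", "O"]
emissions = [[2.0, -1.0, 0.5], [0.2, 0.4, 0.3], [-1.0, 1.5, 0.0]]
transitions = [[-1.0, 1.0, 0.0], [-1.0, 0.5, 0.5], [0.5, -2.0, 0.5]]

marginals = token_marginals(emissions, transitions)
# Assumption: each token takes its most probable label (posterior decoding);
# a real system would more likely check the labels of the Viterbi path.
assigned = [max(range(len(LABELS)), key=row.__getitem__) for row in marginals]
suspects = low_confidence_positions(marginals, assigned, threshold=0.6)
print([(t, LABELS[assigned[t]], round(marginals[t][assigned[t]], 3)) for t in suspects])
```

In a real system the scores would come from a trained CRFs model, and the threshold would have to be tuned on held-out data.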
Keywords/Search Tags: Natural Language Processing, Chinese organization name, cascaded model, Conditional Random Field, Latent organization name