
The Research Of Uighur Organization Name Recognition

Posted on: 2014-02-13  Degree: Master  Type: Thesis
Country: China  Candidate: R G R Z Mi  Full Text: PDF
GTID: 2248330398467941  Subject: Computer software and theory
Abstract/Summary:
Named entity recognition (NER) is a prerequisite and foundation for Uighur information processing tasks. Organization names account for a large proportion of named entities, and recognizing them is the most difficult part of NER. Uighur organization names have their own syntactic and semantic characteristics, so recognizing them is not a simple matter of transplanting the techniques widely used for English and Chinese organization names.

This thesis is a preliminary study of Uighur organization name recognition methods. After analyzing the structural characteristics of Uighur organization names, it applies two methods, one based on syntactic and semantic knowledge and one based on conditional random fields (CRF), and then summarizes the advantages and disadvantages of the two.

First, based on the syntactic and semantic characteristics of Uighur organization names, the thesis summarizes the construction rules of simple and complex organization names, and then designs effective recognition rules, the corresponding knowledge bases, and an efficient recognition algorithm based on state transition and keyword matching. Representative examples selected from Tianshan Net news form the test set for organization name recognition. Experimental results show that the system achieves high accuracy with fast processing, with an F-measure of 86.06%.

Second, organization name recognition is recast as a sequence labeling problem and solved with a statistical method. A conditional random field model can use more complex features for training and inference: it can take full advantage of context information as features and can also incorporate arbitrary external features, which makes it one of the best sequence labeling models currently available. The thesis therefore chooses conditional random fields for organization name recognition.
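
To illustrate what recasting recognition as sequence labeling means in practice, the sketch below converts annotated organization-name spans into one tag per token. It assumes the common BIO scheme (B-ORG, I-ORG, O); the abstract does not say which tag set the thesis uses. The function name `spans_to_bio` and the English placeholder tokens are hypothetical and stand in for real Uighur data.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], org_spans: List[Tuple[int, int]]) -> List[str]:
    """Turn organization-name spans into per-token BIO tags.

    Each span is (start, end) with `end` exclusive, indexing into `tokens`.
    The BIO tag set is an assumption, not taken from the thesis.
    """
    tags = ["O"] * len(tokens)
    for start, end in org_spans:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"invalid span ({start}, {end})")
        tags[start] = "B-ORG"              # first token of the name
        for i in range(start + 1, end):
            tags[i] = "I-ORG"              # remaining tokens of the name
    return tags


# Placeholder English glosses, not real Uighur tokens.
tokens = ["he", "works", "at", "Xinjiang", "University", "."]
tags = spans_to_bio(tokens, [(3, 5)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence labeling model is then trained to predict the tag column from the token column, and organization names are read back off as each B-ORG token plus the I-ORG tokens that follow it.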
This thesis uses word features and POS features, together with the three knowledge bases mentioned above (special words, former words, and location words), as features for organization name recognition, and verifies the validity of these features experimentally. Experiments show that identifying organization names with CRF is effective: in an open test on a corpus from the Xinjiang Uyghur Autonomous Region Radio Station, the F-measure reaches 83.92%.
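
The feature set described above (word, POS, and membership in the three knowledge bases) can be sketched as a per-token feature function. This is only an illustration under stated assumptions: the abstract does not define the knowledge bases, so "special words" are read here as organization keywords, "former words" as words that tend to precede an organization name, and "location words" as place names. The lexicon contents, the one-token context window, the POS labels, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-ins for the three knowledge bases (English placeholders).
SPECIAL_WORDS: Set[str] = {"university", "company"}   # assumed: organization keywords
FORMER_WORDS: Set[str] = {"at"}                        # assumed: words preceding a name
LOCATION_WORDS: Set[str] = {"xinjiang"}                # assumed: place names

Token = Tuple[str, str]  # (word, POS tag)


def token_features(sent: List[Token], i: int) -> Dict[str, object]:
    """Build features for token i from the word, its POS tag, and lexicon lookups."""
    word, pos = sent[i]
    lower = word.lower()
    feats: Dict[str, object] = {
        "word": lower,
        "pos": pos,
        "in_special": lower in SPECIAL_WORDS,
        "in_former": lower in FORMER_WORDS,
        "in_location": lower in LOCATION_WORDS,
    }
    if i > 0:
        prev_word, prev_pos = sent[i - 1]
        feats["-1:word"] = prev_word.lower()
        feats["-1:pos"] = prev_pos
        feats["-1:in_former"] = prev_word.lower() in FORMER_WORDS
    else:
        feats["BOS"] = True   # beginning of sentence
    if i < len(sent) - 1:
        next_word, next_pos = sent[i + 1]
        feats["+1:word"] = next_word.lower()
        feats["+1:pos"] = next_pos
        feats["+1:in_special"] = next_word.lower() in SPECIAL_WORDS
    else:
        feats["EOS"] = True   # end of sentence
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Feature dictionaries for every token in one sentence."""
    return [token_features(sent, i) for i in range(len(sent))]


sent = [("at", "ADP"), ("Xinjiang", "NOUN"), ("University", "NOUN")]
feats = sentence_features(sent)
print(feats[1])
```

One feature dictionary per token is a representation that many CRF toolkits accept as input; the feature templates actually used in the thesis may differ from this sketch.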
Keywords/Search Tags: Named entity recognition, Uighur organization name recognition, Knowledge base, Rule matching, Conditional random fields, Feature, Feature template