Research Of Uyghur Person Names Recognition Based On Statistics And Rules

Posted on: 2015-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: R L M M T R Y M Jia
GTID: 2298330431491895
Subject: Computer application technology
Abstract/Summary:
Uyghur person name recognition is a prerequisite and foundation for Uyghur information processing tasks, in which person names account for a large share, and it is the most difficult part of named entity recognition. Purely statistical and purely rule-based methods for automatically identifying Uyghur names each have deficiencies, but combining statistical methods with rules lets the two complement each other and make up for some of those deficiencies. This thesis therefore proposes a mixed strategy that combines statistical and rule-based methods for the automatic recognition of Uyghur person names. The main research work includes:
(1) Building a knowledge base of person names (Uyghur Names), thesauri of Uyghur name prefixes and suffixes (Man-Suffix, Woman-Suffix), dictionaries of place names, organization names and prominent figures (Famous-Person Names), and a library of common ambiguous names (Ambiguous Names). Using the statistical information in these libraries, names in Uyghur text are preliminarily extracted (referred to as candidate name extraction).
(2) Analyzing the features of Uyghur names, both the internal features of how a name is composed and external features such as context information and template information, in order to extract a typical feature set and summarize the corresponding identification rules, with which the candidates are identified.
(3) Analyzing the structural and syntactic characteristics of Uyghur names and summarizing the corresponding disambiguation rules, which are used to disambiguate ambiguous names. This rule-based approach further improves the efficiency of name recognition.
(4) System design and implementation: a Uyghur name recognition system is built on the mixed strategy of statistics and rules. It performs feature extraction on the candidate names, applies the extraction rules to determine whether the input text contains Uyghur names, and saves the extracted names to a result file.
Using 12.59 MB of experimental data as the test corpus, closed and open tests were carried out. The experimental results show that in the closed test the accuracy rate reaches 88.47% and the recall rate reaches 85.1%;...
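The mixed strategy outlined in the abstract (lexicon-based candidate extraction, rule-based confirmation, disambiguation of ambiguous names) might be sketched roughly as follows. This is only a minimal illustration and not the thesis's implementation: the lexicon entries are toy values in Latin transliteration, and the names `CONTEXT_CUES`, `is_candidate`, `has_context_cue` and `recognize_names` are hypothetical. The simple "cue word next to the candidate" rule is an assumed stand-in for the context and template rules the abstract mentions.

```python
import re

# Toy lexicons in Latin transliteration. Every entry is an illustrative
# assumption, not data taken from the thesis.
UYGHUR_NAMES = {"alim", "polat", "bahar"}       # stand-in for "Uyghur Names"
MAN_SUFFIXES = ("jan",)                         # stand-in for "Man-Suffix"
WOMAN_SUFFIXES = ("xan", "nisa")                # stand-in for "Woman-Suffix"
PLACE_OR_ORG_NAMES = {"qeshqer", "turpan"}      # place / organization names
AMBIGUOUS_NAMES = {"polat", "bahar"}            # names that are also common nouns
CONTEXT_CUES = {"ependi", "xanim"}              # hypothetical title words near a name


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def is_candidate(token):
    """Stage 1: lexicon and suffix lookup proposes candidate names."""
    if token in PLACE_OR_ORG_NAMES:
        return False
    if token in UYGHUR_NAMES:
        return True
    suffixes = MAN_SUFFIXES + WOMAN_SUFFIXES
    return any(token.endswith(s) and len(token) > len(s) for s in suffixes)


def has_context_cue(tokens, i):
    """External feature: a cue word directly before or after position i."""
    neighbours = tokens[max(i - 1, 0):i] + tokens[i + 1:i + 2]
    return any(n in CONTEXT_CUES for n in neighbours)


def recognize_names(text):
    """Stages 2-3: keep candidates, requiring context for ambiguous ones."""
    tokens = tokenize(text)
    names = []
    for i, tok in enumerate(tokens):
        if not is_candidate(tok):
            continue
        # Ambiguous candidates are kept only with supporting context.
        if tok in AMBIGUOUS_NAMES and not has_context_cue(tokens, i):
            continue
        names.append(tok)
    return names
```

On the toy token sequence `"Polat ependi Alimjan Turpan"` (not real Uyghur text), this sketch keeps `polat` because of the adjacent cue word, keeps `alimjan` because of its suffix, and drops `turpan` as a place name. A real system would replace the single cue-word rule with the statistical information and rule sets the abstract describes.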
Keywords/Search Tags: Uyghur person name recognition, Named entity recognition, statistics and rules, Disambiguation