
Research On Uyghur Person Name Recognition Based On Conditional Random Fields

Posted on: 2014-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: H T E A E K Mu
GTID: 2248330398467714
Subject: Signal and Information Processing
Abstract/Summary:
Named Entity Recognition (NER), a sub-field of information extraction, is becoming increasingly important in natural language processing. Named entities (NEs) are among the major information elements in text and a main factor in understanding a text clearly and correctly. Uyghur named entity recognition builds on a correct understanding of Uyghur text to identify the common named entities in it (person names, place names, organization names, times, dates, and others) and classify them by type. NER is essential basic work in natural language processing and a key technology in many applications, such as information extraction, text conversion, information retrieval, and machine translation. Research on Uyghur named entity recognition therefore has great theoretical significance and practical value.

This thesis first reviews the state of research on named entity recognition in China and abroad, covering the methods used, the results obtained, and the evaluation criteria. It then describes in detail a Uyghur person name recognition method based on Conditional Random Fields (CRF) and a rule-based place name recognition method. The work done is as follows:

1) The conditional random field model is described in detail, together with its characteristics relative to other machine learning models. The CRF is a conditional probability model; it overcomes the independence assumptions of generative models while avoiding the label bias problem, and so has the advantages of both kinds of model.

2) A new Uyghur person name recognition method based on conditional random fields is studied and implemented. Starting from an analysis of the agglutinative characteristics of Uyghur, the study summarizes the compositional characteristics of Uyghur names, builds the conditional random field model, and designs and constructs a Uyghur text corpus. The method uses word form, part of speech, stem, suffix, syllable, last syllable, verb, and dictionary features, and applies a greedy algorithm to select the best feature template.

3) The internal structure of Uyghur place names is further explored, and a rule-based place name recognition algorithm is implemented in Visual C++, achieving preliminary recognition performance.

The results of this study can also be applied to the recognition of Uyghur names, organization names, and other named entities. Experiments show that the proposed method is effective.
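The abstract does not spell out how the greedy algorithm chooses feature templates. A common reading is greedy forward selection, sketched below in Python as an illustration only, not as the thesis's implementation. The function `greedy_select`, the scorer `toy_score`, and the numbers in `TOY_GAIN` are all hypothetical: in practice the scorer would train a CRF with the given templates and return a held-out measure such as F1, while here it returns made-up placeholder values so the sketch can run on its own. Only the template names are taken from the feature list in the abstract.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def greedy_select(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedy forward selection of feature templates.

    At each round, add the single remaining template that raises the
    score the most; stop when no template improves it by more than
    min_gain. Returns the chosen templates (in selection order) and
    the final score.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best = score(frozenset())  # baseline score with no templates
    while remaining:
        # Score every one-template extension of the current set.
        trials = [(score(frozenset(selected + [c])), c) for c in remaining]
        top_score, top = max(trials, key=lambda pair: pair[0])
        if top_score - best <= min_gain:
            break  # nothing helps any more
        selected.append(top)
        remaining.remove(top)
        best = top_score
    return selected, best


# ASSUMPTION: placeholder contributions, invented for this demo only.
# They are NOT results from the thesis.
TOY_GAIN = {
    "word_form": 0.40,
    "pos": 0.15,
    "stem": 0.10,
    "suffix": 0.08,
    "syllable": -0.01,  # pretend this template adds noise
    "last_syllable": 0.05,
    "verb": 0.03,
    "dictionary": 0.12,
}


def toy_score(templates: FrozenSet[str]) -> float:
    """Stand-in for 'train a CRF with these templates, return dev F1'."""
    return sum(TOY_GAIN[t] for t in templates)


if __name__ == "__main__":
    chosen, final = greedy_select(list(TOY_GAIN), toy_score)
    print("selected templates:", chosen)
    print("toy score: %.2f" % final)
```

With a real scorer, each call to `score` means one CRF training run, so greedy selection needs on the order of n² runs for n candidate templates, compared with 2ⁿ for exhaustive search. In the toy setup the `syllable` template is left out because its placeholder contribution is negative.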
Keywords/Search Tags: CRF, Named Entity Recognition, Natural Language Processing, Algorithm, Feature Template