
Research On Recognition Of Kazak Name Entity Based On Multi Feature Fusion

Posted on: 1970-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: L N L B K Wu
Full Text: PDF
GTID: 2348330533956496
Subject: Workers learn
Abstract/Summary:
Named entity recognition identifies named entities in input text, such as person names, organization names, and place names. It is widely used in natural language processing, information extraction, information retrieval, and machine translation. Person names are an important recognition target, and recognizing them is the most challenging task in named entity recognition. Compared with named entity recognition for other languages, Kazakh name recognition is still at a preliminary stage.

This thesis studies and tests Kazakh name recognition based on a statistical model. Kazakh belongs to the Turkic language group of the Altaic family, so its word formation is agglutinative. Knowledge about Kazakh named entities was gathered from extensive reading and from the grammar and word-formation characteristics of Kazakh. Starting from the agglutinative nature of the language, the work segments and analyzes Kazakh words using features relevant to name recognition, such as suffix syllables, so that more effective recognition information is obtained from the smallest linguistic feature units. Names are then split, and string-length features are added to a conditional random field (CRF) model. The window size of the model is determined by comparative experiments. The resulting name recognition model performs well and helps fill the gap in Kazakh name recognition: its precision, recall, and F-value are 92.31%, 91.56%, and 91.93%, respectively.

Because Uyghur names are similar to Kazakh names, Uyghur words are annotated with the same smallest linguistic feature units, and the same feature template is used to recognize Uyghur names. The experimental results show that the method achieves a precision, recall, and F-value of 91.92%, 90.42%, and 91.16%, respectively, on Uyghur name recognition.

The work on Kazakh name recognition also showed the need for lemmatization. The thesis therefore analyzes the structural rules of Kazakh stems and affixes and extracts Kazakh stems by combining stemming rules with a statistical N-gram language model. The experimental results show that the accuracy of Kazakh stemming is 78.34%.
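
To make the feature description above more concrete, the following is a minimal sketch of how suffix, string-length, and context-window features might be computed for one token before being passed to a CRF toolkit. It is not the thesis's actual feature template: the function names `token_features` and `f_measure`, the feature keys, the use of character suffixes as a stand-in for suffix syllables, the default window size, and the transliterated example tokens are all assumptions made for illustration. The second function only checks that the reported F-values are the harmonic mean of the reported precision and recall.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int, window: int = 2,
                   max_suffix: int = 3) -> Dict[str, object]:
    """Build a feature dict for tokens[i] (hypothetical feature template)."""
    word = tokens[i]
    feats: Dict[str, object] = {
        "word": word,
        "length": len(word),  # string-length feature
    }
    # Character suffixes approximate the suffix-syllable feature (assumption:
    # the thesis works on syllables, which would need a real syllabifier).
    for n in range(1, max_suffix + 1):
        if len(word) > n:
            feats[f"suffix{n}"] = word[-n:]
    # Context window: neighbouring word forms at offsets -window..+window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j] if inside else "<PAD>"
    return feats


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-value: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative Latin-transliterated tokens, not taken from the thesis data.
    sentence = ["nurlan", "almatyga", "bardy"]
    print(token_features(sentence, 1, window=1))
    # Consistency check against the figures reported in the abstract.
    print(round(f_measure(92.31, 91.56), 2))  # Kazakh: 91.93
    print(round(f_measure(91.92, 90.42), 2))  # Uyghur: 91.16
```

In a setup like this, comparing window sizes amounts to rebuilding the features with different `window` values and comparing the resulting F-values on held-out data.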
Keywords/Search Tags:Kazakh, Uyghur, Name recognition, Conditional random fields, stem segmentation