Automatic Recognition Of Mongolian Names Based On CRF

Posted on: 2017-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: J J Cai
Full Text: PDF
GTID: 2348330485461592
Subject: Computer Science and Technology
Abstract/Summary:
Mongolian named entity recognition is a fundamental task in Mongolian natural language processing and is widely used in information extraction, machine translation, text clustering, information retrieval, and related fields. Person names make up the largest proportion of Mongolian named entities and are also difficult to identify. Research on Mongolian name recognition is therefore very important to research on Mongolian information processing.

To recognize person names in Mongolian text correctly, this paper combines the characteristics of Mongolian names with a Conditional Random Field (CRF) model to implement a Mongolian name recognition system. The CRF-based system for automatic recognition of Mongolian names consists of four parts: corpus preprocessing, CRF model training, CRF-based Mongolian name prediction, and system evaluation. The design of the CRF feature template and the selection of features are the main factors affecting CRF-based Mongolian name recognition.

The paper uses six CRF feature templates to analyse which features help improve the performance of Mongolian name recognition. Testing the feature templates shows that introducing context information and feature combination information improves performance. The comparison experiment across the six feature templates also shows that performance degrades when the feature template is too complicated.

In addition, to further enhance the performance of Mongolian name recognition, this paper extracts seven kinds of features: the Latin feature, intermediate code feature, place name feature, boundary feature, pinyin feature, verb feature, and case feature. The boundary feature in turn consists of the appellation feature, position feature, and vocation feature. A comparison experiment over these features shows that each of them improves the performance of Mongolian name recognition. Finally, we incorporate all seven features into the best-performing template, and the F value of Mongolian name recognition reaches 92.64%, about 2% higher than that of the Maximum Entropy Markov Model.
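
The role of context information and feature combination in a feature template can be illustrated with a minimal Python sketch. It is not the thesis's actual template: the function names token_features and f_value, the window size, the "<PAD>" marker, and the feature keys are all assumptions made for illustration. The sketch also assumes that the reported F value is the balanced F1 measure, which the abstract does not state.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical marker for positions outside the sentence


def token_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Build context features and one combination feature for token i.

    Illustrative only: the window size and feature names are assumptions,
    not the templates used in the thesis.
    """
    def at(j: int) -> str:
        # Return the token at position j, or the padding marker if j is out of range.
        return tokens[j] if 0 <= j < len(tokens) else PAD

    feats: Dict[str, str] = {}
    # Context information: the tokens in a symmetric window around position i.
    for offset in range(-window, window + 1):
        feats[f"w[{offset}]"] = at(i + offset)
    # Feature combination: the previous token joined with the current token.
    feats["w[-1]|w[0]"] = at(i - 1) + "|" + at(i)
    return feats


def f_value(precision: float, recall: float) -> float:
    """Balanced F measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Dictionaries of this shape, one per token, are the kind of input that common CRF toolkits accept. A larger window or more combination features produce a more complex template, which, according to the experiments summarized above, does not always improve performance.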
Keywords/Search Tags: Mongolian names recognition, Conditional Random Field, feature template, feature selection