Laos Named Entity Recognition Research

Posted on: 2018-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: S P Duan
Full Text: PDF
GTID: 2358330518460483
Subject: Computer technology
Abstract/Summary:
Named entity recognition (NER) has been an important basic task in Natural Language Processing since the term was coined. Research on named entities in Lao is still weak, yet Lao language information processing matters for economic and cultural exchange between China and Laos. To keep pace with the development of bilateral economic, political and other ties, named entity recognition for Lao is therefore necessary. This thesis mainly studies methods for identifying place names, person names and organization names. The main research results are as follows:

(1) Recognition of Lao organization names based on divergence. The main problem in Lao named entity recognition is that annotated corpora are scarce. Research started late, there are very few studies at home or abroad, and corpora can only be built from online resources and from manual annotation by experts, teachers and students from Laos, which does not yield enough data for study. To address this, the thesis presents a divergence-based algorithm for recognizing Lao organization names. First, three supervised classifiers are trained on the labeled named entity corpus, each using a conditional random field (CRF). The three classifiers then label the same unlabeled corpus, and a weighted voting strategy over their outputs gives the unlabeled samples a preliminary annotation. Next, the preliminarily labeled corpus is validated a second time, and finally the new samples are added to the existing data set.

(2) Recognition of Lao organization names based on cascaded conditional random fields. The experiments above extend part of the experimental corpus. Earlier work in our laboratory identified Lao names with single-layer CRFs and with a method combining rules and statistics, and obtained good results on a small-scale corpus. However, there has been no dedicated research on organization name recognition, and because Lao organization names contain many nested nouns, a single model has difficulty identifying them. The thesis therefore presents an algorithm for recognizing Lao organization names based on cascaded conditional random fields. The algorithm uses two layers of CRFs. The first layer mainly recognizes Lao person names, place names and simple organization names, and passes its results, together with the observed values, to the second-layer CRF model. In the second layer, the first-layer results are used to design corresponding Lao feature templates, which allow complex Lao organization names to be identified. Experimental results show that the method is effective at recognizing organization names.

(3) Recognition of Lao organization names using a cascaded model based on SVM and CRF. An in-depth analysis of the structure of Lao organization names shows that most of them carry a boundary feature, so identifying organization names through this boundary feature should raise the recognition rate. The cascaded CRF method does not handle this well. The thesis therefore proposes a new, hybrid approach that uses conditional random fields and support vector machines to identify Lao organization names. As before, the first layer mainly recognizes Lao person names, place names and simple organization names, and its results, combined with the observed values, are passed to the second-layer model (a support vector machine). The second layer uses a boundary-driven method, identifying Lao organization names by recognizing their boundary features. Experimental results show that the accuracy of Lao organization name recognition is significantly improved.
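
The weighted voting step in result (1) can be illustrated with a minimal sketch. It assumes each of the three taggers returns one BIO-style label per token. The function names, the tagger weights and the acceptance threshold are hypothetical choices made for illustration; the abstract does not state the thesis's actual weighting scheme or validation rule.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token labels from several taggers by weighted voting.

    predictions: list of label sequences, one per tagger, all the same length.
    weights: one non-negative weight per tagger (hypothetical, for example
             each tagger's score on a held-out set).
    Returns (labels, agreement), where agreement[i] is the share of the
    total weight that backed the winning label at token i.
    """
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need at least one tagger and one weight per tagger")
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all taggers must label the same tokens")

    total = float(sum(weights))
    labels, agreement = [], []
    for i in range(length):
        # Sum the weights of the taggers that proposed each label.
        scores = defaultdict(float)
        for seq, weight in zip(predictions, weights):
            scores[seq[i]] += weight
        best = max(scores, key=scores.get)
        labels.append(best)
        agreement.append(scores[best] / total)
    return labels, agreement


def accept_sentence(agreement, threshold=0.6):
    """Keep a sentence only if every token reached the agreement threshold.

    This stands in for the second validation pass; the threshold is an
    assumption, not a value taken from the thesis.
    """
    return min(agreement) >= threshold


# Three hypothetical taggers label the same three-token sentence.
predictions = [
    ["B-ORG", "I-ORG", "O"],
    ["B-ORG", "O", "O"],
    ["B-ORG", "I-ORG", "O"],
]
weights = [5, 3, 2]
labels, agreement = weighted_vote(predictions, weights)
```

Sentences that pass `accept_sentence` would then be appended to the labeled corpus before the taggers are retrained, which is the corpus-extension loop the abstract describes.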
Keywords/Search Tags: organization name recognition, two-layer model, semi-supervised learning, conditional random fields (CRF), divergence, named entity recognition, support vector machine (SVM), cascaded model, Laos