Research On Cambodian Named Entity Recognition Using Cross-Language Features

Posted on:2019-06-06

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Guo

Full Text:PDF

GTID:2438330563457605

Subject:Instrumentation engineering

Abstract/Summary:

Bilingual word alignment is an important task in natural language processing. Its purpose is to find word-level correspondences between two texts that are already aligned as translations at the sentence level, and it underlies many other natural language processing tasks. Named entity recognition has long been an active and difficult problem in the field, and it is an important foundation for statistical machine translation and cross-language information retrieval. Natural language processing for Khmer started late and is limited by the scarcity of corpus resources. This thesis therefore uses mature English named entity recognition technology to support Khmer named entity recognition. Building on an analysis of existing research, it completes the following work:

1. Word alignment method based on a non-parametric Bayesian model

The method replaces the categorical distributions of IBM Model 4 with Pitman-Yor (PY) processes, yielding a non-parametric Bayesian model that incorporates language features, and proposes a bilingual word alignment method on that basis. The IBM word alignment models are the main models used in most statistical machine translation systems. Their weakness is that they do not account for differences between the two languages and tend to overfit during training, which makes them poorly suited to Khmer, where corpora are scarce. To avoid this problem, the thesis uses the non-parametric Bayesian model and adds a language feature, the postposed attributive in Khmer, to align English and Khmer words. This method produces better word alignments than the IBM model.

2. Khmer named entity recognition method integrating cross-language features

This method addresses the lack of effective entity features for Khmer named entity recognition and thereby raises the recognition accuracy for Khmer named entities. Because named entity recognition for English is relatively mature, the thesis applies existing English techniques and uses an English-Khmer parallel corpus as a bridge to transfer that knowledge to Khmer. First, the English side is tagged with an existing, mature English named entity recognizer. Then, following the word alignments, the English entity tags are mapped onto the aligned Khmer words. A label propagation algorithm over Khmer words yields a distribution over entity tags for every Khmer word. Applying a threshold converts these distributions into Boolean values, which serve as features in a conditional random field model. Person names, place names, and organization names are recognized.

3. Prototype system for Khmer named entity recognition integrating cross-language features

Based on the research results, a prototype system for Khmer named entity recognition with cross-language features was designed and developed. The tools and system framework required to build the system are introduced, and the process of using the system is described in detail. The system recognizes person names, place names, and organization names in Khmer documents.
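
The cross-language feature pipeline summarized in the abstract (project English entity tags through word alignments, estimate a tag distribution per Khmer word, threshold it into Boolean features) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the tag set `PER`/`LOC`/`ORG`, the 0.5 threshold, and the placeholder tokens are hypothetical and do not come from the thesis. In particular, the sketch uses plain relative-frequency counts where the thesis uses a label propagation algorithm, and it stops before the conditional random field.

```python
from collections import Counter, defaultdict

ENTITY_TAGS = ("PER", "LOC", "ORG")  # assumed tag set: person, place, organization


def project_tags(en_tags, alignment, km_len):
    """Copy each English entity tag onto the Khmer token aligned to it.

    alignment is a list of (english_index, khmer_index) pairs.
    Unaligned Khmer tokens keep the non-entity tag "O".
    """
    km_tags = ["O"] * km_len
    for en_idx, km_idx in alignment:
        if en_tags[en_idx] != "O":
            km_tags[km_idx] = en_tags[en_idx]
    return km_tags


def tag_distributions(corpus):
    """Estimate a tag distribution per Khmer word from projected tags.

    Stand-in for the label propagation step: simple relative frequencies.
    corpus is a list of (khmer_tokens, khmer_tags) pairs.
    """
    counts = defaultdict(Counter)
    for tokens, tags in corpus:
        for token, tag in zip(tokens, tags):
            counts[token][tag] += 1
    return {
        word: {tag: n / sum(c.values()) for tag, n in c.items()}
        for word, c in counts.items()
    }


def boolean_features(dist, threshold=0.5):
    """Turn a tag distribution into one Boolean feature per entity tag."""
    return {tag: dist.get(tag, 0.0) >= threshold for tag in ENTITY_TAGS}


# Hypothetical sentence pair; the Khmer side uses placeholder tokens.
en_tags = ["PER", "O", "O", "LOC", "LOC"]  # e.g. "Sokha lives in Phnom Penh"
km_tokens = ["k_sokha", "k_lives", "k_in", "k_phnompenh"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 3)]

km_tags = project_tags(en_tags, alignment, len(km_tokens))
dists = tag_distributions([(km_tokens, km_tags)])
features = {word: boolean_features(d) for word, d in dists.items()}
```

In a full system, each Khmer token's Boolean dictionary would be merged with its other features before being passed to a conditional random field trainer.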

Keywords/Search Tags:

Word Alignment, Pitman-Yor processes, Named Entity Recognition, Cross-language Features, Conditional Random Fields

Related items

1	Research Of Named Entity Recognition Based On Conditional Random Fields
2	Study On The Tibetan Word Segmentation And Named Entity Recognition With Conditional Random Fields
3	Recognition Of Named Entity In Electronic Medical Records Based On Cascaded Conditional Random Fields
4	A Study On Chinese Named Entity Recognition
5	Chinese Named Entity Recognition Based On Neural Network And Language Model
6	Named Entity Recognition Based On Conditional Random Fields Chinese Research
7	Research On Chinese Named Entity Recognition Model Based On Deep Learning
8	Research On Chinese Named Entity Recognition Technology Based On Neural Networks
9	Study On Named Entity Recognition For Chinese Specific Domains Based On Deep Learning
10	Research On The Key Technology Of Named Entity Recognition And Relation Extraction In Military Field