Research On Chinese-Lao Bilingual Named Entity Recognition And Alignment Method

Posted on:2019-01-03

Degree:Master

Type:Thesis

Country:China

Candidate:R Han

GTID:2438330563957653

Subject:Computer technology

Abstract/Summary:

Lao text contains many proper nouns, such as person names, place names, and organization names. These named entities carry a large amount of information, and grasping the main content of an article through its named entities is a basis for understanding Lao text correctly. Compared with languages such as English and Japanese, Lao has fewer speakers, and Internet technology in Laos developed relatively late, so corpus resources are extremely scarce. This adds great difficulty to cross-lingual information processing between Lao and Chinese. Named entity research for widely used languages such as English, Chinese, and Thai is already fairly deep, but there are few studies on low-resource languages such as Lao. An in-depth study of Lao named entities therefore matters both for the analysis of the Lao language itself and for Lao-Chinese translation. In view of this situation, this thesis covers the following research.

Firstly, a Lao named entity recognition method based on conditional random fields (CRF) is studied. Word vectors and word vector clusters are added as features to the CRF to identify Lao named entities; the word vectors are further improved, and a weighted word vector is proposed. Experiments verify that incorporating word vectors as features into the CRF improves named entity recognition performance.

Secondly, a bilingual named entity alignment method based on multi-feature fusion and a support vector machine (SVM) model is studied. Lao and Chinese named entities are first identified in the bilingual corpus and then matched using multiple features: transliteration, translation, co-occurrence frequency, and mutual information. The feature weights are adjusted to achieve the best results. Two methods are used to filter the candidate named entity equivalence pairs. The first is a threshold method: the features of each Chinese-Lao named entity pair are combined into a score, a threshold is set, and obviously wrong pairs are filtered out by it, which improves the overall performance of the system. The second uses an SVM as the alignment model for Chinese-Lao bilingual named entities, performing binary classification of candidate pairs with the same four features. This method jointly considers the distribution of each feature to decide whether a candidate is a correct equivalence pair; it has high accuracy and can improve system performance.

Finally, based on the above research, a Chinese-Lao bilingual named entity dictionary is generated, and a bilingual named entity translation system is designed and implemented.
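
The threshold-based filtering step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the weights, the threshold, the entity placeholders, and the names `PairFeatures`, `fused_score`, and `filter_pairs` are all hypothetical, and each of the four features is assumed to be already scaled to the range [0, 1].

```python
from dataclasses import dataclass


@dataclass
class PairFeatures:
    """The four features named in the abstract, each assumed scaled to [0, 1]."""
    transliteration: float
    translation: float
    cooccurrence: float
    mutual_information: float


# Hypothetical weights; the abstract only says the weights are tuned.
WEIGHTS = {
    "transliteration": 0.4,
    "translation": 0.3,
    "cooccurrence": 0.2,
    "mutual_information": 0.1,
}


def fused_score(features, weights):
    """Combine the features into one score as a weighted sum."""
    return sum(w * getattr(features, name) for name, w in weights.items())


def filter_pairs(candidates, weights, threshold):
    """Keep candidate (Chinese, Lao) pairs whose fused score reaches the threshold."""
    return [pair for pair, feats in candidates
            if fused_score(feats, weights) >= threshold]


# Placeholder entity identifiers and made-up feature values, for illustration only.
candidates = [
    (("zh_entity_1", "lo_entity_1"), PairFeatures(0.9, 0.8, 0.7, 0.6)),
    (("zh_entity_2", "lo_entity_2"), PairFeatures(0.2, 0.1, 0.3, 0.2)),
]

kept = filter_pairs(candidates, WEIGHTS, threshold=0.5)
print(kept)
```

In the SVM variant the abstract describes, the same four-feature vector for each candidate pair would instead be passed to a binary classifier trained on labeled correct and incorrect pairs, rather than being compared with a fixed threshold.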

Keywords/Search Tags:

named entity recognition, named entity alignment, word embedding, CRF, SVM

Related items

1	Chinese-Slavic Mongolian Named Entity Translation Based On Word Alignment
2	Study On Uyghur Named Entity Recognition And Related Problems
3	Research On Chinese Named Entity Recognition Technology Based On Neural Networks
4	Research And Application Of Named Entity Recognition Based On Bidirectional LSTM
5	Research On Named Entity Recognition For Science And Technology Terms Based On Dependent Entity Word Vector
6	Research And Implementation Of Named Entity Recognition Based On Ancient Literature
7	Research On Named Entity Word Alignment Between Chinese And English
8	The Field Of Music, A Combination Of Rules And Statistical Named Entity Recognition
9	Research On Named Entity Recognition And Disambiguation Based On Network Semantic Resource
10	Design And Implementation Of Named Entity Recognition Algorithm For Financial Field