Automatic Extraction Of Chinese-English Named Entity Pairs Based On Bilingual Aligned Corpus

Posted on:2022-09-06

Degree:Master

Type:Thesis

Country:China

Candidate:C Heng

Full Text:PDF

GTID:2518306776463884

Subject:Automation Technology

Abstract/Summary:

Bilingual named entity pairs are an important resource in cross-lingual natural language processing. Large-scale bilingual named entity recognition can effectively improve the performance of natural language processing tasks such as information retrieval, machine translation, and automatic summarization. Research on extracting bilingual named entities is therefore of great significance and application value. This thesis describes a system for extracting named entity pairs from a Chinese-English bilingual aligned corpus. Building on previous work, and targeting the problems of incomplete matching and mismatching of Chinese and English named entity pairs, it proposes a method for extracting named entity pairs through multi-feature fusion with word embeddings. First, the model recognizes Chinese named entities and English named entities separately and constructs a set of candidate named entity pairs. An expansion strategy is then adopted to enlarge the candidate set. For each pair in the candidate set, the model calculates five features: a transliteration feature, a translation feature, a length feature, and the Chinese-English and English-Chinese matching features computed by the expansion model. Monolingual and cross-lingual word embeddings are added to optimize the calculation of the matching features. Finally, a maximum entropy model and a neural network classification model are used to fuse these features and determine whether each candidate pair is a mutual translation, yielding the final set of named entity pairs. Experimental results show that the proposed method can effectively extract named entity pairs from a Chinese-English bilingual aligned corpus. The F-scores for person, location, and organization name pairs extracted from a public Chinese-English parallel corpus are 89.50%, 86.99%, and 81.22%, respectively.

Based on the above methods, this thesis designs and implements a named entity pair extraction system divided into four modules: (1) a Chinese and English monolingual named entity recognition module; (2) a named entity expansion module; (3) a candidate named entity pair feature calculation module; (4) a named entity pair alignment module. Based on this system, the Flask web framework is used to build an online platform for named entity pair extraction. The platform provides two roles: administrator and normal user. The administrator adds and assigns corpora; the normal user extracts from and modifies the assigned corpus by invoking the model online, and saves new bilingual entity pairs to the bilingual entity pair dictionary.
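
The candidate construction and feature fusion steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the toy length feature, and the weights and bias are all hypothetical, and the fusion step uses a plain logistic function, which is the binary form of a maximum entropy classifier. In practice the weights would be learned from labeled entity pairs, and the other four feature values would come from the transliteration, translation, and matching models.

```python
import math
from itertools import product

# The five features named in the abstract (keys are illustrative labels).
FEATURE_NAMES = (
    "transliteration",
    "translation",
    "length",
    "zh_en_match",
    "en_zh_match",
)

# Hypothetical weights and bias, chosen only for illustration.
WEIGHTS = {
    "transliteration": 2.0,
    "translation": 2.0,
    "length": 1.0,
    "zh_en_match": 1.5,
    "en_zh_match": 1.5,
}
BIAS = -4.0


def candidate_pairs(zh_entities, en_entities):
    """Pair every Chinese entity with every English entity
    found in one aligned sentence pair."""
    return list(product(zh_entities, en_entities))


def length_feature(zh_entity, en_entity):
    """Toy length feature (an assumption): ratio of the shorter to the
    longer length, counting Chinese characters and English words."""
    zh_len = len(zh_entity)
    en_len = len(en_entity.split())
    if zh_len == 0 or en_len == 0:
        return 0.0
    return min(zh_len, en_len) / max(zh_len, en_len)


def fuse(features, weights=WEIGHTS, bias=BIAS):
    """Fuse the five feature values into a probability that the
    candidate pair is a mutual translation (logistic function)."""
    score = bias + sum(weights[name] * features[name] for name in FEATURE_NAMES)
    return 1.0 / (1.0 + math.exp(-score))


def is_entity_pair(features, threshold=0.5):
    """Accept the candidate pair when the fused probability
    reaches the threshold."""
    return fuse(features) >= threshold
```

For example, `candidate_pairs(["北京", "张伟"], ["Beijing", "Zhang Wei"])` yields four candidates, each of which would then be scored by `fuse` once its five feature values are available.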

Keywords/Search Tags:

Named Entity Equivalence Pair, Transliteration Model, Matching Model, Word Vector, Online Platform

Related items

1	A Study On The Method Of Obtaining Equivalence Of Chinese And Cambodian Naming Entities
2	Research On Named Entity Recognition For Science And Technology Terms Based On Dependent Entity Word Vector
3	Research On Named Entity Equivalents Automatic Acquisition Method Based On English-Chinese Parallel Corpus
4	Research On Biomedical Named Entity Recognition Based On Hybrid Model
5	Mining Chinese-English Named Entity Pairs From Comparable Corpora
6	Study On Chinese Named Entity Recognition
7	Research On Word-vector-representation-based New Word Discovery And Name Entity Recognition
8	Research On Chinese Named Entity Recognition And New Word Detection
9	Semi-supervised Based Mobile Phone Named Entity Recognition
10	Research On Chinese Named Entity Recognition Based On XLNet And Word Segmentation Fusion Coding