Automatic Extraction Of Chinese-English Named Entity Pairs Based On Bilingual Aligned Corpus

Posted on: 2022-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: C Heng
Full Text: PDF
GTID: 2518306776463884
Subject: Automation Technology
Abstract/Summary:
Bilingual named entity pairs are an important resource in cross-lingual natural language processing. Large-scale bilingual named entity recognition can effectively improve the performance of natural language processing tasks such as information retrieval, machine translation, and automatic summarization. Research on extracting bilingual named entities is therefore of considerable significance and practical value. This thesis describes a system for extracting named entity pairs from a Chinese-English bilingual aligned corpus.

Building on previous work, and targeting the problems of incomplete matching and mismatching of Chinese and English named entity pairs, the thesis proposes a method for extracting named entity pairs that fuses multiple features with word embeddings. First, the model recognizes Chinese and English named entities separately and constructs a set of candidate named entity pairs. An expansion strategy is then applied to enlarge the candidate set. For each pair in the candidate set, the model computes five features: a transliteration feature, a translation feature, a length feature, and the Chinese-English and English-Chinese matching features calculated by the expansion model. Monolingual and cross-lingual word embeddings are added to optimize the calculation of the matching features. Finally, a maximum entropy model and a neural network classification model fuse these features to decide whether each candidate pair is a mutual translation, yielding the final set of named entity pairs.

Experimental results show that the proposed method can effectively extract named entity pairs from a Chinese-English bilingual aligned corpus. On a public Chinese-English parallel corpus, the F-values for person, location, and organization name pairs are 89.50%, 86.99%, and 81.22%, respectively.

Based on these methods, the thesis designs and implements a named entity pair extraction system divided into four modules: (1) a Chinese and English monolingual named entity recognition module; (2) a named entity extension module; (3) a candidate named entity pair feature calculation module; (4) a named entity pair alignment module. On top of this system, the Flask web framework is used to build an online platform for named entity pair extraction. The platform provides two roles: administrator and normal user. The administrator adds and assigns corpora; the normal user extracts from and modifies the assigned corpus by invoking the model online, and saves new bilingual entity pairs to the bilingual entity pair dictionary.
Keywords/Search Tags:Named Entity Equivalence Pair, Transliteration Model, Matching Model, Word Vector, Online platform
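
The feature-fusion step described in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the `CandidatePair` class, the particular length feature (ratio of Chinese character count to English word count), the placeholder scores standing in for the transliteration, translation, and matching models, and the toy training pairs. A binary maximum entropy classifier is equivalent to logistic regression, so the sketch trains one with plain stochastic gradient descent; the neural network classifier mentioned in the abstract is not covered.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidatePair:
    """Hypothetical container for one Chinese-English candidate pair."""
    zh: str          # Chinese named entity
    en: str          # English named entity (space-separated words)
    translit: float  # placeholder transliteration score in [0, 1]
    transl: float    # placeholder translation score in [0, 1]
    zh2en: float     # placeholder Chinese-to-English matching score
    en2zh: float     # placeholder English-to-Chinese matching score


def length_feature(zh: str, en: str) -> float:
    """Assumed length feature: shorter/longer ratio of Chinese
    character count to English word count."""
    n_zh, n_en = len(zh), len(en.split())
    if n_zh == 0 or n_en == 0:
        return 0.0
    return min(n_zh, n_en) / max(n_zh, n_en)


def feature_vector(pair: CandidatePair) -> list:
    """Assemble the five features named in the abstract."""
    return [pair.translit, pair.transl,
            length_feature(pair.zh, pair.en),
            pair.zh2en, pair.en2zh]


def predict(weights, bias, x) -> float:
    """Probability that the pair is a mutual translation."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, lr=0.5, epochs=500):
    """Fit a binary maximum entropy (logistic regression) model
    with stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Toy, hand-made scores for illustration only; not data from the thesis.
TOY_PAIRS = [
    CandidatePair("北京", "Beijing", 0.9, 0.8, 0.9, 0.9),
    CandidatePair("北京大学", "Peking University", 0.6, 0.9, 0.8, 0.8),
    CandidatePair("北京", "Shanghai Bank", 0.1, 0.1, 0.2, 0.1),
    CandidatePair("上海", "Peking University", 0.1, 0.0, 0.1, 0.2),
]
TOY_LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    X = [feature_vector(p) for p in TOY_PAIRS]
    w, b = train(X, TOY_LABELS)
    for p, x in zip(TOY_PAIRS, X):
        print(p.zh, p.en, round(predict(w, b, x), 3))
```

In a real pipeline the four placeholder scores would come from the transliteration, translation, and expansion-based matching models, and a candidate pair would be kept when the predicted probability exceeds a chosen threshold such as 0.5.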