A Study On Acquiring Chinese-English Named Entity Translation Equivalents Based On Comparable Corpus

Posted on:2024-06-14

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhang

GTID:2555307148470734

Subject:Foreign Linguistics and Applied Linguistics

Abstract/Summary:

Named entities are entities identified by name, such as people, places, and institutions. Bilingual named entity translation equivalents, that is, pairs of named entities in two languages that are translations of each other, are an important resource in natural language processing. Comparable corpora are less precisely aligned than parallel corpora, but they are plentiful and easy to obtain, and bilingual named entity pairs can be extracted from them. This thesis studies the construction of a Chinese-English comparable corpus and the extraction of Chinese-English named entity pairs from it. Building on previous work, we propose a method for constructing a Chinese-English comparable corpus based on keyword similarity: the bilingual texts are normalized using machine translation, keywords are extracted from each text, and the comparability of a text pair is determined from the similarity of their keywords. Experiments show that this keyword-based text similarity method has certain advantages over the traditional dictionary-based method, achieving 73.67% precision, 90.26% recall, and an F-score of 81.12%. We also propose a multi-feature fusion method for extracting bilingual named entity pairs from comparable corpora. Drawing on the characteristics of Chinese and English, the method incorporates four features of named entities: transliteration information, translation information, word length information, and co-occurrence frequency information. We design a multi-feature model based on the maximum entropy model and optimize it to achieve 84.90% precision, 82.57% recall, and an F-score of 83.72%. The optimized model shows a 22.10% improvement in F-score over the default-weight model.
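
The keyword-based comparability check described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes the Chinese text has already been machine-translated into English, uses plain term frequency as a stand-in for the keyword extractor, and uses Jaccard overlap with an arbitrary threshold of 0.3 as the similarity measure. The names `top_keywords`, `keyword_similarity`, `is_comparable`, and the `STOPWORDS` list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "to", "by", "as", "of", "and", "in",
             "on", "after", "was", "were", "is", "are"}


def top_keywords(text, k=10):
    """Return the k most frequent non-stopword tokens as a set.

    Term frequency stands in for whatever keyword extractor is actually used.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return {word for word, _ in counts.most_common(k)}


def keyword_similarity(text_a, text_b, k=10):
    """Jaccard overlap between the keyword sets of two same-language texts."""
    kw_a, kw_b = top_keywords(text_a, k), top_keywords(text_b, k)
    if not kw_a and not kw_b:
        return 0.0
    return len(kw_a & kw_b) / len(kw_a | kw_b)


def is_comparable(translated_zh_text, en_text, threshold=0.3, k=10):
    """Treat a text pair as comparable when keyword overlap reaches the threshold.

    translated_zh_text is assumed to be the Chinese document after machine
    translation into English; the threshold value is an arbitrary placeholder.
    """
    return keyword_similarity(translated_zh_text, en_text, k) >= threshold
```

In practice the threshold and the number of keywords would need tuning on held-out text pairs, and a weighted measure such as cosine similarity over keyword scores could replace the Jaccard overlap used here.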

Keywords/Search Tags:

Named Entity Pairs, Comparable Corpora, Text Similarity, Multi-Feature Fusion

Related items

1	Research And Application Of Korean Named Entity Recognition Method Based On Multi-Granularity Fusion
2	Research On Reassembly Of Cultural Fragments Based On Multi-Feature Fusion
3	The Text Of The Same Event In The Chinese-English Bilingual Web Resource Extraction
4	A Named Entity Recognition Method For Text Of Han Dynasty Paintings
5	Recognition Of Uyghur Musical Named Entity Based On CRF
6	Named Entity Recognition For The Field Of Ancient Chinese
7	Research On Chinese Named Entity Recognition Based On Annotation Schemes And Character-word Fusion
8	Research On Chinese-Vietnamese Entity Alignment Technology Based On Named Entity Recognition
9	A Selective Translation Report Of Using Comparable Corpora For Underresourced Areas Of Machine Translation(Chapter 2)
10	Research On Classification Of Various Classics Of Sinology Citation Series By Integrating Entity Feature Knowledge