Research On Mongolian-Chinese Neural Machine Translation Based On Entity Generalization Strategy

Posted on:2024-07-30

Degree:Master

Type:Thesis

Country:China

Candidate:M L Chen

Full Text:PDF

GTID:2568307142966269

Subject:Computer Science and Technology

Abstract/Summary:


Mongolian named entity recognition (NER) is one of the basic tasks of Mongolian information processing and plays a key role in downstream tasks such as Mongolian-Chinese machine translation. In recent years, deep learning has been applied across many areas of natural language processing and has achieved good results in tasks such as machine translation and named entity recognition for many languages. This thesis first constructs a Mongolian named entity annotation corpus by label transfer, then uses neural network methods to build Mongolian NER models on that corpus, and finally applies the Mongolian NER model to Mongolian-Chinese neural machine translation through an entity generalization strategy. The main contributions are as follows.

First, because annotated Mongolian entity corpora are scarce and neural network methods depend heavily on data, a label-transfer-based method for constructing a Mongolian named entity annotation corpus is proposed. Building an annotated corpus by hand is time-consuming and labor-intensive, whereas label transfer migrates named entity labels from high-resource settings to low-resource ones. The proposed method combines the idea of transfer with Chinese-Mongolian machine translation to obtain preliminary entity annotations, which are then proofread. In total, a Mongolian corpus of 38,488 entries annotated with person names, place names, and organization names was constructed.

Second, on the basis of this corpus, different neural network models are introduced into the embedding layer of a BiLSTM-CRF Mongolian NER model to obtain different Mongolian representation vectors: CNN and BiLSTM encoders are trained to produce Mongolian character representations, and Skip-gram and GloVe are used to produce Mongolian morpheme representations. The experimental results show that the mixed-vector model combining Skip-gram morpheme representations with BiLSTM character representations performs best at recognizing Mongolian person, place, and organization names, with an F1 score of 81.80%. This model combines Mongolian morpheme and character features, captures bidirectional long-distance semantic dependencies with the BiLSTM, and uses the CRF layer to output the globally optimal tag sequence.

Finally, to address the limited bilingual corpora, poor entity translation, and weak generalization of neural Mongolian-Chinese machine translation, this thesis builds on the BiLSTM-CRF Mongolian NER model with mixed Skip-gram morpheme and BiLSTM character representations and proposes a Mongolian-Chinese neural machine translation method based on entity generalization. First, the Mongolian NER model is combined with Transformer-based Mongolian-Chinese neural machine translation, so that recognized entities are generalized into entity tags, to obtain the translation model. Next, a Mongolian-Chinese entity dictionary is used to translate the generalized entities. Finally, the translated entities are restored into the Chinese translation of the test set to obtain the final translation. A Transformer-based Mongolian-Chinese neural machine translation model serves as the baseline and is compared with Transformer models that use entity generalization. The results show that the Transformer model that generalizes both person names and place names performs best, reaching a BLEU score of 49.39.
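The abstract says the CRF layer outputs the globally optimal tag sequence on top of BiLSTM features. That step is standard Viterbi decoding over a linear-chain CRF; the sketch below is not the thesis code. The function name `viterbi_decode`, the three-tag BIO set, and all scores are hypothetical toy values chosen for illustration.

```python
from typing import List, Tuple


def viterbi_decode(
    emissions: List[List[float]],
    transitions: List[List[float]],
    start: List[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (in a BiLSTM-CRF, the BiLSTM output)
    transitions[i][j]: score of moving from tag i to tag j
    start[j]: score of starting the sentence with tag j
    """
    n_tags = len(start)
    # best[j]: score of the best path that ends in tag j at the current position
    best = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # pick the predecessor tag i that maximises the path score into tag j
            i_best = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # follow the back-pointers from the best final tag
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


TAGS = ["O", "B-PER", "I-PER"]
NEG = -100.0  # large penalty standing in for a forbidden transition
start = [0.0, 0.0, NEG]  # a sentence cannot start inside an entity
transitions = [[0.0, 0.0, NEG],  # O -> I-PER is forbidden
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[0.2, 1.0, 0.9],
             [0.3, 0.1, 1.0],
             [1.0, 0.2, 0.1]]
path, score = viterbi_decode(emissions, transitions, start)
print([TAGS[k] for k in path], score)  # ['B-PER', 'I-PER', 'O'] 3.0
```

Unlike picking the best tag independently at each position, the transition scores let the decoder rule out invalid sequences such as `O` followed by `I-PER`, which is the point of putting a CRF on top of the BiLSTM.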
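The abstract reports an F1 score of 81.80% but does not state how it was computed. NER results are commonly reported as entity-level F1, where a predicted entity counts only if both its type and its boundaries match the gold annotation. The sketch below assumes that convention (micro-averaged, exact match, BIO tags); the tag names `PER`, `LOC`, `ORG` and the helper names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract typed entity spans from one BIO-tagged sentence."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if etype is not None and not continues:
            spans.add((etype, start, i))
            etype = None
        if tag.startswith("B-") or (tag.startswith("I-") and not continues):
            # a stray I- tag with no matching open entity is treated as a new entity
            etype, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1; an entity counts only if type and boundaries match exactly."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_spans(g), bio_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Toy example: the person name is found, the place name is missed,
# and the organization name is truncated (so it does not count).
gold = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "I-ORG", "O"]]
pred = [["B-PER", "I-PER", "O", "O"], ["B-ORG", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1/2, recall 1/3, F1 = 0.4
```

Under this scheme a partially recognized name earns no credit, which makes entity-level F1 stricter than token-level accuracy.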
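The entity generalization pipeline described above (recognize entities, generalize them into tags, translate, look the entities up in a Mongolian-Chinese dictionary, restore them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the placeholder format `<TYPE_k>`, the helper names `generalize` and `restore`, and the dictionary entries are hypothetical. It also assumes that the NER model returns non-overlapping token spans and that the Transformer copies placeholders into its output unchanged. The Mongolian example uses Cyrillic spelling purely for readability.

```python
import re
from typing import Dict, List, Tuple

# (start token, end token exclusive, entity type), as a NER model might return them
Entity = Tuple[int, int, str]
PLACEHOLDER = re.compile(r"<[A-Z]+_\d+>")


def generalize(tokens: List[str], entities: List[Entity]) -> Tuple[str, Dict[str, str]]:
    """Replace each (non-overlapping) entity span with a numbered tag such as <PER_0>."""
    out: List[str] = []
    slots: Dict[str, str] = {}  # placeholder -> original source-language entity
    counts: Dict[str, int] = {}
    pos = 0
    for start, end, etype in sorted(entities):
        out.extend(tokens[pos:start])
        k = counts.get(etype, 0)
        counts[etype] = k + 1
        tag = f"<{etype}_{k}>"
        slots[tag] = " ".join(tokens[start:end])
        out.append(tag)
        pos = end
    out.extend(tokens[pos:])
    return " ".join(out), slots


def restore(translation: str, slots: Dict[str, str], entity_dict: Dict[str, str]) -> str:
    """Substitute dictionary translations of the source entities back into the MT output."""

    def lookup(match: "re.Match[str]") -> str:
        source = slots.get(match.group(0))
        if source is None:
            return match.group(0)  # tag not produced by generalize(): leave as is
        # fall back to the untranslated entity when the dictionary has no entry
        return entity_dict.get(source, source)

    return PLACEHOLDER.sub(lookup, translation)


tokens = ["Бат", "Улаанбаатар", "хотод", "ажилладаг"]  # "Bat works in Ulaanbaatar city"
entities = [(0, 1, "PER"), (1, 2, "LOC")]
source, slots = generalize(tokens, entities)  # "<PER_0> <LOC_0> хотод ажилладаг"
mt_output = "<PER_0> 在 <LOC_0> 市 工作"  # hypothetical Transformer output
entity_dict = {"Бат": "巴特", "Улаанбаатар": "乌兰巴托"}  # illustrative entries
final = restore(mt_output, slots, entity_dict)  # "巴特 在 乌兰巴托 市 工作"
```

Because rare names never reach the translation model as raw tokens, the model only has to learn how to position a small set of entity tags, while the dictionary handles the actual name translation.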

Keywords/Search Tags:

Neural networks, Corpus construction, Mongolian named entity recognition, Generalization, Mongolian-Chinese machine translation


Related items

1	Research On Mongolian And Chinese Machine Translation Based On Monolingual Corpus Training
2	A Study On Mongolian-Chinese Machine Translation Based On Neural Network
3	A Study On Statistical And Rule-Based Combined Mongolian-Chinese Machine Translation
4	Research On Mongolian-Chinese Neural Machine Translation Based On Data Augmentation And Pseudo-Parallel Corpus
5	Research On Mongolian-Chinese Neural Machine Translation Based On Monolingual Corpus And Reinforcement Learning
6	Mongolian Named Entity Recognition
7	Research Of Optimization Methods Integration And Translation Rerank For Mongolian-chinese Machine Translation
8	Mongolian Word Segmentation Method Based On Neural Network And Its Application
9	Multi-granularity Mongolian-chinese Neural Network Machine Translation Research
10	Research On Morphologically Asymmetric Chinese Mongolian Statistical Machine Translation Model Construction Methods