
Research On The Mining Of Chinese And Uyghur Organization Name Dictionary Based On Neural Network

Posted on: 2021-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Xu
Full Text: PDF
GTID: 2518306128976009
Subject: Master of Engineering
Abstract/Summary:
Named entities carry key information in a text and play an important role in understanding its semantics correctly. When machine translation is used to understand the original text, translating named entities correctly matters even more than translating general vocabulary. Because organization names have a relatively complicated internal structure and no fixed length, they have become a major research focus in the recognition and translation of named entities. A bilingual organization name dictionary provides word-level semantic equivalents and is a very important bilingual resource for the organization name translation task. This paper therefore focuses on mining Chinese-Uyghur bilingual organization name dictionaries. To build such a dictionary, we first obtain accurate Chinese organization name entities, then generate pseudo bilingual organization name pairs by back-translation, and finally select the entries of the bilingual dictionary according to a screening strategy.

(1) Existing sequence labeling models make many errors when recognizing organization names. To address this, the paper proposes a Chinese organization name sequence labeling model that incorporates a discriminant model. First, a discriminant model is trained as a short text classifier on an artificially constructed data set of 170,000 labeled organization names. Then, sequence labeling models that recognize different named entities are fused with the discriminant model and analyzed. The experimental results show that fusing the discriminant model effectively improves the performance of organization name recognition.

(2) Machine translation is one way to mine Chinese-Uyghur organization name dictionaries quickly. However, because some organization names occur with low frequency in the Chinese-Uyghur parallel corpus, translation is prone to out-of-vocabulary words and mistranslations. For this reason, the paper uses a phrase-based statistical machine translation method and a neural network-based method to model Chinese-to-Uyghur and Uyghur-to-Chinese organization name translation. The experimental results show that the neural Chinese-to-Uyghur model reaches a BLEU score of 84.35 and an accuracy of 74.67%, and that the Uyghur-to-Chinese model reaches a BLEU score of 95.79 and an accuracy of 88.90%.

(3) First, a pseudo Chinese-Uyghur organization name dictionary is constructed with the Chinese-to-Uyghur organization name translation model. Then, the pseudo Uyghur organization names are translated back into Chinese with the machine translation model. Finally, the Chinese organization names that are identical before and after this round trip are selected, and each is added, together with its pseudo Uyghur organization name, to the Chinese-Uyghur bilingual organization name dictionary. On the basis of these ideas, the paper designs and implements a Chinese-Uyghur organization name dictionary mining system, and mines 100,000 Chinese-Uyghur organization name pairs from 7 million Chinese monolingual sentences from CCMT.
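The round-trip screening described in step (3) can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `mine_dictionary`, the two translator callables, and the placeholder strings standing in for Chinese and Uyghur names are all assumptions made for illustration, and the toy lookup tables merely replace the trained translation models.

```python
from typing import Callable, Dict, Iterable


def mine_dictionary(
    zh_names: Iterable[str],
    zh_to_ug: Callable[[str], str],
    ug_to_zh: Callable[[str], str],
) -> Dict[str, str]:
    """Keep a (Chinese, pseudo-Uyghur) pair only if back-translation
    reproduces the original Chinese name exactly."""
    dictionary: Dict[str, str] = {}
    for zh in zh_names:
        pseudo_ug = zh_to_ug(zh)          # forward translation (pseudo entry)
        if not pseudo_ug:                 # skip names the model cannot translate
            continue
        if ug_to_zh(pseudo_ug) == zh:     # exact round-trip match required
            dictionary[zh] = pseudo_ug
    return dictionary


# Hypothetical stand-ins for the two translation models (placeholder strings,
# not real Chinese or Uyghur text).
toy_forward = {"ZH_ORG_1": "UG_ORG_1", "ZH_ORG_2": "UG_ORG_2", "ZH_ORG_3": "UG_ORG_3"}
toy_backward = {"UG_ORG_1": "ZH_ORG_1", "UG_ORG_2": "ZH_ORG_2_VARIANT", "UG_ORG_3": "ZH_ORG_3"}

toy_result = mine_dictionary(
    ["ZH_ORG_1", "ZH_ORG_2", "ZH_ORG_3", "ZH_ORG_4"],
    lambda zh: toy_forward.get(zh, ""),
    lambda ug: toy_backward.get(ug, ""),
)
print(toy_result)
```

In this toy run, `ZH_ORG_2` is dropped because its back-translation differs from the original, and `ZH_ORG_4` is dropped because the forward table has no translation for it. Requiring an exact match trades coverage for precision, which fits the goal of adding only reliable pairs to the dictionary.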
Keywords/Search Tags:neural network, dictionary mining, organization name, organization name recognition, organization name translation