Research On A Resume-oriented Chinese-Uyghur Machine Translation System

Posted on: 2019-03-13; Degree: Master; Type: Thesis
Country: China; Candidate: L L Wang; Full Text: PDF
GTID: 2428330566467159; Subject: Computer technology
Abstract/Summary:
At present, most Uyghur speakers face serious barriers to cross-language communication. With the development of the Silk Road Economic Belt, research on Uyghur language translation is an essential guarantee for promoting communication between peoples. Existing Chinese-Uyghur machine translation systems are not well suited to specific domains, so this thesis studies Chinese-Uyghur machine translation and the key problems that affect translation quality, focusing on resumes. Resumes consist largely of named entities, which often lead to poor translation quality because named entities are a common source of unknown-word (UNK) problems in machine translation. To extract Chinese-Uyghur named entity translation equivalences, named entity recognition must be studied first. However, existing research mainly focuses on a single class of named entity and relies on relatively traditional methods. This thesis therefore investigates several methods for Uyghur named entity recognition. Building on that work, a template-based Chinese-Uyghur machine translation system was implemented once the Chinese-Uyghur bilingual named entity translation equivalences had been obtained.

First, to make efficient use of the unsupervised semantic and structural information in unannotated data, a semi-supervised Uyghur named entity recognition method is proposed within the conditional random field (CRF) framework. Lexical features, dictionary features, and unsupervised learning features based on word embeddings are introduced; the effect of the different features on the recognition rate is compared, and the model is optimized accordingly. The experimental results show that the F-score of Uyghur named entity recognition reaches 87.43% when the CRF model fuses multiple features. Combining morphological features with unsupervised learning features can therefore greatly reduce the work of selecting features manually while also improving the performance of Uyghur named entity recognition.

Second, traditional methods do not learn the Uyghur morphological information carried between characters. The thesis proposes a Uyghur named entity recognition method based on Bi-LSTM-CRF and an attention mechanism. First, a character-level vector is obtained with a bidirectional LSTM on the basis of word embeddings; next, the word embedding and the character-level embedding are processed by an attention mechanism that dynamically learns which information is effective; finally, the attention-based embedding is taken as the input of a Bi-LSTM, and a CRF model predicts the sequence of tags. The experiments show that the Bi-LSTM-CRF model with attention-based embeddings performs better than the CRF model on the named entity recognition task.

Finally, focusing on resumes, the thesis analyzes the sentence structure of Chinese resumes and constructs Chinese-Uyghur bilingual named entity translation equivalences and a template library. A resume-oriented Chinese-Uyghur machine translation system was designed and implemented using an approach that combines dictionaries and templates. The experiments show that the system has better practical application value than machine translation systems that rely on large-scale bilingual corpora.
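
The combination of dictionaries and templates described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `ENTITY_DICT`, `TEMPLATES`, `translate_entity`, and `translate_line`, the two resume field patterns, and the bracketed `[UG:...]` strings are all assumptions made for illustration. The `[UG:...]` values are placeholders standing in for Uyghur text, not real translations.

```python
import re

# Hypothetical bilingual named-entity dictionary. The values are placeholders,
# not real Uyghur; an actual system would store the Uyghur equivalents here.
ENTITY_DICT = {
    "新疆大学": "[UG:Xinjiang University]",
    "计算机技术": "[UG:computer technology]",
}

# Hypothetical templates: a source-side pattern with named slots, paired with
# a target-side template that reuses the same slot names.
TEMPLATES = [
    (re.compile(r"^毕业院校[:：](?P<school>.+)$"), "[UG:graduated from]: {school}"),
    (re.compile(r"^专业[:：](?P<major>.+)$"), "[UG:major]: {major}"),
]


def translate_entity(text):
    """Look up a named entity; unknown entities pass through unchanged."""
    key = text.strip()
    return ENTITY_DICT.get(key, key)


def translate_line(line):
    """Translate one resume line with the first matching template.

    Returns None when no template matches, so a caller can fall back
    to another strategy.
    """
    line = line.strip()
    for pattern, target in TEMPLATES:
        match = pattern.match(line)
        if match:
            # Translate each slot's content through the entity dictionary.
            slots = {name: translate_entity(value)
                     for name, value in match.groupdict().items()}
            return target.format(**slots)
    return None
```

In this sketch the template fixes the target-side sentence frame and the dictionary fills the named-entity slots, so no large parallel corpus is needed. Coverage is limited to the patterns and entities that have been entered by hand.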
Keywords/Search Tags: Uyghur, Named Entity Recognition, Machine Translation, Named Entity Translation Equivalences