| Mongolian word segmentation is the foundation of Mongolian information processing and is key to downstream tasks such as Mongolian-Chinese machine translation. In recent years, deep neural network models have been widely used in natural language processing and have achieved good results in word segmentation. This thesis first investigates Mongolian word segmentation methods based on several deep learning models. It then compares the effects of Mongolian partial segmentation, BPE segmentation, and neural network segmentation on Mongolian-Chinese neural machine translation and, on this basis, proposes an improved Mongolian-Chinese neural machine translation model that filters stop words from the output of the neural network segmenter. The main research contents are as follows. First, this thesis combines the BiLSTM neural network, the CNN, and the traditional CRF statistical model to propose a Mongolian word segmentation method based on a BiLSTM-CNN-CRF neural network model. The model retains the BiLSTM's ability to capture long-distance information while using the CNN to extract local features and the CRF to jointly decode the optimal tag sequence. The experimental results show that the proposed BiLSTM-CNN-CRF word segmentation model outperforms the BiLSTM and BiLSTM-CRF models, reaching a Mongolian word segmentation accuracy of 97.37%. Second, a Mongolian-Chinese neural machine translation model based on Mongolian word segmentation is constructed. The Mongolian corpus is pre-processed using partial segmentation, BPE segmentation, and neural network-based segmentation, and a Mongolian-Chinese neural machine translation model is trained on the corpus produced by each segmentation method. The
experimental results show that Mongolian-Chinese machine translation based on Mongolian word segmentation improves on the baseline experiment. Among the three methods, Mongolian partial segmentation yields the best translation quality, with a BLEU5 value of 72.10%. Finally, by analysing the segmentation granularity that each method produces on the Mongolian corpus and taking Mongolian word formation characteristics into account, we propose an improved Mongolian-Chinese neural machine translation method that filters stop words from the corpus segmented by the neural network. The improved method filters the single connecting vowels "V(?)" and "U(?)" and the unstable "N(?)" that appear in the corpus segmented by the neural network. The experimental results show that, after filtering, the BiLSTM-CNN-CRF Mongolian word segmentation method achieves a BLEU5 value of 73.30% on the machine translation output, which is 1.95 percentage points higher than without filtering and outperforms both the partial segmentation method and the BPE segmentation method. Thus, filtering stop words from the neural network segmentation of Mongolian can further improve the performance of Mongolian-Chinese neural machine translation. |
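
The filtering step described above can be illustrated with a minimal sketch. It assumes that the segmenter writes each sentence as whitespace-separated segments and that the filtered units appear as standalone tokens. The labels `V`, `U`, and `N` are placeholders for the connecting vowels and the unstable N named in the abstract, since the actual Mongolian characters are not given here; the function names are hypothetical and do not come from the thesis.

```python
from typing import Iterable, List

# Placeholder labels (assumption): stand-ins for the two single connecting
# vowels and the unstable N that the abstract says are filtered out.
STOP_SEGMENTS = frozenset({"V", "U", "N"})


def filter_stop_segments(segments: Iterable[str],
                         stop: frozenset = STOP_SEGMENTS) -> List[str]:
    """Drop segments that are exactly a stop unit; keep the rest in order."""
    return [seg for seg in segments if seg not in stop]


def filter_corpus(lines: Iterable[str]) -> List[str]:
    """Apply the filter to whitespace-separated segmented sentences."""
    return [" ".join(filter_stop_segments(line.split())) for line in lines]
```

In such a setup the filtered corpus, rather than the raw segmenter output, would be passed to translation model training. Whether the thesis filters before or after any further subword processing is not stated in the abstract.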