
Implementation Of Indonesian Machine Translation System Based On Deep Learning

Posted on: 2020-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: C Chen
GTID: 2428330572483815
Subject: Software engineering
Abstract/Summary:
In recent years, as exchanges between Indonesia and the rest of the world have become more frequent, the Indonesian language has become a major obstacle to mutual communication, and machine translation is one effective way to overcome it. In machine translation, as in many other fields, deep learning has changed the landscape: sequence-to-sequence neural machine translation systems have broken with traditional machine translation, and their integrated architecture and good translation quality have drawn researchers' attention. Taking deep learning as its research background and drawing on recent cutting-edge results, this thesis addresses the linguistic characteristics of Indonesian and the problems encountered in building an Indonesian machine translation system, mainly in the following aspects:

(1) Because Indonesian is highly similar to English, and after reviewing the neural network architectures used for English translation, an encoder-decoder neural network structure was adopted for Indonesian translation, and both the basic structure and the computation of the hidden-layer units were improved.
(2) Indonesian monolingual and bilingual corpora are small in scale and poor in quality. To address this, two different data mining strategies were adopted and implemented in practice: local targeted crawling and distributed crawling on a cluster.
(3) The data were cleaned and preprocessed in several ways to retain as much usable data as possible while ensuring its quality.
(4) Before model training, a character-level model of Indonesian was trained to produce a language detection model. In addition, the problem of out-of-vocabulary (unknown) words arising during training was analyzed and solved.
(5) The training model was optimized, and an integrated Indonesian machine translation system was implemented.

Finally, translation models for Indonesian-Chinese and Indonesian-English, in both directions, were tested, and the best fused model reached a BLEU score of 39.52. The resulting models were deployed in a visual system that supports language detection, translation among multiple languages, and manual correction of translation results.
Keywords/Search Tags: Indonesian, machine translation, encoding and decoding, model training
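The abstract does not say which cleaning steps were applied to the crawled data. As a minimal sketch, assuming a parallel corpus of (Indonesian, English) sentence pairs, a typical pipeline normalizes Unicode and whitespace, drops empty pairs, filters over-long pairs and pairs with implausible length ratios, and removes duplicates. The function name `clean_parallel` and the thresholds `max_len=100` and `max_ratio=3.0` are hypothetical, not taken from the thesis.

```python
import re
import unicodedata


def normalize(text):
    """NFKC-normalize and collapse runs of whitespace into single spaces."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_parallel(pairs, max_len=100, max_ratio=3.0):
    """Hypothetical cleaner for (source, target) pairs; thresholds are illustrative."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        if not src or not tgt:  # drop pairs with an empty side
            continue
        s_len, t_len = len(src.split()), len(tgt.split())
        if s_len > max_len or t_len > max_len:  # drop over-long sentences
            continue
        if max(s_len, t_len) / min(s_len, t_len) > max_ratio:  # likely misaligned
            continue
        if (src, tgt) in seen:  # drop exact duplicates after normalization
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


pairs = [
    ("Saya  suka  kopi.", "I like coffee."),
    ("Saya suka kopi.", "I like coffee."),  # duplicate once whitespace is collapsed
    ("", "Empty source"),
    ("Halo", "This target sentence is far too long for its source"),
]
result = clean_parallel(pairs)
print(result)  # [('Saya suka kopi.', 'I like coffee.')]
```

Counting whitespace tokens suits Indonesian and English; for Chinese, which is written without spaces, lengths would have to be measured in characters or after word segmentation.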
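The abstract says only that a character-level model of Indonesian was trained to detect the input language. One common way to build such a detector, shown here as an assumption rather than the thesis's method, is multinomial naive Bayes over character trigrams with add-one smoothing and uniform priors. The class `CharNgramLanguageDetector` and the toy training sentences are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams of the lower-cased text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramLanguageDetector:
    """Hypothetical detector: multinomial naive Bayes over character n-grams."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram frequencies
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # all n-grams seen in any language

    def fit(self, samples):
        """Train on (text, language) pairs."""
        for text, lang in samples:
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(lang, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        return self

    def log_likelihood(self, text, lang):
        """Sum of add-one-smoothed log probabilities of the text's n-grams."""
        counts = self.counts[lang]
        denom = self.totals[lang] + len(self.vocab)
        return sum(math.log((counts[g] + 1) / denom) for g in char_ngrams(text, self.n))

    def predict(self, text):
        """Language with the highest likelihood (uniform class priors assumed)."""
        return max(self.counts, key=lambda lang: self.log_likelihood(text, lang))


train = [
    ("saya ingin membeli buku di toko itu", "id"),
    ("apakah kamu sudah makan siang hari ini", "id"),
    ("terima kasih banyak atas bantuan anda", "id"),
    ("i want to buy a book at that store", "en"),
    ("have you already eaten lunch today", "en"),
    ("thank you very much for your help", "en"),
]
detector = CharNgramLanguageDetector().fit(train)
print(detector.predict("saya sudah membeli buku"))  # id
print(detector.predict("thank you for the book"))   # en
```

Character n-grams need no tokenizer, so the same approach also applies to Chinese input, although a real detector would need far more training text than this toy set.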
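The abstract does not name the technique used for out-of-vocabulary words. A widely used remedy in neural machine translation is byte-pair encoding (BPE) subword segmentation, swapped in here purely as an illustration: the most frequent adjacent symbol pair is merged repeatedly, so an unseen word can still be represented by known subword units. The word counts and the functions `learn_bpe` and `apply_bpe` are hypothetical.

```python
import re
from collections import Counter


def learn_bpe(word_counts, num_merges):
    """Learn BPE merge operations from a {word: frequency} dict (illustrative)."""
    # Each word starts as space-separated characters plus an end-of-word marker.
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = "".join(best)
        # Merge the pair only where both parts are whole symbols.
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        vocab = {pattern.sub(lambda _: merged, w): f for w, f in vocab.items()}
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


word_counts = {"membeli": 5, "pembeli": 3, "membaca": 4, "pembaca": 2, "belikan": 2}
merges = learn_bpe(word_counts, num_merges=10)
segments = apply_bpe("pembelian", merges)  # a word absent from the training counts
print(segments)
```

Because Indonesian derives many words with productive affixes such as meN- and peN-, subword units shared across related forms let a model represent derived words it never saw as whole tokens during training.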
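The reported result of 39.52 is a BLEU score. For reference, corpus-level BLEU computes clipped n-gram precisions up to order 4 over the whole test set, takes their geometric mean, and multiplies it by a brevity penalty that punishes translations shorter than the references. This minimal sketch assumes pre-tokenized input, a single reference per sentence, and no smoothing; the function names are illustrative, and published scores are normally computed with standardized tools such as sacreBLEU, whose tokenization settings affect the resulting number.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            # Counter intersection keeps the minimum count: the "clipped" matches.
            overlap = ngram_counts(hyp, n) & ngram_counts(ref, n)
            matches[n - 1] += sum(overlap.values())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    log_mean = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_mean)


ref = "the cat sat on the mat".split()
hyp = "the cat is on the mat".split()
print(corpus_bleu([ref], [ref]))                   # 100.0
print(round(corpus_bleu([hyp], [ref], max_n=2), 2))  # 70.71
print(corpus_bleu([hyp], [ref]))                   # 0.0: no matching 4-gram, no smoothing
```

The last line shows why BLEU is reported at the corpus level: a single sentence easily has no matching 4-gram, whereas counts pooled over a whole test set rarely do.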