
Research On Deep Learning Based Bilingual Long Sentence Segmentation Method

Posted on: 2020-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: X L Wei
GTID: 2428330578454650
Subject: Computer technology
Abstract/Summary:
Machine translation is an important research area in natural language processing. The performance of neural machine translation currently depends on high-quality, large-scale parallel corpora. Because of limits on computing resources, training time, and model architecture, training can only use parallel sentence pairs of moderate length. Overly long sentence pairs are discarded, which wastes resources. It is therefore of theoretical and practical value to study how to split a long bilingual sentence pair into valid shorter sentence pairs. Traditional bilingual sentence segmentation methods include rule-based, statistics-based, and combined rule-and-statistics approaches. However, such methods are language dependent and have low segmentation accuracy. To address this problem, this thesis studies deep-learning-based segmentation of long sentence pairs in bilingual parallel corpora, with the aim of improving corpus utilization and the translation quality of the translation system. The main work and contributions of this thesis are:

(1) A bilingual long sentence segmentation method based on deep learning is proposed. By combining a monolingual segmentation model with a sentence alignment model, high-quality short sentence pairs are recovered from long sentence pairs. The experimental results show that the method effectively improves bilingual segmentation accuracy and can improve machine translation performance.

(2) A monolingual long sentence segmentation model that incorporates dependency syntactic structure is proposed. Monolingual segmentation accuracy is improved by combining a neural sequence labeling method with the dependency syntax structure. The experimental results show that, compared with the traditional method, the proposed method raises the F1 value by 2.06 percentage points on the Chinese long sentence segmentation task and by 0.9 percentage points on the English long sentence segmentation task.

(3) A sentence alignment model based on a pre-trained language model is proposed. The pre-trained language model is used to obtain better sentence vectors for the sentence alignment task. The experimental results show that this method raises the F1 value of sentence alignment by 20.1 percentage points compared with the traditional method.

In summary, this thesis proposes a bilingual segmentation method that combines a segmentation model with an alignment model. The experimental results on both the monolingual and the bilingual segmentation tasks are much higher than those of the traditional methods, which demonstrates the validity and practicality of the proposed method.
Keywords/Search Tags:Long sentence segmentation, Bilingual alignment, Dependency syntax, Pre-trained language model, Machine translation
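
The two-stage idea described in the abstract (split each side of a long sentence pair, then align the resulting segments by sentence-vector similarity) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The boundary indices stand in for the output of the neural segmentation model, the `toy_embed` function and its `TOY_CONCEPTS` lexicon are hypothetical stand-ins for vectors from a pre-trained language model, and the greedy matching with a fixed `threshold` is an assumption chosen for brevity.

```python
import math
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def split_at(tokens: Tokens, boundaries: Sequence[int]) -> List[Tokens]:
    """Cut the token list after each index in `boundaries`.

    In the thesis the boundaries would come from a segmentation model;
    here they are simply given (assumption).
    """
    segments, start = [], 0
    for b in sorted(boundaries):
        segments.append(tokens[start:b + 1])
        start = b + 1
    if start < len(tokens):
        segments.append(tokens[start:])
    return segments


def align_segments(
    src_segs: List[Tokens],
    tgt_segs: List[Tokens],
    embed: Callable[[Tokens], Sequence[float]],
    threshold: float = 0.5,
) -> List[Tuple[Tokens, Tokens, float]]:
    """Greedily pair each source segment with its most similar unused
    target segment, keeping only pairs scoring above `threshold`."""
    pairs, used = [], set()
    for s in src_segs:
        s_vec = embed(s)
        best_j, best_score = None, threshold
        for j, t in enumerate(tgt_segs):
            if j in used:
                continue
            score = cosine(s_vec, embed(t))
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            used.add(best_j)
            pairs.append((s, tgt_segs[best_j], best_score))
    return pairs


# Hypothetical toy lexicon mapping words in both languages to shared
# concept ids; a real system would use pre-trained sentence vectors.
TOY_CONCEPTS = {
    "i": 0, "我": 0, "like": 1, "喜欢": 1, "tea": 2, "茶": 2,
    "he": 3, "他": 3, "drinks": 4, "喝": 4, "coffee": 5, "咖啡": 5,
}


def toy_embed(tokens: Tokens) -> List[float]:
    """Bag-of-concepts count vector; unknown tokens are ignored."""
    vec = [0.0] * 6
    for tok in tokens:
        if tok in TOY_CONCEPTS:
            vec[TOY_CONCEPTS[tok]] += 1.0
    return vec


src = ["i", "like", "tea", ",", "he", "drinks", "coffee"]
tgt = ["我", "喜欢", "茶", "，", "他", "喝", "咖啡"]
src_segs = split_at(src, [3])  # cut after the comma
tgt_segs = split_at(tgt, [3])
pairs = align_segments(src_segs, tgt_segs, toy_embed)
```

Greedy one-to-one matching keeps the sketch short; it cannot handle one-to-many alignments or reordered clauses as well as a dynamic-programming aligner would, so it should be read only as an illustration of how the two models fit together.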