Research On Word Alignment Technology Based On Deep Learning

Posted on:2022-09-29

Degree:Master

Type:Thesis

Country:China

Candidate:B B Zhang

Full Text:PDF

GTID:2518306329983769

Subject:Automation Technology

Abstract/Summary:

Bilingual word alignment is an important research topic in natural language processing. It also plays an important role in applications of neural machine translation, such as annotation transfer and word injection, and can assist translation proofreading. Existing word alignment methods rely on manually annotated English-Chinese word-aligned parallel corpora, but the amount of such data cannot meet training requirements. In addition, traditional word alignment models depend on manually selected features, which are influenced by the annotator's knowledge and tend to be sparse. To address these problems, this thesis first builds a training data set with a traditional machine learning method, then constructs a neural network on that basis, and finally implements an unsupervised bilingual word alignment system. The research contents include:

(1) For unannotated English-Chinese parallel sentences from a patent title corpus, word alignment is produced with GIZA++, a traditional machine learning tool, and an external online dictionary is used to verify the accuracy of the GIZA++ word alignment results. The verification results show that the resulting aligned bilingual data set can provide data support for training the subsequent neural network word alignment model.

(2) A labeling scheme is designed on the basis of this data set. A combination of the letter 'B' and a number denotes the alignment between the English words and the Chinese words of a parallel sentence. After a careful analysis of the characteristics of the data set, the data that fall within the length range suited to word alignment are selected to complete the annotation of the neural network training data set. This procedure builds the training data set in an unsupervised way, which saves the cost of manually annotating a word-aligned corpus. The data set construction method can also benefit other corpus-based research.

(3) A neural network word alignment method that integrates bilingual syntactic analysis is proposed. The method requires syntactic analysis of the parallel English and Chinese sentences, and the syntactic structure is integrated into the encoding layer of the neural network. The network combines a bidirectional long short-term memory network with a text convolutional network, so as to train a neural word alignment model that fuses the linear syntactic structure.

(4) In terms of engineering implementation, this thesis designs and implements an English-Chinese bilingual word alignment system that covers data set construction, word alignment annotation, syntactic analysis, neural network model construction, and visualization of word alignment results. The data consist of English and Chinese titles of scientific literature in the fields of computer science and electronic information, together with English-Chinese parallel patent titles crawled from the China Patent Information Center.
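The abstract does not spell out the 'B'-plus-number labeling scheme, so the following Python sketch is only one plausible reading of it, not the thesis's actual format. It assumes that alignment links are available as `i-j` pairs (0-based English index, 0-based Chinese index), that each aligned English word is labeled `B<k>` with `k` the 1-based position of its Chinese counterpart, and that unaligned words receive a hypothetical `O` label. The function names `parse_alignment` and `to_labels` are illustrative.

```python
from typing import Dict, List


def parse_alignment(line: str) -> Dict[int, int]:
    """Parse 'i-j' pairs (assumed input format) into {english_idx: chinese_idx}."""
    pairs: Dict[int, int] = {}
    for item in line.split():
        src, tgt = item.split("-")
        # Keep only the first link per English word; a simplification for this sketch.
        pairs.setdefault(int(src), int(tgt))
    return pairs


def to_labels(english: List[str], alignment: Dict[int, int]) -> List[str]:
    """Label each English word 'B<k>' (k = 1-based Chinese position) or 'O' if unaligned."""
    return [
        f"B{alignment[i] + 1}" if i in alignment else "O"
        for i in range(len(english))
    ]


english = ["word", "alignment", "system"]
chinese = ["词", "对齐", "系统"]
labels = to_labels(english, parse_alignment("0-0 1-1 2-2"))
print(list(zip(english, labels)))
```

Framing alignment as per-word labels of this kind turns the task into sequence labeling, which is the sort of input a recurrent or convolutional tagger can be trained on.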

Keywords/Search Tags:

Word alignment, Parallel corpus, Syntactic analysis, Neural Network

Related items

1	Bilingual Word Alignment System Based On English-Chinese Parallel Corpus
2	Research On Syntactic Knowledge Mining And Extraction Based On English-Chinese Parallel Corpus
3	Research On The Construction Of Ancient English Parallel Corpus Based On Multi-Level Automatic Alignment
4	Research On The Automatic Construction Of Chinese-Japanese Parallel Corpus
5	Research On Sentence Alignment Method Based On Cross-lingual Word Embeddings
6	The Design And Implementation Of Uyghur-Chinese Parallel Corpus Processing System
7	A Study On The Key Technologies Of Web-Based Indonesian-Chinese Parallel Corpus Construction
8	Research And Application Of Key Technologies Of Chinese-English Parallel Corpus
9	Bilingual Word Embedding Based Word Alignment On Large-Scale Corpus
10	The Study Of The Alignment Method In The Chinese-English Parallel Corpora