Recurrent Neural Network For Bilingual Lexicon Extraction From Comparable Corpora

Posted on: 2019-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: L F Liu
Full Text: PDF
GTID: 2428330548966861
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet and the accelerating pace of globalization, cross-language natural language processing plays an increasingly important role in people's work and life. As a basic resource for cross-language natural language processing, bilingual lexicons have become a research focus. Current work on bilingual lexicon extraction follows two main directions: extraction from parallel corpora and extraction from comparable corpora. Because parallel corpora are scarce and difficult to construct, research based on comparable corpora is more practical. Existing extraction models for comparable corpora fall into two main categories: context-based models and vector-based models. However, most of these studies focus on bilingual terms or entity translation pairs in specific domains, which are relatively easy to obtain, and even when the corpus is large enough, their extraction performance is not ideal. To address these shortcomings and improve the performance of bilingual lexicon extraction from comparable corpora, this thesis makes the following two contributions.

First, this thesis establishes a bilingual lexicon extraction model based on a recurrent neural network. In recent years, deep neural networks have become a research hotspot in artificial intelligence and have shown outstanding results on many natural language processing tasks. To make full use of the vast amount of data on the Internet and further improve extraction performance, the proposed model takes the pre-trained word vectors of known translation pairs as input and output, trains the recurrent neural network on them, and then obtains candidate translations by computing the similarity of word vectors. Experiments show that, compared with the classic lexicon extraction model, the proposed model significantly improves extraction performance, especially when the corpus is large. This reflects the advantages of recurrent neural networks both in modeling large-scale data and in the problem studied in this thesis.

Second, based on canonical correlation analysis, this thesis proposes an improved bilingual lexicon extraction model. In general, mutually translated word pairs tend to appear in document pairs on similar topics; that is, the documents containing them are strongly correlated at the semantic level, and this property is general and language independent. This thesis therefore uses canonical correlation analysis to reintegrate the two language spaces in the corpus so that they exhibit a stronger semantic correlation, and then applies the extraction model proposed above to extract the bilingual lexicon, further enhancing extraction performance. Experiments show that, compared with the extraction model based on the recurrent neural network alone, the improved model that integrates bilingual latent semantics improves extraction performance to some extent.
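
The candidate-selection step described in the abstract (computing word-vector similarity to obtain candidate translations) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that the trained network has already produced a predicted target-language vector for a source word, and that similarity is measured by cosine similarity; the abstract does not name the measure. The function names, the toy vocabulary, and the three-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(predicted, target_vectors, top_k=3):
    """Return the top_k target words whose vectors are most similar to `predicted`.

    `predicted` stands in for the vector a trained model would output for a
    source word (assumption); `target_vectors` maps target words to vectors.
    """
    scored = [(word, cosine(predicted, vec)) for word, vec in target_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy target-language vocabulary with made-up 3-dimensional vectors.
target_vectors = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.1, 0.9, 0.0],
    "house": [0.0, 0.1, 0.9],
}

# Hypothetical model output for some source word.
predicted = [0.8, 0.2, 0.1]

candidates = rank_candidates(predicted, target_vectors, top_k=2)
for word, score in candidates:
    print(f"{word}\t{score:.3f}")
```

In this toy setting the highest-ranked word is the one whose vector points in nearly the same direction as the predicted vector. A real system would rank over the full target vocabulary and evaluate the top candidates against a reference lexicon.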
Keywords/Search Tags: Lexicon extraction, Comparable corpus, Recurrent neural network, Canonical correlation analysis