
A Study of Document Representation and Bilingual Word Embeddings

Posted on: 2019-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y W J Ou
Full Text: PDF
GTID: 2428330542994219
Subject: Computer application technology

Abstract/Summary:
Document representation and bilingual word embeddings are two important text representation learning techniques in natural language processing; they provide good feature representations for other natural language processing tasks. These two directions are the main research content of this dissertation.

Document representation encodes a document as a fixed-length vector. Existing work simply treats a document as a flat sequence of text: it neither considers the hierarchical structure within the document nor accounts for the differing importance of its different parts. This dissertation proposes a document representation model based on a hierarchical attention mechanism (HADR), which takes into account both the differences in importance among the sentences of a document and among the words within each sentence. The experimental results show that modeling the differing importance of words and sentences yields better document representations: on document sentiment classification, the HADR model outperforms the Doc2vec and word2vec models.

With the successful application of representation learning to single languages, and driven by the needs of cross-lingual natural language processing tasks, some methods have begun to study cross-lingual text representation and to build bilingual word embedding models. Bilingual word embeddings are a technique that both represents different languages in the same latent vector space and enables knowledge transfer across languages. To learn such representations, most existing works require parallel sentences with word-level alignments and assume that aligned words have similar Bag-of-Words (BoW) contexts. However, due to differences in grammatical structure among languages, the contexts of aligned words in different languages may appear at different positions in the sentence. To address this cross-lingual syntactic divergence, we propose a bilingual word embedding model integrating syntactic dependencies (DepBiWE), which uses dependency parse trees to encode the accurate relative positions of the contexts of aligned words. In addition, a new method is proposed to learn bilingual word embeddings jointly from dependency-based contexts and BoW contexts. Extensive experimental results on a real-world dataset clearly validate the superiority of the proposed DepBiWE model on various natural language processing tasks.
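The two-level attention idea behind HADR (words weighted within each sentence, sentences weighted within the document) can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual HADR implementation: the toy dimensions, random embeddings, and the learned context (query) vectors `word_context` and `sent_context` are all hypothetical stand-ins for trained parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(vectors, context):
    # Score each vector against a (normally learned) context vector,
    # then combine the vectors using the resulting attention weights.
    scores = vectors @ context
    weights = softmax(scores)
    return weights @ vectors, weights

rng = np.random.default_rng(0)
d = 4
word_context = rng.normal(size=d)  # hypothetical learned word-level query
sent_context = rng.normal(size=d)  # hypothetical learned sentence-level query

# A toy document: 3 sentences, each a (num_words, d) matrix of word embeddings.
doc = [rng.normal(size=(5, d)) for _ in range(3)]

# Level 1: attention over words produces one vector per sentence.
sent_vecs = np.stack([attend(words, word_context)[0] for words in doc])
# Level 2: attention over sentence vectors produces the document vector.
doc_vec, sent_weights = attend(sent_vecs, sent_context)

print(doc_vec.shape)       # (4,) -- a fixed-length document representation
print(sent_weights.sum())  # attention weights sum to 1 (up to float error)
```

The point of the hierarchy is that `sent_weights` exposes which sentences dominate the document vector, just as the word-level weights expose which words dominate each sentence vector.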
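The contrast between BoW contexts and dependency-based contexts that motivates DepBiWE can be illustrated on a single toy sentence. The hand-written dependency parse below (head indices and relation labels) is a hypothetical example, not output of the thesis's system; it only shows why a word's syntactic neighbors need not be its linear neighbors.

```python
# Toy sentence with a hand-written dependency parse (hypothetical).
sentence = ["she", "reads", "books", "quickly"]
heads = [1, -1, 1, 1]  # head index per token; -1 marks the root
rels = ["nsubj", "root", "obj", "advmod"]

def bow_contexts(tokens, window=2):
    # Bag-of-Words contexts: the tokens within a fixed linear window.
    ctx = {}
    for i, tok in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        ctx[tok] = [tokens[j] for j in range(lo, hi) if j != i]
    return ctx

def dep_contexts(tokens, heads, rels):
    # Dependency-based contexts: each word's head (marked with an inverse
    # relation) and its dependents, labelled with the dependency relation.
    ctx = {tok: [] for tok in tokens}
    for i, h in enumerate(heads):
        if h < 0:
            continue
        ctx[tokens[i]].append(f"{tokens[h]}/{rels[i]}^-1")
        ctx[tokens[h]].append(f"{tokens[i]}/{rels[i]}")
    return ctx

print(bow_contexts(sentence)["reads"])
# ['she', 'books', 'quickly']
print(dep_contexts(sentence, heads, rels)["reads"])
# ['she/nsubj', 'books/obj', 'quickly/advmod']
```

In a language with different word order, the BoW window around the word aligned to "reads" would capture different neighbors, while its dependency contexts (subject, object, modifier) stay comparable, which is why joint training over both context types can help align the two embedding spaces.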
Keywords/Search Tags:Document representation, Attention, Unsupervised learning, Bilingual word embeddings, Syntactic dependencies