
Exploration On Sense Embedding Model Based On Gaussian Distribution

Posted on: 2019-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Tian
Full Text: PDF
GTID: 2428330590967371
Subject: Computer Science and Technology
Abstract/Summary:
In natural language processing, text is usually represented as strings, but strings cannot be used in computation directly. One solution is to represent words by numbers, such as their indices in a vocabulary, but indices discard most semantic information. Moreover, neural network models take only vectors as input. This motivated word embeddings: vectors that represent words. Vectors can also be viewed as points in a high-dimensional space, so the distance between points reflects the similarity between words. Word embeddings are widely used in machine translation, text generation, recommender systems, and other applications.

Some of these tasks, such as machine translation and text generation, require high accuracy at the level of word senses, yet a single word embedding cannot represent all the senses of an ambiguous word. This led researchers to propose sense embeddings. Commonly used sense embeddings are also vectors, but vectors cannot preserve some kinds of semantic information; for example, asymmetric relations can never be captured by a symmetric vector similarity. We therefore propose to represent senses by probability distributions, choosing the Gaussian distribution for its mathematical properties: the mean of a multivariate Gaussian is a point in a high-dimensional space, while the covariance matrix controls the distribution's shape. By comparing the shapes of two Gaussian distributions, we can infer entailment information.

In this paper we present our work on representing senses by Gaussian distributions. We propose a model adapted from the hidden Markov model, together with a method for inferring the senses of ambiguous words. We implemented the model and successfully trained a set of Gaussian sense embeddings, using an energy-based training method with two kinds of energy functions. We evaluate the embeddings on two tasks: word similarity calculation and word entailment detection. We find, and prove, that the expected likelihood energy function is not suitable for the word similarity task, while our Gaussian sense embeddings perform well on word entailment detection.
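The abstract does not spell out the two energy functions or the entailment test. As a minimal sketch (not code from the thesis), the two standard energies for Gaussian embeddings with diagonal covariances are the expected likelihood (the symmetric inner product of two Gaussian densities) and the asymmetric KL divergence; entailment can then be read off the KL asymmetry, since a narrow sense distribution nested inside a broader one yields a small forward KL:

```python
import numpy as np

def log_expected_likelihood(mu_p, var_p, mu_q, var_q):
    """Log inner product <N_p, N_q> = log N(mu_p - mu_q; 0, var_p + var_q).
    Symmetric in P and Q, so it can only model symmetric similarity."""
    var = var_p + var_q
    diff = mu_p - mu_q
    return -0.5 * (np.sum(np.log(2.0 * np.pi * var)) + np.sum(diff ** 2 / var))

def kl_divergence(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for diagonal Gaussians. Asymmetric: small when the mass
    of P lies inside Q, so it can encode entailment direction."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

# Hypothetical senses: a narrow "dog" Gaussian inside a broad "animal" one.
mu_dog, var_dog = np.zeros(2), np.full(2, 0.1)
mu_animal, var_animal = np.zeros(2), np.ones(2)

# KL(dog || animal) is much smaller than KL(animal || dog),
# signalling that "dog" entails "animal" and not the reverse.
kl_fwd = kl_divergence(mu_dog, var_dog, mu_animal, var_animal)
kl_bwd = kl_divergence(mu_animal, var_animal, mu_dog, var_dog)
```

The sense names and covariance values here are illustrative assumptions; the point is the shape comparison: the covariance of the broader sense dominates, and only the asymmetric energy detects which distribution contains the other.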
Keywords/Search Tags:Machine Learning, Natural Language Processing, Word Embedding, Gaussian Distribution