
Research And Application Of The Word Embedding Method

Posted on: 2019-01-19    Degree: Master    Type: Thesis
Country: China    Candidate: F Xu    Full Text: PDF
GTID: 2348330542958069    Subject: Software engineering
Abstract/Summary:
Word embeddings represent words as low-dimensional dense vectors and capture the relationships between words through vector operations, so they are widely used in natural language processing tasks. As a research hotspot in this field, word embedding methods have been studied extensively. However, two problems remain: (1) how to choose an appropriate algorithm for constructing word embeddings; and (2) which factors determine the quality of word embeddings and how that quality can be improved.

For the first problem, this thesis studies and builds a word embedding method based on matrix factorization. The constructed model is compared with the Skip-gram and GloVe models on the word similarity task under different window sizes. Experimental results show that, when constructing the matrix factorization model, cosine similarity outperforms the Hellinger distance as the similarity measure, conditional-probability weighting outperforms word-frequency weighting, and the quality of the similarity matrix before dimensionality reduction is linearly correlated with the quality of the resulting word embeddings.

To identify the factors that determine embedding quality and to improve it, this thesis proposes a word embedding method based on similarity matrix centralization. With this method, the similarities between similar words are relatively strengthened, and the similarities between dissimilar words are relatively weakened. Its effectiveness is verified on the word similarity task: centralization improves the quality of the similarity matrix before dimensionality reduction and thereby the quality of the word embeddings, which reach or exceed those of the Skip-gram model.

Finally, this thesis implements a word embedding system based on the centralization method, trains it on a corpus under different parameter settings, and applies the resulting embeddings to the Chinese Named Entity Recognition task. Experimental results show that the centralization method makes better use of context and improves recognition performance.
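The abstract does not give the construction procedure in detail, so the following is only a minimal sketch of a matrix-factorization pipeline of the kind described: co-occurrence counts within a window are weighted by conditional probability, a cosine similarity matrix is built over the weighted rows, and truncated SVD produces the low-dimensional vectors. The function name, window size, and dimension are illustrative placeholders, not the thesis's actual settings.

```python
import numpy as np

def build_embeddings(sentences, window=5, dim=100):
    """Sketch of a matrix-factorization word embedding pipeline (illustrative)."""
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    # Count co-occurrences within a symmetric context window.
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[idx[w], idx[sent[j]]] += 1.0
    # Conditional-probability weighting P(context | word): normalize each row.
    row_sums = counts.sum(axis=1, keepdims=True)
    cond = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Cosine similarity between the weighted context distributions of words.
    norms = np.linalg.norm(cond, axis=1, keepdims=True)
    unit = np.divide(cond, norms, out=np.zeros_like(cond), where=norms > 0)
    sim = unit @ unit.T
    # Dimensionality reduction: truncated SVD of the similarity matrix.
    u, s_vals, _ = np.linalg.svd(sim)
    k = min(dim, len(vocab))
    vectors = u[:, :k] * np.sqrt(s_vals[:k])
    return vocab, sim, vectors
```

For example, `vocab, sim, vecs = build_embeddings([["the", "cat", "sat"], ["the", "dog", "ran"]], window=2, dim=2)` returns one row of `vecs` per vocabulary word.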
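The abstract also does not state the centralization formula. One plausible reading is double centering of the similarity matrix (subtracting row and column means and adding back the grand mean, as in classical MDS), which relatively strengthens above-average similarities and weakens below-average ones; the sketch below uses that interpretation and is illustrative only, not the thesis's exact method.

```python
import numpy as np

def centralize(sim):
    # Double centering (one possible form of similarity matrix centralization):
    # subtract row and column means and add back the grand mean, so that
    # above-average similarities are boosted and below-average ones are suppressed.
    row_mean = sim.mean(axis=1, keepdims=True)
    col_mean = sim.mean(axis=0, keepdims=True)
    return sim - row_mean - col_mean + sim.mean()

def embed_from_centralized(sim, dim=100):
    # Factorize the centralized similarity matrix with an eigendecomposition
    # and keep the top components as low-dimensional word vectors.
    centered = centralize(sim)
    vals, vecs = np.linalg.eigh(centered)
    order = np.argsort(vals)[::-1][:dim]
    top = np.clip(vals[order], 0.0, None)  # drop any negative eigenvalues
    return vecs[:, order] * np.sqrt(top)
```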
Keywords/Search Tags: Word Embedding, Matrix Factorization, Similarity Matrix Centralization, Chinese Named Entity Recognition