Representation of word meaning has long been a fundamental task in natural language processing. Word embeddings have been used in many natural language processing tasks, for example POS tagging and named entity recognition. Traditional methods treat each word as an atomic symbol, which is not capable of modeling the semantic and syntactic relationships between two words. Distributed representations encode words as low-dimensional real-valued vectors, so the semantic relation between two words can be represented by the distance between their corresponding word embeddings; they have thus become the most popular word representation. Despite extensive work on word embeddings, problems remain with low-frequency words. In this thesis, we discuss the following questions: (1) why word embeddings of low-frequency words are less effective; (2) how to employ the internal information of words to improve low-frequency word embeddings in Chinese; (3) a universal, language-independent method to boost the performance of low-frequency words.

The main content is as follows. We propose an average-similarity-based metric built on distributed word representations; experiments across different training algorithms, corpora, and languages show that the observed relation is stable. We further propose a method to distinguish low-frequency words and apply it to design a similarity metric; experiments on word similarity show a 0.02 to 0.05 performance improvement over cosine similarity. We then use the radicals of Chinese characters to boost the performance of low-frequency words in Chinese: radicals often convey meaning, so we share radical weights between low-frequency and high-frequency words, and experiments show a 0.02 performance increase. Finally, we propose a pseudo-context method that is language independent: we exploit the contexts of other words as contexts for a low-frequency word to augment its training data, thereby boosting its performance.
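The thesis does not spell out the average-similarity metric here, so the following is only a minimal sketch of one plausible reading: for a given word, average its cosine similarity against every vector in the vocabulary. All names and the toy embedding matrix are illustrative assumptions, not the thesis's actual definition.

```python
import numpy as np

def average_similarity(word_vec, vocab_matrix):
    """Average cosine similarity of one word's vector to every row
    (word vector) in the vocabulary matrix. Hypothetical sketch."""
    norms = np.linalg.norm(vocab_matrix, axis=1)
    sims = vocab_matrix @ word_vec / (norms * np.linalg.norm(word_vec))
    return float(sims.mean())

# Toy embeddings: 4 words in 3 dimensions (random, for illustration only).
rng = np.random.default_rng(0)
E = rng.normal(size=(4, 3))
print(average_similarity(E[0], E))  # a value in [-1, 1]
```

Under this reading, one could compare the average-similarity statistic across frequency bands to probe how embedding quality varies with word frequency.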
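The radical-sharing idea can be pictured as composing a word's vector from a word-level vector plus radical vectors that are shared across the whole vocabulary, so a rare word borrows statistics learned from frequent words with the same radicals. The composition below is a hedged sketch under assumed names and an assumed interpolation weight `alpha`; the thesis's actual weight-sharing scheme may differ.

```python
import numpy as np

DIM = 8
rng = np.random.default_rng(1)
word_vecs = {"猫": rng.normal(size=DIM)}      # word-level embedding (assumed)
radical_vecs = {"犭": rng.normal(size=DIM)}   # radical embedding, shared across words
word_radicals = {"猫": ["犭"]}                # radical lookup table (assumed)

def compose(word, alpha=0.5):
    """Interpolate between the word vector and the mean of its shared
    radical vectors. alpha is an illustrative mixing weight."""
    rads = word_radicals.get(word, [])
    if not rads:
        return word_vecs[word]
    rad_mean = np.mean([radical_vecs[r] for r in rads], axis=0)
    return (1 - alpha) * word_vecs[word] + alpha * rad_mean

print(compose("猫").shape)  # (8,)
```

Because `radical_vecs` is shared, updates driven by high-frequency words also move the representations of low-frequency words containing the same radical.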
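The pseudo-context idea, borrowing other words' contexts as extra training data for a rare word, can be sketched as follows. Here the "donor" word, the substitution strategy, and the toy corpus are assumptions for illustration; the thesis's criterion for choosing which contexts to borrow is not specified in this abstract.

```python
def pseudo_contexts(corpus, rare_word, donor_word, max_extra=100):
    """Copy each sentence containing the donor word and substitute the
    rare word in its place, yielding extra training contexts."""
    extra = []
    for sentence in corpus:
        if donor_word in sentence:
            extra.append([rare_word if tok == donor_word else tok
                          for tok in sentence])
            if len(extra) >= max_extra:
                break
    return extra

corpus = [["the", "cat", "sat"], ["a", "cat", "ran"], ["dogs", "bark"]]
print(pseudo_contexts(corpus, "felid", "cat"))
# → [['the', 'felid', 'sat'], ['a', 'felid', 'ran']]
```

The augmented sentences would then be appended to the training corpus before (re)training the embeddings, giving the rare word many more context windows than it has naturally.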