Jointly Learning Chinese Word Embeddings With Heterogeneous Morphemes

Posted on:2020-10-29

Degree:Master

Type:Thesis

Country:China

Candidate:J Liu

Full Text:PDF

GTID:2428330572973704

Subject:Computer Science and Technology

Abstract/Summary:

Text representation is fundamental to natural language processing. The traditional one-hot representation suffers from data sparsity and cannot capture semantic relationships between words. In contrast, distributed word representation, also known as word embedding, represents words as low-dimensional dense vectors in a continuous space and better captures semantic and syntactic information. Word embedding has become the most commonly used form of word representation in natural language processing. As an ideographic language, Chinese has unique linguistic characteristics. This thesis systematically summarizes the methods for learning Chinese word embeddings and proposes novel models for learning them. The specific contributions are as follows.

First, we systematically analyze and compare existing Chinese word embedding models. These models ignore the semantic contribution of each word to its context and have a major limitation in disambiguating subwords. We conduct comprehensive quantitative experiments and detailed qualitative analysis to evaluate the generated word embeddings.

Second, we propose a joint word embedding model that integrates multiple granularities, namely words, characters, and sub-characters, through heterogeneous attention mechanisms: a Self-Attention mechanism learns the semantic contribution of words to the context, and the model automatically learns the semantic offset of each subword, disambiguating subwords end to end.

Third, the disambiguation mechanisms of existing Chinese word embedding methods are poorly interpretable. Based on the semantic drift hypothesis and a parameter-sharing mechanism, this thesis proposes a Chinese word embedding model based on shared drifts, which offers a degree of interpretability while maintaining effectiveness.

Fourth, based on the word representation models proposed in this thesis, we design and implement a digital reading recommendation system. Experiments show that the word embedding model with heterogeneous attention mechanisms can effectively extract textual information from books.
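The abstract does not give the model's equations, so the following is only a minimal sketch, under our own assumptions, of how attention can combine heterogeneous morphemes. Each character is assumed to have several candidate sense vectors, and an attention step with the word vector as the query forms a soft mixture of them (a simple stand-in for subword disambiguation). The composed word representations are then weighted by attention scores against their mean to form a context vector, a simplified stand-in for the Self-Attention step. All function names are hypothetical, sub-characters are omitted (they could be handled like characters), and training is not shown.

```python
import math
from typing import List

Vector = List[float]

# Hypothetical helpers for illustration only; NOT the thesis's implementation.

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]

def scale(a: Vector, s: float) -> Vector:
    return [x * s for x in a]

def weighted_sum(vectors: List[Vector], weights: List[float]) -> Vector:
    out = [0.0] * len(vectors[0])
    for vec, w in zip(vectors, weights):
        for i, x in enumerate(vec):
            out[i] += w * x
    return out

def mean(vectors: List[Vector]) -> Vector:
    return scale(weighted_sum(vectors, [1.0] * len(vectors)), 1.0 / len(vectors))

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def disambiguate_char(word_vec: Vector, sense_vecs: List[Vector]) -> Vector:
    """Soft-select among a character's candidate senses, using the word vector as the query."""
    weights = softmax([dot(word_vec, s) for s in sense_vecs])
    return weighted_sum(sense_vecs, weights)

def compose_word(word_vec: Vector, char_senses: List[List[Vector]]) -> Vector:
    """Assumed composition: word vector plus the mean of its disambiguated character vectors."""
    chars = [disambiguate_char(word_vec, senses) for senses in char_senses]
    return add(word_vec, mean(chars))

def context_vector(word_reprs: List[Vector]):
    """Attention-weighted context: score each word against the context mean, then softmax."""
    query = mean(word_reprs)
    weights = softmax([dot(query, r) for r in word_reprs])
    return weighted_sum(word_reprs, weights), weights

if __name__ == "__main__":
    # Toy 2-d example: the first word's character has two senses; the word vector favours sense 0.
    w1 = compose_word([1.0, 0.0], [[[1.0, 0.0], [0.0, 1.0]]])
    w2 = compose_word([0.0, 1.0], [[[0.5, 0.5]]])
    ctx, weights = context_vector([w1, w2])
    target = [1.0, 0.0]  # hypothetical target-word vector
    print("weights:", weights, "score:", dot(ctx, target))
```

In a CBOW-style setup, a score such as `dot(ctx, target)` would feed a softmax or negative-sampling loss so that the word, character-sense, and attention parameters are learned jointly; that training loop is deliberately left out of this sketch.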

Keywords/Search Tags:

natural language processing, text representation, Chinese word embeddings, morpheme disambiguation

Related items

1	A Representation Method Of Chinese Characters And Words Based On Word-Character Alignment
2	The Study And Application Of Text Embeddings With Deep Learning Technique
3	Word Embeddings Towards Text Classification Of Emotion And Topic
4	Chinese Word Embeddings Based On Neural Network Approaches
5	Based On Semi-supervised Method Of Chinese Word Sense Disambiguation
6	Chinese Word Sense Disambiguation Based On Parsing Tree
7	Design And Implements Of WSD System Based On Chinese Real Text
8	Sentence-Level Language Analysis With Contextualized Word Embeddings
9	Research On Word-level Ambiguity Resolution Method
10	Research On Jointly Learning Word Embeddings And Latent Topics In Text