
Jointly Learning Chinese Word Embeddings With Heterogeneous Morphemes

Posted on: 2020-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
Full Text: PDF
GTID: 2428330572973704
Subject: Computer Science and Technology
Abstract/Summary:
Text representation is a fundamental task in natural language processing. The traditional one-hot representation suffers from data sparsity and cannot capture semantic relationships between words. In contrast, distributed word representation, also known as word embedding, represents words as low-dimensional dense vectors in a continuous space, which better captures semantic and syntactic information. Word embedding has become the most commonly used form of word representation in natural language processing. As an ideographic language, Chinese has unique linguistic characteristics. This thesis systematically summarizes methods for learning Chinese word embeddings and proposes novel models. The specific work of this thesis is as follows:
First, we systematically analyze and compare the existing Chinese word embedding models. These models ignore the semantic contribution of words to the context and have a major limitation in disambiguating sub-words. Comprehensive quantitative experiments and detailed qualitative analysis are carried out to evaluate the resulting word embeddings.
Second, we propose a joint word embedding model that integrates several granularities (words, characters and sub-characters) with heterogeneous attention mechanisms. The model learns the semantic contribution of words to the context via a self-attention mechanism, and it automatically learns the semantic offset of each sub-word, disambiguating sub-words end to end.
Third, in existing Chinese word embedding methods the disambiguation mechanism is hard to interpret. Based on the semantic drift hypothesis and a parameter sharing mechanism, this thesis proposes a Chinese word embedding model based on shared drifts, which provides a degree of interpretability while remaining effective.
Fourth, based on the word representation model proposed in this thesis, we design and implement a digital reading recommendation system. Experiments show that the word embedding model with heterogeneous attention mechanisms can effectively extract textual information from books.
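The contrast between one-hot and dense representations, and the idea of weighting sub-word components by attention, can be illustrated with a small toy sketch. It is not the thesis's model: the vectors are hand-picked rather than learned, the helper names (`cosine`, `attend`) are hypothetical, plain dot-product attention stands in for the heterogeneous attention mechanisms, and the final averaging step is an assumed way of combining the word vector with its components.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, candidates):
    """Dot-product attention: weight each candidate by its similarity to the query."""
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    weights = softmax(scores)
    mixed = [sum(w * cand[i] for w, cand in zip(weights, candidates))
             for i in range(len(query))]
    return weights, mixed

# Toy vocabulary: two words for "river" and one for "book".
vocab = ["河", "江", "书"]
one_hot_vectors = {w: one_hot(i, len(vocab)) for i, w in enumerate(vocab)}

# Hand-picked 2-d values for illustration only (assumption, not learned embeddings).
dense_vectors = {"河": [0.9, 0.1], "江": [0.8, 0.2], "书": [0.1, 0.9]}

# One-hot vectors of distinct words are orthogonal, so similarity carries no signal.
one_hot_sim = cosine(one_hot_vectors["河"], one_hot_vectors["江"])
# Dense vectors can place related words close together and unrelated words apart.
dense_sim = cosine(dense_vectors["河"], dense_vectors["江"])
dense_unrelated = cosine(dense_vectors["河"], dense_vectors["书"])

# Hypothetical sub-character vectors for 河: the semantic radical 氵 (water)
# and the phonetic component 可, which says little about the word's meaning.
component_vectors = [[0.85, 0.15], [0.2, 0.8]]
weights, component_mix = attend(dense_vectors["河"], component_vectors)

# Assumed combination step: average the word vector with the attended mix.
joint_vec = [0.5 * (w + m) for w, m in zip(dense_vectors["河"], component_mix)]

print("one-hot similarity:", one_hot_sim)
print("dense similarity (related):", round(dense_sim, 3))
print("dense similarity (unrelated):", round(dense_unrelated, 3))
print("attention weights (氵, 可):", [round(w, 3) for w in weights])
```

In this sketch the attention weights favor the component whose vector agrees with the word, which is one simple way to down-weight a sub-character that does not contribute to the word's meaning.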
Keywords/Search Tags:natural language processing, text representation, Chinese word embeddings, morpheme disambiguation