
Research On Multi-granularity Chinese Word Embedding Based On Glyph Structure

Posted on: 2021-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Yang
Full Text: PDF
GTID: 2428330620468763
Subject: Computer Science and Technology

Abstract/Summary:
Word representation is one of the fundamental tasks in natural language processing. Traditional methods typically use one-hot representations based on vector spaces. With the rise of deep neural networks in natural language processing, word embedding models have gradually become the mainstream approach to word representation, and research on Chinese word embedding models has received increasing attention. Unlike the phonetic writing systems represented by English, Chinese characters evolved from pictographs, and the various glyph structure components of Chinese characters carry rich semantic information. Existing research shows that mining this semantic information can effectively improve the quality of Chinese word vectors. Building on current Chinese word embedding models, this thesis proposes a multi-granularity Chinese word embedding model based on glyph structure. The specific work of this thesis is as follows:

First, the historical development and current state of research on word vector representation are reviewed, and the methods, advantages, and disadvantages of existing models are analyzed.

Second, a multi-granularity Chinese word embedding model based on glyph structure is proposed. It jointly learns embeddings at multiple granularities (words, characters, and glyph structure components) to fully mine the semantic information contained in the glyph structure of Chinese characters. A Bi-directional Long Short-Term Memory (Bi-LSTM) network and a self-attention mechanism make the information in the word embedding more semantically related to the word, while avoiding the introduction of noise into the embedding.

Third, multiple evaluation methods are combined to assess the word vectors, and the trained vectors are applied to four basic natural language processing tasks for experiments and qualitative analysis; the results are compared with those of existing state-of-the-art models, and conclusions are drawn.

Fourth, the research results of the proposed word vector model are summarized, and future directions for Chinese word embedding methods are discussed.
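The attention-based fusion of granularities described above can be sketched as follows. This is a minimal illustration, not the thesis's exact architecture: the function names, the scaled dot-product attention form, and the simple averaging fusion are assumptions, and the Bi-LSTM encoding of component sequences that the thesis applies before attention is omitted for brevity.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, components):
    # Scaled dot-product attention: weight each glyph-component vector
    # by its relevance to the query (here, the word vector), so that
    # components unrelated to the word's meaning contribute less noise.
    d = len(query)
    scores = [dot(query, c) / math.sqrt(d) for c in components]
    weights = softmax(scores)
    return [sum(w * c[i] for w, c in zip(weights, components))
            for i in range(d)]

def multi_granularity_embedding(word_vec, char_vecs, comp_vecs):
    # Fuse word-, character-, and component-level information.
    # Averaging is one possible fusion choice, used here for simplicity.
    comp_summary = attend(word_vec, comp_vecs)
    parts = [word_vec] + char_vecs + [comp_summary]
    n = len(parts)
    return [sum(p[i] for p in parts) / n for i in range(len(word_vec))]
```

For example, with a 2-dimensional word vector, one character vector, and two component vectors, `multi_granularity_embedding` returns a single 2-dimensional vector in which glyph components semantically closer to the word receive larger attention weights.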
Keywords/Search Tags:Chinese word embedding, Natural language processing, Bi-LSTM, Self-Attention mechanism