
Research On Multi-granularity Chinese Word Embedding Based On Glyph Structure

Posted on: 2021-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Yang
Full Text: PDF
GTID: 2428330620468763
Subject: Computer Science and Technology

Abstract/Summary:
Word representation is one of the fundamental tasks in natural language processing. Traditional methods typically use one-hot representations based on vector spaces. With the rise of deep neural networks in natural language processing, word embedding models have gradually become the mainstream approach to word representation, and research on Chinese word embedding models has received increasing attention. Unlike the phonetic writing systems represented by English, Chinese characters evolved from pictographs, and the various glyph structure components of Chinese characters carry rich semantic information. Existing research shows that mining this semantic information can effectively improve the quality of Chinese word vectors. Building on current Chinese word embedding models, this thesis proposes a multi-granularity Chinese word embedding model based on glyph structure. The specific work of this thesis is as follows:

First, the historical development and current state of research on word vector representation are reviewed, and the methods, advantages, and disadvantages of existing models are analyzed.

Second, a multi-granularity Chinese word embedding model based on glyph structure is proposed. It jointly learns embeddings at multiple granularities (words, characters, and glyph structure components) to fully mine the semantic information contained in the glyph structure of Chinese characters. A Bi-directional Long Short-Term Memory (Bi-LSTM) network and a self-attention mechanism make the information in the word embedding more semantically related to the word, while avoiding the introduction of noise into the embedding.

Third, multiple evaluation methods are combined to assess the word vectors, and the trained vectors are applied to four basic natural language processing tasks for experiments and qualitative analysis; the results are compared with those of existing state-of-the-art models, and conclusions are drawn.

Fourth, the research results of the proposed word vector model are summarized, and future directions for Chinese word embedding methods are discussed.
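The attention-based fusion of granularities described above can be sketched as follows. This is a minimal illustration, not the thesis's exact architecture: the function names, the scaled dot-product attention form, and the simple averaging fusion are assumptions, and the Bi-LSTM encoding of component sequences that the thesis applies before attention is omitted for brevity.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, components):
    # Scaled dot-product attention: weight each glyph-component vector
    # by its relevance to the query (here, the word vector), so that
    # components unrelated to the word's meaning contribute less noise.
    d = len(query)
    scores = [dot(query, c) / math.sqrt(d) for c in components]
    weights = softmax(scores)
    return [sum(w * c[i] for w, c in zip(weights, components))
            for i in range(d)]

def multi_granularity_embedding(word_vec, char_vecs, comp_vecs):
    # Fuse word-, character-, and component-level information.
    # Averaging is one possible fusion choice, used here for simplicity.
    comp_summary = attend(word_vec, comp_vecs)
    parts = [word_vec] + char_vecs + [comp_summary]
    n = len(parts)
    return [sum(p[i] for p in parts) / n for i in range(len(word_vec))]
```

For example, with a 2-dimensional word vector, one character vector, and two component vectors, `multi_granularity_embedding` returns a single 2-dimensional vector in which glyph components semantically closer to the word receive larger attention weights.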
Keywords/Search Tags:Chinese word embedding, Natural language processing, Bi-LSTM, Self-Attention mechanism