
Research on the Chinese Named Entity Recognition Method with Glyph Features

Posted on: 2020-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Peng
Full Text: PDF
GTID: 2518306503972019
Subject: Computer technology
Abstract/Summary:
Named Entity Recognition (NER) is a natural language processing technology for detecting and classifying named entities in text, and an important part of Information Extraction (IE). Compared with traditional NER methods such as the conditional random field (CRF) model and the support vector machine (SVM) model, deep learning (DL) based NER methods have gradually become the mainstream of the field. A DL based NER method encodes the distributed word or character embeddings of a text sequence with a recurrent neural network (RNN), then classifies and labels the named entities with a decoder. Current Chinese NER methods follow this general pipeline and do not utilize the glyph features of Chinese characters. In addition, following the success of the self-attention mechanism in machine translation, self-attention is gradually receiving attention in other fields of natural language processing. Based on these issues, this thesis does the following work:

1. Proposes a Chinese glyph vector model (GlyVec) and a GlyVec + char-embedding NER model based on an RNN. Combining computer vision and natural language processing techniques, we render each character as an image and extract its glyph vector (GlyVec) with an inception-based convolutional neural network, then feed the concatenation of the GlyVec and the character embedding into an RNN NER model (GlyVec+Char_emb); sketches of both components follow this abstract.

2. Proposes a GlyVec + self-attention NER model. We first use the self-attention mechanism (Transformer) to build a Transformer NER model and compare it with the RNN NER model in efficiency and accuracy. We then combine the Transformer-based pre-trained language model BERT with GlyVec to build a GlyVec+BERT NER model, in which the GlyVec is trained jointly through an auxiliary language model.

The models are evaluated on three standard Chinese NER datasets: MSRA, CCKS2017, and CCKS2018. The results show that the GlyVec+Char_emb NER model outperforms the baseline model. In the self-attention study, Transformer encoding is more efficient than RNN encoding but shows no obvious improvement in accuracy. In the BERT study, the BERT NER model outperforms all previous results, and the GlyVec+BERT NER model outperforms the BERT NER baseline.
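To make the glyph-vector extraction concrete, here is a minimal sketch in PyTorch with Pillow. The font path, image size, layer widths, and GlyVec dimension are all illustrative assumptions, not the thesis's actual configuration; the inception block simply runs parallel 1x1/3x3/5x5 convolutions in the spirit of the "inception based convolution neural network" named in the abstract.

```python
# Minimal GlyVec sketch: render a character, extract a glyph vector with a CNN.
# All sizes and the font path below are illustrative assumptions.
import torch
import torch.nn as nn
from PIL import Image, ImageDraw, ImageFont

def render_glyph(char, size=32, font_path="simsun.ttc"):
    """Render one Chinese character to a (1, size, size) grayscale tensor."""
    img = Image.new("L", (size, size), color=0)         # black background
    font = ImageFont.truetype(font_path, size)          # font file is an assumption
    ImageDraw.Draw(img).text((0, 0), char, fill=255, font=font)
    pixels = torch.tensor(list(img.getdata()), dtype=torch.float32) / 255.0
    return pixels.view(1, size, size)

class InceptionBlock(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 convolutions, concatenated along channels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)

    def forward(self, x):
        return torch.relu(torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1))

class GlyVecNet(nn.Module):
    """Map a rendered glyph image to a fixed-size glyph vector."""
    def __init__(self, glyvec_dim=64):
        super().__init__()
        self.inception = InceptionBlock(1, 8)           # 3 branches -> 24 channels
        self.pool = nn.AdaptiveAvgPool2d(1)             # global average pooling
        self.proj = nn.Linear(24, glyvec_dim)

    def forward(self, imgs):                            # imgs: (batch, 1, H, W)
        h = self.pool(self.inception(imgs)).flatten(1)  # (batch, 24)
        return self.proj(h)                             # (batch, glyvec_dim)

glyvec = GlyVecNet()(render_glyph("中").unsqueeze(0))   # shape (1, 64)
```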
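The first contribution then concatenates each character's glyph vector with a learned character embedding and encodes the sequence with an RNN. A minimal sketch follows, assuming PyTorch, a BiLSTM encoder, and a plain linear tag classifier standing in for the decoder; the vocabulary size, dimensions, and tag count are illustrative assumptions.

```python
# Minimal GlyVec + char-embedding tagger sketch. A CRF layer, as commonly
# used for NER decoding, is replaced here by a linear classifier for brevity.
import torch
import torch.nn as nn

class GlyVecCharTagger(nn.Module):
    def __init__(self, vocab_size, char_dim=100, glyvec_dim=64,
                 hidden=128, num_tags=7):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, char_dim)
        self.bilstm = nn.LSTM(char_dim + glyvec_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_tags)

    def forward(self, char_ids, glyvecs):
        # char_ids: (batch, seq_len); glyvecs: (batch, seq_len, glyvec_dim),
        # e.g. produced per character by a GlyVecNet as sketched above.
        x = torch.cat([self.char_emb(char_ids), glyvecs], dim=-1)
        h, _ = self.bilstm(x)
        return self.classifier(h)   # per-character tag scores (batch, seq, tags)

model = GlyVecCharTagger(vocab_size=5000)
scores = model(torch.randint(0, 5000, (2, 10)), torch.randn(2, 10, 64))
```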
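For the second contribution, the same glyph vectors can be fused with a Transformer-based pre-trained language model. A minimal sketch, assuming the HuggingFace transformers package and the public bert-base-chinese checkpoint; the fusion shown (simple concatenation before a linear classifier) and the omission of the auxiliary language-model objective are simplifying assumptions, not the thesis's exact design.

```python
# Minimal GlyVec + BERT fusion sketch: concatenate glyph vectors with BERT
# hidden states before tag classification. Fusion details are assumptions.
import torch
import torch.nn as nn
from transformers import BertModel

class GlyVecBertTagger(nn.Module):
    def __init__(self, glyvec_dim=64, num_tags=7):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.classifier = nn.Linear(
            self.bert.config.hidden_size + glyvec_dim, num_tags)

    def forward(self, input_ids, attention_mask, glyvecs):
        # glyvecs: (batch, seq_len, glyvec_dim), aligned with BERT tokens
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        h = torch.cat([out.last_hidden_state, glyvecs], dim=-1)
        return self.classifier(h)   # per-token tag scores
```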
Keywords/Search Tags:Chinese named entity recognition, Deep learning, Glyph feature, Self-attention mechanism