Study on Chinese Named Entity Recognition Based on Hybrid Neural Network

Posted on: 2021-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: H W Wang
GTID: 2428330626455040
Subject: Computer application technology
Abstract/Summary:
Named entity recognition (NER) is a basic task in natural language processing. Its purpose is to detect and classify the named entities of interest in raw, unprocessed text. NER is one of the key steps in converting unstructured text into structured data, and it plays a key role in tasks such as automatic question answering, information retrieval, and relation extraction. Research on NER is therefore of great significance.

With the development of deep learning, English NER has made great progress in recent years. Chinese, however, has its own characteristics, and its structure is relatively complicated compared with English. Many difficulties therefore remain in recognizing Chinese named entities, mainly:

(1) Chinese words are often polysemous: the same word has different meanings in different sentences. Some Chinese NER models adapted from English NER models cannot effectively model and extract features from the global context to distinguish the different meanings of the same word.
(2) Chinese text has no separators between words. Models that take words as input depend on the accuracy of Chinese word segmentation and must first solve the segmentation problem. Character-level input lacks word-boundary and semantic information, which increases the difficulty of entity recognition.
(3) Some corpora built from web text contain unknown words, which make the NER task considerably harder.

In addition, most current NER models use CRFs at the decoding layer. Although CRFs can solve some grammatical problems, their feature extraction is weak, and the Viterbi algorithm used in CRFs has low execution efficiency.

In view of these difficulties, this thesis proposes a hybrid neural network model for Chinese NER based on the characteristics of Chinese text. It does not use a traditional time-series model; instead, it extracts features from all inputs through a self-attention mechanism, and it proposes a decoding method based on n-gram convolution that is trained with binary classification. Meanwhile, to address the lack of word-boundary and semantic information in Chinese NER models that take Chinese characters as input, it proposes a character encoding method based on "position-aware propagation" and a joint learning model with word segmentation.

The main innovations and contributions of this thesis are:

1. An encoder based on a fully self-attention mechanism. The vector representation of each word or character is related to the entire sentence through attention: each word or character determines its weight assignment by scoring the words at all positions. Each word vector thus integrates the context of the whole sentence, which effectively addresses the ambiguity of Chinese words.
2. An n-gram convolutional decoder. This method attends to the characteristics of Chinese NER during decoding. Through two-dimensional n-gram convolution, the word at the current position is associated with its surrounding words, which improves decoding efficiency while extracting features from the logical relations between words. Using as many convolution kernels as there are entity categories lets the decoder extract features along the label dimension more effectively. In addition, each convolution kernel is trained with binary classification to improve the training objective of the model.
3. For NER models that take Chinese characters as input, a character encoding mechanism based on "position-aware propagation", implemented with a Gaussian kernel function. The model is trained jointly with a word segmentation task. This method makes up for the lack of word-boundary and semantic information.

The evaluation of Chinese NER uses the 1998 People's Daily corpus (PFR), the MSRA corpus provided by Microsoft, and a crowdsourced corpus of an institution's web page content (Boyue). The experiments cover the framework and parameter optimization of the proposed model, as well as comparisons with several traditional models and with Chinese NER models that have achieved good results in recent years. The experimental results show that the proposed method is effective and improves significantly on other models in some aspects.
Keywords/Search Tags: Chinese named entity recognition, neural network, fully self-attention mechanism, position-aware propagation, joint learning
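
The abstract describes an encoder in which each character scores the characters at every position and mixes their vectors according to those scores. As a rough illustration only, the sketch below shows standard scaled dot-product self-attention in plain Python. It is not the thesis's encoder: it assumes no learned query, key, or value projections and a single head, and the names `softmax`, `self_attention`, and the toy vectors in `chars` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention over a sentence.

    vectors: one equal-length list of floats per character.
    Assumption: queries, keys, and values are the input vectors themselves
    (no learned projections), which keeps the sketch minimal.
    Returns (contextual_vectors, attention_weights).
    """
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in vectors:
        # Score the current character against every position in the sentence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in vectors
        ]
        row = softmax(scores)
        # Mix all character vectors using the attention weights.
        mixed = [
            sum(row[j] * vectors[j][d] for j in range(len(vectors)))
            for d in range(dim)
        ]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


# Toy 2-dimensional "embeddings" for a three-character sentence (made up).
chars = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual, attn = self_attention(chars)
```

Each row of `attn` holds the weights one character assigns to all positions, so every output vector in `contextual` depends on the whole sentence rather than on a fixed window. A trained encoder would normally add learned projections, multiple heads, and position information on top of this.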