Research On Chinese Named Entity Recognition Based On Multi-task Learning And Lexicon

Posted on: 2022-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: C Hu
Full Text: PDF
GTID: 2518306536463744
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the mobile Internet, a large amount of unstructured data is generated online, and information extraction technologies such as named entity recognition are needed to extract useful information from it. The named entity recognition task aims to identify specific types of entity names in text, such as person names, place names, and organization names. The recognition results affect the performance of downstream natural language processing tasks such as relation extraction, text understanding, and automatic question answering, so the task has significant research value. The development of deep learning theory and technology has advanced named entity recognition. Studies have shown that multi-task learning can improve the performance of deep learning models, but classic named entity recognition models optimize only a single task. This thesis observes that many named entity recognition data sets contain a large number of samples without any entities. For this reason, a binary classification task is designed to determine whether a sample contains entities, and on this basis a Chinese named entity recognition model based on multi-task learning is proposed. The model combines the loss of the named entity recognition task with the loss of the binary classification task as a weighted sum, and then optimizes the two tasks jointly. Experiments were conducted on both Bi-LSTM and Transformer encoders. The results show that the proposed multi-task mechanism improves the generalization ability and performance of the named entity recognition model on both encoders.

Chinese text has no explicit word boundaries, which leads to entity boundary segmentation errors in Chinese named entity recognition. Because named entity recognition depends on external knowledge, introducing external knowledge such as dictionary information that carries word boundary features can help improve entity boundary recognition. Therefore, this thesis proposes a Chinese named entity recognition model that integrates dictionary information. In this model, the words matched in the lexicon are divided into four categories according to their location information, and a weighted word embedding is calculated for each category. An attention mechanism is then used to integrate these weighted word embeddings, which contain word boundary information, into the initial word embedding, enhancing the word boundary and location information it carries. Experiments were conducted on both static word embeddings and pre-trained BERT embeddings. The results show that the lexicon-based named entity recognition model can address entity boundary segmentation errors and improve named entity recognition performance.
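As a rough illustration of the multi-task mechanism described above, the sketch below combines a sequence-labeling loss with the loss of the auxiliary "does this sample contain an entity" classifier. It is only a sketch under stated assumptions: the abstract does not give the weighting formula, so the form `ner + cls_weight * cls`, the default weight of 0.5, the token-level cross-entropy used in place of whatever NER loss the thesis uses, and all function names are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def token_cross_entropy(tag_probs, gold_tags):
    """Mean negative log-likelihood of the gold tags.

    Assumption: a simple token-level loss stands in for the NER loss;
    the thesis may use a different one (for example a CRF loss).
    tag_probs is a list of dicts mapping tag -> predicted probability.
    """
    losses = [-math.log(max(probs[gold], EPS))
              for probs, gold in zip(tag_probs, gold_tags)]
    return sum(losses) / len(losses)


def binary_cross_entropy(prob_has_entity, has_entity):
    """Loss of the auxiliary classifier for one sample."""
    p = min(max(prob_has_entity, EPS), 1.0 - EPS)
    return -math.log(p) if has_entity else -math.log(1.0 - p)


def has_entity_label(gold_tags):
    """Derive the auxiliary label from the NER tags: any non-'O' tag."""
    return any(tag != "O" for tag in gold_tags)


def multitask_loss(tag_probs, gold_tags, prob_has_entity, cls_weight=0.5):
    """Weighted sum of the two task losses (the weighting form is assumed)."""
    ner = token_cross_entropy(tag_probs, gold_tags)
    cls = binary_cross_entropy(prob_has_entity, has_entity_label(gold_tags))
    return ner + cls_weight * cls
```

Note that the auxiliary label needs no extra annotation, since it is derived from the existing tag sequence. In a real model both losses would be computed from a shared encoder so that gradients from the classification task also shape the representation used for tagging.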
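The lexicon step described above can be sketched as follows. The abstract does not name the four location categories or the weighting scheme, so this example assumes the common begin/middle/end/single (B/M/E/S) grouping of matched words relative to each character and a frequency-weighted average per category. The function names, the `max_len` cutoff, and the toy lexicon are hypothetical, and the attention-based fusion with the initial embedding is left out.

```python
def match_categories(sentence, lexicon, max_len=4):
    """For each character, collect matched lexicon words by position.

    Assumed categories: B (character begins the word), M (inside),
    E (ends the word), S (the word is the single character itself).
    """
    cats = [{"B": [], "M": [], "E": [], "S": []} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                cats[start]["S"].append(word)
                continue
            cats[start]["B"].append(word)
            cats[end - 1]["E"].append(word)
            for mid in range(start + 1, end - 1):
                cats[mid]["M"].append(word)
    return cats


def weighted_embedding(words, embeddings, freq, dim):
    """Frequency-weighted average of word vectors (weighting is assumed).

    Returns a zero vector when no word matched in this category.
    """
    total = sum(freq.get(w, 1) for w in words)
    vec = [0.0] * dim
    for w in words:
        weight = freq.get(w, 1) / total
        for i, x in enumerate(embeddings[w]):
            vec[i] += weight * x
    return vec


# Toy usage: the lexicon and vectors are made up for illustration.
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥", "江"}
cats = match_categories("南京市长江大桥", lexicon)
```

Here the character 江 (index 4) ends "长江", sits inside "长江大桥", and is itself a single-character word, so its E, M, and S sets are all non-empty. This is the kind of boundary evidence the model can then fold into the character's representation.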
Keywords/Search Tags: Chinese Named Entity Recognition, Multi-task Learning, Attention Mechanism, Lexicon Information