Chinese Named Entity Recognition Based On Vocabulary Enhancement And Deep Learning

Posted on: 2022-12-13; Degree: Master; Type: Thesis
Country: China; Candidate: W H Yang; Full Text: PDF
GTID: 2518306779496184; Subject: Automation Technology
Abstract/Summary:
Named entity recognition, a basic task in natural language processing, extracts entity information from unstructured text. It also plays an important role in question answering, information extraction, reading comprehension, and other natural language processing tasks. With the explosive growth of text data and the rapid improvement of GPU computing power, deep learning has been applied to named entity recognition with good results and has become the mainstream approach in this field. Although the technology has developed rapidly, much of the research targets English. Chinese named entity recognition still faces the following problems:

(1) Unlike English, where words are separated by spaces, Chinese has no obvious boundary between words, and different word segmentations may yield different meanings, which makes named entity recognition harder.
(2) Chinese characters have a complex glyph structure, inconspicuous character features, and a single information granularity; they cannot simply be decomposed the way English words decompose into letters.
(3) When the dataset contains many entity types, model training is slow and prediction accuracy is low.

How to improve Chinese named entity recognition and speed up model training has therefore become a research hotspot. To address these problems, this thesis makes three contributions to deep-learning-based Chinese named entity recognition.

First, the SH-BiLSTM-CRF Chinese named entity recognition model is constructed. The model uses characters as its input unit, which avoids the noise caused by possible word segmentation errors. The model also uses an external dictionary to introduce lexical-level feature information in the input layer, and uses a Highway network to dynamically combine character-level and lexical-level features. This multi-level text information enriches the features of the input text. In addition, the model uses a conditional random field to learn constraints between labels, which improves recognition performance. Experiments on three datasets show that the model performs better on Chinese named entity recognition, with precision, recall, and F1 value all improved over the baseline model.

Second, following the idea of multi-task learning, the Chinese named entity recognition task is divided into two sub-tasks, identifying entity types and identifying entity locations, and the SH-BiLSTM-CRF model is converted into a multi-task model accordingly. This reduces the computational cost of the conditional random field loss function and ultimately the training time of the model.

Third, building on the multi-task SH-BiLSTM-CRF model, the Muil-Ref-BiLSTM-CRF model is constructed with an improved cross-entropy loss function. Different types of recognition errors receive different penalty weights, which narrows the gap between the model's precision and recall and improves its F1 value. In experiments on two datasets with more entity categories, this model trains faster than the SH-BiLSTM-CRF model and achieves a higher F1 value.
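The abstract says a Highway network dynamically combines the input-unit features with the dictionary (lexical) features, but it does not give the formula. The sketch below is only one plausible reading, assuming a sigmoid gate computed from the concatenated vectors that interpolates between the two feature sources. The names `highway_fuse`, `gate_weights`, and `gate_bias` are hypothetical and are not taken from the thesis.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function used for the gate."""
    return 1.0 / (1.0 + math.exp(-x))


def highway_fuse(
    unit_vec: List[float],
    lex_vec: List[float],
    gate_weights: List[List[float]],
    gate_bias: List[float],
) -> List[float]:
    """Gated mix of two equally sized feature vectors (assumed form).

    gate  = sigmoid(W @ [unit_vec; lex_vec] + b)
    fused = gate * unit_vec + (1 - gate) * lex_vec
    """
    if len(unit_vec) != len(lex_vec):
        raise ValueError("both feature vectors must have the same size")
    concat = unit_vec + lex_vec  # list concatenation, length 2 * d
    fused = []
    for i in range(len(unit_vec)):
        pre_activation = sum(w * x for w, x in zip(gate_weights[i], concat))
        gate = sigmoid(pre_activation + gate_bias[i])
        fused.append(gate * unit_vec[i] + (1.0 - gate) * lex_vec[i])
    return fused
```

With all-zero gate parameters the gate is 0.5 everywhere, so the output is the plain average of the two vectors. In a trained model the gate parameters would be learned, letting each dimension lean toward whichever source is more informative.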
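The improved cross-entropy loss is described only as adding different penalty weights for different types of recognition errors. As a rough illustration, and not the thesis's actual loss, the sketch below assumes a penalty matrix indexed by (gold label, predicted label) that scales each token's ordinary cross-entropy term. The function name `weighted_cross_entropy` and the `penalty` matrix are hypothetical.

```python
import math
from typing import List


def weighted_cross_entropy(
    probs: List[List[float]],
    gold: List[int],
    penalty: List[List[float]],
) -> float:
    """Mean cross-entropy with an error-type-dependent weight (assumed form).

    probs[t]      : predicted label distribution for token t (entries > 0)
    gold[t]       : index of the gold label for token t
    penalty[g][p] : weight applied when gold label g is predicted as label p
    """
    total = 0.0
    for dist, g in zip(probs, gold):
        # Predicted label = argmax of the distribution (first index on ties).
        pred = max(range(len(dist)), key=dist.__getitem__)
        total += penalty[g][pred] * -math.log(dist[g])
    return total / len(gold)
```

Setting every entry of `penalty` to 1.0 recovers the ordinary mean cross-entropy. Raising a single off-diagonal entry makes that particular confusion more costly, which is one way a loss could trade precision against recall.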
Keywords/Search Tags:Chinese named entity recognition, vocabulary enhancement, multi-task learning, conditional random field