
Research On Chinese Named Entity Recognition Based On Deep Learning

Posted on: 2022-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: M Yuan
GTID: 2518306524975659
Subject: Communication and Information System
Abstract/Summary:
Named entity recognition is a basic task in natural language processing. As a key step in transforming unstructured text data, it is closely related to information extraction, question answering, and text classification. Although named entity recognition technology continues to improve, several problems remain for Chinese named entity recognition: (1) data sets are difficult to obtain, and there is no unified, standardized processing template; (2) the preprocessing stage gives insufficient consideration to word-level features and ignores the semantic information of the context; (3) nested entities cannot be identified effectively. These problems make it difficult to improve the performance of Chinese named entity recognition. Taking the current state of Chinese named entity recognition into account, this thesis does the following work:

(1) Acquisition and processing of data sets. Because of data privacy and other constraints, data sets are difficult to obtain. In addition, text data from the Internet is poorly standardized and has a confused structure, so how to process the acquired data also becomes a problem that must be solved. To address this, the thesis processes the acquired data sets as follows: first, a unified BIOES annotation strategy is adopted; second, excessively long sentences in the data set undergo a secondary segmentation. In addition, the nested named entities in the data set are properly divided.

(2) Construction of the FB-Lattice-LSTM-Att+CRF Chinese named entity recognition model for the general domain. The model fully considers the integration of vocabulary information in the character embedding layer and, at the same time, assigns weights to the word segmentation results. It also introduces a highway network to address the vanishing gradients caused by too many neural network layers. In addition, the model introduces an attention mechanism that relates context information in order to better predict entity tags and improve overall performance. Experimental results show that the model achieves good results on the MSRA corpus, the Chinese resume data set, the Weibo data set, and the literature data set, with F1 values of 93.75%, 95.71%, 59.81%, and 74.52%, respectively.

(3) Construction of the FB-Lattice-Transformer+Multi-CRF model for the medical field. Analysis of the experimental results of the FB-Lattice-LSTM-Att+CRF model shows that it performs well on domain-specific data sets with clearly defined entities. Based on this finding, the FB-Lattice-Transformer+Multi-CRF model is constructed and applied to the medical field. The model retains the advantages of the FB-Lattice-LSTM-Att+CRF model, uses word vectors trained with topic information to represent characters and words, and uses the transformer model for parallel computation to improve training speed. In addition, it introduces hierarchical labeling and a hierarchical model to handle the recognition of nested named entities. The model achieves good experimental results on the selected electronic medical record data set. Compared with the traditional entity recognition method for the medical field and with the FB-Lattice-LSTM-Att+CRF model, the F1 value increases by 1.19% and 0.51%, respectively.
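The abstract names a unified BIOES annotation strategy but does not show how the tags are produced. The following is a minimal sketch of character-level BIOES tagging, not the thesis's actual preprocessing code. It assumes non-overlapping entity spans given as (start, end, type) with an exclusive end index; the function name `spans_to_bioes`, the example sentence, and the entity types PER and ORG are all hypothetical.

```python
from typing import List, Tuple


def spans_to_bioes(n_chars: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert non-overlapping entity spans to character-level BIOES tags.

    Each span is (start, end_exclusive, entity_type). This is an
    illustrative helper, not code from the thesis.
    """
    tags = ["O"] * n_chars  # every character starts outside any entity
    for start, end, etype in spans:
        if end - start == 1:
            # single-character entity
            tags[start] = f"S-{etype}"
        else:
            tags[start] = f"B-{etype}"        # beginning of the entity
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"        # inside the entity
            tags[end - 1] = f"E-{etype}"      # end of the entity
    return tags


# Hypothetical example: a person name (chars 0-1) and an organization (chars 3-6).
sentence = "小明在北京大学读书"
example_tags = spans_to_bioes(len(sentence), [(0, 2, "PER"), (3, 7, "ORG")])
for char, tag in zip(sentence, example_tags):
    print(char, tag)
```

Because each character receives exactly one tag, a flat scheme like this cannot represent nested entities directly, which is consistent with the abstract's statement that nested entities are divided during preprocessing and later handled with hierarchical labeling.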
Keywords/Search Tags:Chinese named entity recognition, attention mechanism, transformer model, electronic medical record