
Research On Complex Entity Recognition And Class Increment Problem In Named Entity Recognition

Posted on: 2024-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z D Tan
Full Text: PDF
GTID: 2568307067493124
Subject: Computer Science and Technology
Abstract/Summary:
A Named Entity Recognition (NER) system aims to identify entities of interest in text, such as locations, organizations, and times. NER is the foundation of many natural language processing tasks: recognized entities can be used directly in downstream applications or serve other NLP tasks as an intermediate step. Although NER systems have been studied extensively, they still have shortcomings in practical application scenarios. For example, in bioinformatics, texts often contain nested and discontinuous entities, which traditional sequence-labeling NER models cannot handle well. In addition, the entity types that people are interested in change constantly, so an NER system should be able to incrementally update the entity types it recognizes to meet these changing demands. Finally, training on datasets that contain noisy samples seriously harms a model's generalization ability, so automatically obtaining a clean dataset has important practical significance in industry. To address these issues, this thesis conducted the following research:

(1) We first studied how to effectively extract nested and discontinuous complex entities from unstructured text. We proposed a Prompt Enhanced Generative Machine Reading Comprehension Framework (PGMRC) for NER. Specifically, we converted the NER task into a machine reading comprehension task and used the pre-trained language model BART, queried once per entity type, to generate the corresponding entity span sequences. We then used continuous prompts to enhance the discrete queries and improve the model's robustness. We conducted extensive experiments on the benchmark datasets GENIA, ACE04, and ACE05, as well as on our proposed PAN dataset, and achieved the best experimental results.

(2) To further improve the practicality of the NER system, we proposed a two-stage class-incremental learning model for NER, which splits NER into entity span detection and entity span classification in a pipeline. To retain the model's previously learned knowledge of old entity types, we used a knowledge distillation framework. The student model learns new entity types from the new training data and retains knowledge of old entity types by imitating the teacher model's outputs on this new training set. Our experiments show that this method allows the student model to gradually learn to recognize new entity types without forgetting the previously learned ones.

(3) With the help of big data, deep learning has achieved significant success in many fields. However, noisy labels seriously reduce the generalization performance of deep neural networks. This thesis therefore proposes a noisy-sample selection method for NER, which filters the dataset using the training information of each sample collected during model training in order to obtain a clean dataset.
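The teacher-student idea in (2) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual loss, so everything here is an assumption for illustration: the function names, the `alpha` weighting, the use of a KL term over the old entity types, and the convention that old types come first in the student's output are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, gold_index):
    """Supervised loss on a span labeled with a new entity type."""
    return -math.log(softmax(student_logits)[gold_index])


def distillation_kl(teacher_logits, student_old_logits):
    """KL(teacher || student), computed over the old entity types only."""
    p = softmax(teacher_logits)
    q = softmax(student_old_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def incremental_loss(student_logits, teacher_logits, gold_index, alpha=0.5):
    """Combine imitation of the teacher with learning of new types.

    student_logits: scores over old + new types (old types first; assumed).
    teacher_logits: scores over old types only.
    gold_index: index of the gold new type, or None if the span carries
                no new-type annotation in the new training data.
    alpha: hypothetical weight between the two terms.
    """
    n_old = len(teacher_logits)
    kd = distillation_kl(teacher_logits, student_logits[:n_old])
    ce = 0.0 if gold_index is None else cross_entropy(student_logits, gold_index)
    return alpha * kd + (1 - alpha) * ce
```

In this sketch the distillation term is zero when the student reproduces the teacher's scores on the old types, so only disagreement with the teacher is penalized, while the cross-entropy term drives learning of the newly annotated types.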
Keywords/Search Tags: Natural language processing, Named entity recognition, Nested and discontinuous entities, Incremental learning, Noise mining