Research On Complex Entity Recognition And Class Increment Problem In Named Entity Recognition

Posted on:2024-01-21

Degree:Master

Type:Thesis

Country:China

Candidate:Z D Tan

GTID:2568307067493124

Subject:Computer Science and Technology

Abstract/Summary:

A named entity recognition (NER) system aims to identify entities of interest in text, such as locations, organizations, and times. NER underlies many natural language processing (NLP) tasks: recognized entities can be used directly in downstream applications or can serve other NLP tasks as an intermediate step. Although NER has been studied extensively, existing systems still fall short in practical application scenarios. First, texts in fields such as bioinformatics often contain nested and discontinuous entities, which traditional sequence-labeling NER models cannot handle well. Second, the entity types that people are interested in change constantly, so an NER system should be able to incrementally update the set of entity types it recognizes. Third, training on datasets that contain noisy samples seriously harms a model's generalization ability, so automatically obtaining a clean dataset is of practical importance in industry. To address these issues, this thesis carries out the following research:

(1) We first study how to effectively extract nested and discontinuous complex entities from unstructured text. We propose the Prompt Enhanced Generative Machine Reading Comprehension framework (PGMRC) for NER. Specifically, we convert the NER task into a machine reading comprehension task and use the pre-trained language model BART, queried separately for each entity type, to generate the corresponding entity span sequences. We then use continuous prompts to enhance the discrete queries and improve the model's robustness. We conducted extensive experiments on the benchmark datasets GENIA, ACE04, and ACE05 and on our proposed PAN dataset, and achieved the best experimental results.

(2) To further improve the practicality of the NER system, we propose a two-stage class-incremental learning model for NER, which splits NER into entity span detection and entity span classification in a pipeline. To retain the model's previously learned knowledge of old entity types, we use a knowledge distillation framework: the student model learns new entity types from the new training data and retains knowledge of old entity types by imitating the teacher model's outputs on this same training set. Our experiments show that this method lets the student model gradually learn to recognize new entity types without forgetting the previously learned ones.

(3) With the help of big data, deep learning has achieved significant success in many fields. However, noisy labels seriously reduce the generalization performance of deep neural networks. This thesis therefore proposes a noisy-sample selection method for NER that uses the training information of each sample, collected during model training, to filter the dataset and obtain a clean one.
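The distillation objective described in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the weighting factor `alpha`, and the choice of a KL-divergence term are assumptions made for illustration. It scores a single candidate span in the classification stage, where the teacher outputs logits over the old entity types and the student outputs logits over the old types followed by the new ones.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, gold_index):
    """Supervised loss on a span annotated in the new training data."""
    return -math.log(softmax(student_logits)[gold_index])

def distillation_kl(teacher_logits, student_logits_old):
    """KL(teacher || student), restricted to the old entity types."""
    p = softmax(teacher_logits)
    q = softmax(student_logits_old)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def incremental_loss(student_logits, teacher_logits, gold_index=None, alpha=0.5):
    """Combine the two terms for one span (hypothetical weighting).

    student_logits covers old types first, then new types.
    gold_index is None when the span has no new-type annotation,
    in which case only the distillation term applies.
    """
    n_old = len(teacher_logits)
    kd = distillation_kl(teacher_logits, student_logits[:n_old])
    if gold_index is None:
        return kd
    return alpha * cross_entropy(student_logits, gold_index) + (1 - alpha) * kd
```

When the student reproduces the teacher's distribution over the old types, the distillation term is zero and only the supervised term on the new type remains, which matches the intuition of learning new types without drifting on old ones.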

Keywords/Search Tags:

Natural language processing, Named entity recognition, Nested and discontinuous entities, Incremental learning, Noise mining

Related items

1	Research On Nested Named Entity Recognition Algorithm Based On Deep Learning
2	Research On Information Extraction With Complex Entity
3	Research On Nested Named Entity Recognition Method Combining Boundary Detection And Span Classificatio
4	Research On End-to-End Nested Named Entity Recognition Metho
5	Research And Implementation Of Mining Bilingual Named Entities From Large-Scale Web Pages
6	Research On Named Entity Recognition Method And Application Based On Transformer
7	Research And System Realization Of Recognition And Normalization Of Scientific Research Entities
8	Domain Adaptation Research And Application Of Named Entity Recognition
9	Research On Flat Nested Named Entity Recognition Metho
10	Research And Application Of Domain Oriented Entities And Inter-entity Relations