Research On Information Extraction With Complex Entity

Posted on: 2021-04-19  Degree: Master  Type: Thesis
Country: China  Candidate: H L Xu  Full Text: PDF
GTID: 2428330605474882  Subject: Computer technology
Abstract/Summary:
Information extraction aims to extract structured information from free text and mainly comprises named entity recognition (NER), relation extraction (RE), and event extraction (EE). Complex entities, such as nested entities and discontinuous entities, have received increasing attention. Most existing work focuses on nested entities, while discontinuous entities have been studied far less. In view of this, this thesis studies information extraction for complex entities and their relations. Its main points are as follows.

First, for the recognition of Chinese nested named entities, the thesis proposes a method that combines a self-attention mechanism with a joint learning paradigm. The method first encodes sentences with a multi-layer bidirectional LSTM network. An entity fusion method based on self-attention is then applied between the bidirectional LSTM layers to pass the entity semantic information obtained by a lower LSTM layer up to the higher ones. Experimental results on the Chinese nested named entity corpus show that the method can effectively capture nested entities occurring in free text.

Second, inspired by the multi-layer sequence labeling model, the thesis proposes a discontinuous entity extraction method based on the idea of virtual nested entities (VNE). The method converts discontinuous entities, which are difficult to recognize with sequence labeling, into virtual nested entities. The recognition of virtual nested entities is then integrated into the nested entity recognition model via a multi-layer labeling strategy. Finally, discontinuous entities and other entities are restored from the multi-layer labels predicted by the model. Experimental results on the GENIA corpus show that the method can identify discontinuous entities to some degree, laying the foundation for a comprehensive system that recognizes all kinds of entities.

Third, for relation extraction involving nested named entities, the thesis proposes a method based on pre-trained language models. According to the structure of Chinese nested named entities, different placeholders are used to mark the hierarchy among nested entities. A fine-tuned BERT model then captures the semantic information implied by the sequence, while a CNN model captures the most important parts of the sequence. Experimental results show that the model achieves promising results on the task of Chinese nested entity relation extraction.
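
As a rough illustration of what a multi-layer labeling strategy for nested entities can look like, the sketch below turns a set of nested spans into stacked BIO tag sequences, one per layer. It is not the thesis's implementation: the function name `layered_bio`, the span format, the example entity types, and the innermost-first layer assignment are all assumptions made for this example.

```python
from typing import List, Tuple

# A span is (start, end_exclusive, entity_type); this format is an assumption.
Span = Tuple[int, int, str]


def layered_bio(n_tokens: int, spans: List[Span]) -> List[List[str]]:
    """Encode possibly nested spans as stacked BIO tag sequences.

    Shorter spans are placed first, so inner entities land in lower
    layers and the entities that contain them land in higher layers.
    """
    layers: List[List[str]] = []
    # Sort by span length, then by start position, for a stable layout.
    for start, end, label in sorted(spans, key=lambda s: (s[1] - s[0], s[0])):
        depth = 0
        while True:
            if depth == len(layers):
                # Open a new, empty layer when all existing ones are taken.
                layers.append(["O"] * n_tokens)
            # The span fits if none of its tokens is tagged in this layer.
            if all(tag == "O" for tag in layers[depth][start:end]):
                break
            depth += 1
        layers[depth][start] = "B-" + label
        for i in range(start + 1, end):
            layers[depth][i] = "I-" + label
    return layers


if __name__ == "__main__":
    tokens = ["Beijing", "University", "Library"]
    spans = [(0, 3, "FAC"), (0, 1, "LOC"), (0, 2, "ORG")]
    for depth, tags in enumerate(layered_bio(len(tokens), spans)):
        print(depth, list(zip(tokens, tags)))
```

Under this scheme each layer is an ordinary flat sequence-labeling target, which is why a stack of sequence labelers can predict nested structure; how the thesis assigns entities, and virtual nested entities in particular, to layers may differ.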
Keywords/Search Tags: Complex Entity Recognition, Nested Named Entities, Discontinuous Entities, Pre-trained Model, Relation Extraction