| In natural language processing, named entity recognition is a fundamental information extraction task that supports many downstream tasks and helps uncover valuable information in text. However, many texts contain a large number of nested entities, which are difficult to recognise with traditional sequence labelling methods. Nested named entities are currently often recognised by span-based classification, which has two main problems. First, enumerating and classifying all spans leads to high computational complexity and data imbalance. Second, because each span is classified as an independent unit, the dependencies between spans are ignored. Building on the idea of making full use of boundary semantic information to better identify nested named entities, this paper proposes a nested named entity recognition method that combines boundary detection and span classification. The research work is divided into two parts. (1) Span-based joint boundary and entity recognition: The method detects entity boundaries and identifies entity spans simultaneously, introducing boundary information to reduce computational complexity and alleviate data imbalance. The model detects boundaries at word gaps using a classifier with a biaffine mechanism, forms high-quality candidate spans by combining the detected boundaries, and thereby filters out a large number of negative examples. In addition, the model is trained with a neighbourhood-span negative sample generation strategy that builds a complete and moderately sized set of training samples. On this basis, a multi-objective learning framework combining entity boundary detection and entity span classification is designed so that the two subtasks interact and performance improves through parameter sharing and joint optimisation. Experiments on a public dataset demonstrate the advanced performance of the proposed approach. (2) Nested named entity recognition based on span decoding: The method identifies nested entity spans through encoding and decoding, and can effectively learn the dependencies between entity spans. The encoder first fuses multiple text features to obtain word representations of the sentence. The start boundaries of entities are then detected to locate the entities, and candidate entity spans are generated at the possible start boundary positions. The decoding order of the spans is determined by the position of their start boundaries. Finally, the span information of the entities and the learned entity dependencies are passed through an attention-based decoder. Experimental results show that the proposed method effectively captures inter-entity dependencies and improves on traditional span classification methods. |
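
The reduction in candidate spans that part (1) describes can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `all_spans` and `boundary_spans`, the `max_len` parameter, and the example boundary sets are assumptions made for illustration, and the detected start and end positions are supplied by hand here, whereas the paper obtains them from a biaffine boundary classifier.

```python
def all_spans(n_tokens, max_len):
    """Enumerate every (start, end) token span, inclusive, up to max_len tokens."""
    return [
        (s, e)
        for s in range(n_tokens)
        for e in range(s, min(n_tokens, s + max_len))
    ]


def boundary_spans(starts, ends, max_len):
    """Pair each detected start with each detected end at or after it.

    `starts` and `ends` are token indices assumed to come from a boundary
    detector; only pairs spanning at most max_len tokens are kept.
    """
    return [
        (s, e)
        for s in sorted(starts)
        for e in sorted(ends)
        if s <= e < s + max_len
    ]


tokens = ["The", "University", "of", "Toronto",
          "Department", "of", "Computer", "Science"]

# Hypothetical detector output: entities may start at tokens 1 and 3
# and end at tokens 3 and 7, which allows nested candidates.
starts, ends = {1, 3}, {3, 7}

exhaustive = all_spans(len(tokens), max_len=len(tokens))
candidates = boundary_spans(starts, ends, max_len=len(tokens))

print(len(exhaustive), len(candidates))
for s, e in candidates:
    print((s, e), " ".join(tokens[s:e + 1]))
```

On this eight-token sentence, exhaustive enumeration yields 36 spans while boundary pairing yields 4, including the nested candidates "Toronto" and "University of Toronto". Most of the discarded spans would have been negative examples, which is how boundary filtering can ease both the classification cost and the class imbalance.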