With the advent of the big data era, a large amount of unstructured textual data containing valuable information is generated every day. Faced with such massive data, information extraction technology can automatically extract valuable structured data from unstructured text to better serve people. Entity and relation extraction, a core task in information extraction, models text information, automatically extracts the named entities in a text, and identifies the semantic relations between them. The result is a triple of the form <head entity, relation, tail entity>. The extracted triples serve as an important foundation for the automatic construction of knowledge graphs and can be applied in fields such as automatic question answering, search engines, and recommendation systems. In recent years, deep learning has developed rapidly. Its powerful capabilities for parameter learning and feature extraction make up for the shortcomings of traditional machine learning algorithms and hand-crafted features, which makes it well suited to building entity and relation extraction models. However, current models still face several challenges: 1) Existing works adopt sequence labeling models that assign a single label to each word, which cannot handle the multi-label problem of recognized objects; 2) Existing works adopt a pipeline approach, which not only suffers from error propagation but also ignores the interaction between the entity recognition and relation extraction tasks; 3) Existing works cannot effectively utilize objects that partially match real instances. Because these objects have semantic features similar to those of the real instances, they are prone to false detections and missed detections, which degrades model performance; 4) Existing works use contextual features for classification. When the identified objects overlap in the text, they share contextual features and therefore cannot be distinguished.

In entity and relation extraction tasks, named entities have unique
start and end boundaries. In addition, boundary recognition has several favorable characteristics: it operates at a small granularity, does not depend on other NLP tasks, has little ambiguity, and relies mainly on local features. In deep learning-based entity and relation extraction models, entities are defined as abstract semantic representations with boundary parameters, which uniquely identify entities and relations in a text and distinguish semantics that overlap within a sentence. To address the above challenges, this paper focuses on entity and relation extraction based on entity boundaries and carries out a series of studies on identifying multi-labeled objects, modeling the interaction between entities and relations, effectively using partially matching objects, and resolving overlapping contextual features. The main contributions of this paper are as follows:

(1) To handle the multi-label problem of recognized objects, an entity and relation extraction model based on boundary determination (BD) is proposed. The model adopts a pipeline approach that locates candidate entities using entity boundaries, which supports the recognition of overlapping semantics. Entity boundaries and related information are then combined into features, making full use of the data to learn key semantic and structural information in the text. Experiments show that the BD model outperforms all models in the comparison experiments, which verifies that entity boundaries are very important for the entity and relation extraction task.

(2) To model the interaction between entities and relations, a joint entity and relation extraction model based on boundary assembling (BA) is proposed on the basis of the BD model. The model designs three tasks with shared parameters: boundary detection, span classification, and relation extraction. During training, entity boundaries are used as supervision information, and model parameters are shared among the tasks to alleviate the problems of error propagation and
insufficient task interaction in the pipeline approach. This improves the overall performance of the joint extraction model. Experiments show that the BA model outperforms the comparison models on four public datasets.

(3) To effectively utilize partially matching objects, a joint entity and relation extraction model based on boundary regression (BR) is proposed. A multi-objective learning framework is constructed to predict category labels and to locate the positions of entities in the text. The boundaries of inaccurate candidate entities are adjusted, so the model learns more accurate text representations and reduces the influence of false positives. Experiments show that the BR model outperforms the comparison models on four public datasets, verifying the effectiveness of the boundary regression mechanism for extraction tasks.

(4) To solve the problem of overlapping contextual features, a joint entity and relation extraction model based on multi-scale boundary aggregation (MSBA) is proposed. Cross-encoding is adopted to map every entity pair into a two-dimensional entity table that encodes relation-relevant entity representations. This enables the same entity to learn different semantic representations for overlapping relation triples. Furthermore, a multi-scale feature refinement module is proposed to make full use of the entity table; it aggregates multi-scale features and reinforces semantic correlations in the entity table. Experiments show that the MSBA model outperforms existing models on two public datasets and handles complex language scenarios effectively.
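Contribution (1) locates candidate entities from their boundaries. This summary does not say how predicted boundaries are paired, so the following is only a minimal sketch under a common assumption: every predicted start boundary is paired with every predicted end boundary at or after it, within a maximum width. The function name `enumerate_spans`, the `threshold` of 0.5 and the `max_width` of 8 are hypothetical choices, not the BD model's actual settings.

```python
from typing import List, Tuple


def enumerate_spans(
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,  # hypothetical decision threshold
    max_width: int = 8,      # hypothetical maximum span length in tokens
) -> List[Tuple[int, int]]:
    """Pair predicted start and end boundaries into candidate entity spans.

    A span (i, j) is kept when token i is predicted as a start boundary,
    token j as an end boundary, and i <= j < i + max_width. Spans are formed
    independently, so nested and overlapping candidates survive, and a token
    may belong to several candidates (the multi-label case).
    """
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]
    return [(i, j) for i in starts for j in ends if i <= j < i + max_width]


# Hypothetical boundary scores for the tokens ["Peking", "University", "Hospital"].
spans = enumerate_spans([0.9, 0.1, 0.2], [0.1, 0.8, 0.9])
print(spans)  # [(0, 1), (0, 2)]: "Peking University" nested in "Peking University Hospital"
```

Because candidates are generated per boundary pair rather than by assigning one label per token, the two nested spans in the example coexist, which a single-label sequence tagger could not express.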
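Contribution (3) adjusts the boundaries of partially matching candidates. As an illustration only, the sketch below borrows the offset-regression idea from bounding-box regression in object detection: the regression target is the difference between gold and candidate boundaries, a smooth L1 loss (an assumed choice) measures offset error, and the multi-objective loss adds a weighted regression term to a classification loss. All function names and the `weight` parameter are hypothetical; this summary does not specify BR's loss or rounding scheme.

```python
from typing import Sequence, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def regression_targets(candidate: Span, gold: Span) -> Tuple[int, int]:
    """Offsets that would move a partially matching candidate onto the gold span."""
    return gold[0] - candidate[0], gold[1] - candidate[1]


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Smooth L1 loss: quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def refine(candidate: Span, offsets: Tuple[float, float], seq_len: int) -> Span:
    """Apply predicted offsets, round to token indices and clamp to the sentence."""
    s = min(max(round(candidate[0] + offsets[0]), 0), seq_len - 1)
    e = min(max(round(candidate[1] + offsets[1]), 0), seq_len - 1)
    # If the regression inverts the span, fall back to the original candidate.
    return (s, e) if s <= e else candidate


def joint_loss(cls_loss: float, pred_offsets: Sequence[float],
               target_offsets: Sequence[float], weight: float = 1.0) -> float:
    """Multi-objective loss: classification term plus weighted boundary regression."""
    reg = sum(smooth_l1(p, t) for p, t in zip(pred_offsets, target_offsets))
    return cls_loss + weight * reg


candidate, gold = (2, 4), (1, 4)            # candidate misses the first gold token
print(regression_targets(candidate, gold))  # (-1, 0)
print(refine(candidate, (-0.8, 0.1), seq_len=10))  # (1, 4)
```

Python's built-in `round` rounds exact halves to the nearest even integer; a real model may choose a different rounding rule for fractional offsets.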
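Contribution (4) maps entity pairs into a two-dimensional table and refines it at several scales. This summary does not describe the cross-encoding or the refinement module, so the sketch substitutes simple, clearly hypothetical stand-ins: cell (i, j) is the concatenation of the head and tail entity vectors, and multi-scale refinement averages the window means over k x k neighborhoods for k in {1, 3}. It uses plain Python lists so it runs without dependencies; a real model would use learned tensor operations (for example, convolutions) instead.

```python
from typing import List, Sequence

Vector = List[float]
Table = List[List[Vector]]


def build_entity_table(entities: Sequence[Vector]) -> Table:
    """Cell (i, j) concatenates head entity i with tail entity j (hypothetical encoding).

    Concatenation is order-sensitive, so cell (i, j) differs from (j, i), and
    the same entity gets a different representation in every pair it joins.
    """
    return [[list(h) + list(t) for t in entities] for h in entities]


def window_mean(table: Table, i: int, j: int, k: int) -> Vector:
    """Mean of the cells in the k x k window centered on (i, j), ignoring padding."""
    n, r, dim = len(table), k // 2, len(table[0][0])
    acc, count = [0.0] * dim, 0
    for x in range(max(i - r, 0), min(i + r + 1, n)):
        for y in range(max(j - r, 0), min(j + r + 1, n)):
            for d in range(dim):
                acc[d] += table[x][y][d]
            count += 1
    return [v / count for v in acc]


def multi_scale_refine(table: Table, scales: Sequence[int] = (1, 3)) -> Table:
    """Average the window means over several window sizes for every cell."""
    n = len(table)
    refined: Table = []
    for i in range(n):
        row = []
        for j in range(n):
            pooled = [window_mean(table, i, j, k) for k in scales]
            row.append([sum(vals) / len(scales) for vals in zip(*pooled)])
        refined.append(row)
    return refined


table = build_entity_table([[1.0], [2.0]])
print(table[0][1], table[1][0])         # [1.0, 2.0] [2.0, 1.0]
print(multi_scale_refine(table)[0][0])  # [1.25, 1.25]
```

The 1 x 1 scale preserves each pair's own features, while the larger window mixes in neighboring pairs that share a head or tail entity, which is one simple way to let correlated cells in the table inform each other.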