As research in natural language processing expands, information extraction has become an important subfield. Information extraction aims to efficiently extract meaningful target information from large amounts of text and thereby make that information easier to use. With the increasingly widespread use of Internet technology, the number of published scientific papers is growing geometrically. Most research results are stored as unstructured electronic documents. New tasks, methods and datasets are constantly being proposed, and comparing methods across different papers is labour-intensive. Large-scale, automated extraction of structured information from the scientific literature can therefore help researchers efficiently understand the research content and research hotspots of a specific field.

Traditional information extraction methods for scientific documents usually adopt a pipeline approach, in which extraction is divided into separate subtasks and a different model is trained for each one. This approach breaks the link between the subtasks and misses the dependencies between them. Errors made in an earlier subtask are also passed on to later ones, which degrades overall performance. Moreover, scientific documents are generally long. As document length increases, it becomes harder for extraction methods to capture cross-entity dependencies because of the long distances between entities. Finally, training deep learning models often requires a large amount of labelled data. In the general domain, large-scale training data can usually be obtained through crowdsourcing, but in the scientific domain collecting annotated data is complex and time-consuming. Because high-quality annotation requires domain expertise, information extraction from scientific documents needs data-efficient models.

To address these issues, this paper proposes a joint entity and relation extraction model based on linear structure generation. The method uses a uniform linear structure encoding to represent entity and relation information, and guides the model to generate the target information by adding prompt templates to the input text. To generate valid structures, the model is pre-trained on a large silver-standard dataset for structure mapping. A copy mechanism is also integrated, which copies key information from the source text; this helps with the out-of-vocabulary (OOV) problem in scientific information extraction and improves the model's ability to identify key information in the input document. The paper also explores the less-studied task of n-ary relation extraction, which is important for extracting information from scientific documents. To improve model performance, an efficient training method is designed on the basis of previous work, and it obtains the best experimental results so far.

To verify its effectiveness, the model is evaluated on the SciERC and SciREX datasets, both of which consist of scientific literature. The results show that the model performs well on scientific information extraction, with significant advantages in relation extraction and especially strong performance on complex n-ary relation extraction. Comparisons with other models under low-resource conditions demonstrate the model's effectiveness for low-resource information extraction tasks such as those involving scientific literature.
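To make the idea of a uniform linear structure with prompt templates more concrete, the following is a minimal illustrative sketch. It is not the paper's actual encoding: the bracketed target format, the `[ent]`/`[rel]`/`[text]` markers, the label names, and the helpers `build_prompt` and `linearize` are all assumptions chosen for illustration. It shows how entities and relations (including relations with more than two arguments) could be flattened into a single target string that a sequence-to-sequence model is trained to generate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    text: str   # surface span copied from the document
    label: str  # entity type, e.g. "Method" (hypothetical label)


@dataclass(frozen=True)
class Relation:
    label: str                 # relation type, e.g. "Used-for" (hypothetical)
    args: Tuple[Entity, ...]   # two arguments for binary, more for n-ary


def build_prompt(entity_types: List[str], relation_types: List[str],
                 document: str) -> str:
    """Prefix the document with the types the model is asked to extract.

    The marker tokens are an assumed convention, not the paper's.
    """
    hints = [f"[ent] {t}" for t in entity_types]
    hints += [f"[rel] {t}" for t in relation_types]
    return " ".join(hints + ["[text]", document])


def linearize(entities: List[Entity], relations: List[Relation]) -> str:
    """Flatten entities and relations into one bracketed target string."""
    parts = [f"({e.label}: {e.text})" for e in entities]
    for r in relations:
        args = " ".join(f"({a.label}: {a.text})" for a in r.args)
        parts.append(f"({r.label} {args})")
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    method = Entity("BERT", "Method")
    dataset = Entity("SQuAD", "Dataset")
    relation = Relation("Used-for", (method, dataset))

    print(build_prompt(["Method", "Dataset"], ["Used-for"],
                       "BERT is evaluated on SQuAD."))
    print(linearize([method, dataset], [relation]))
```

Because entities and relations share one target format, a single generative model can produce both in one pass, which is the property that lets a joint approach avoid the error propagation of a pipeline. An n-ary relation only needs a longer `args` tuple.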