Relation extraction is an important task in natural language processing, which aims to identify the semantic relationship between two entities in a given text. In recent years, relation extraction techniques have flourished and become the basis for many advanced natural language processing tasks; they are widely used in fields such as knowledge graphs, text summarization, and information retrieval. The quality of relation extraction models is therefore crucial for the relevant downstream applications and has attracted much attention.

However, quality assessment of relation extraction models faces many challenges. On the one hand, existing evaluation methods based on test datasets rely on manually annotated ground truth and thus often incur high testing costs. At the same time, the annotation quality of a dataset strongly affects measured model performance, and low-quality annotations reduce the effectiveness of quality assessment. On the other hand, dataset-based evaluation methods only report the accuracy of a model on a single dataset and cannot evaluate the model's capabilities and characteristics in detail.

To solve these problems, this paper proposes a quality assessment method for relation extraction models based on metamorphic testing. Our approach eliminates the need for human annotation of ground truth and supports evaluating and understanding different aspects of a relation extraction model's capabilities. We define 12 metamorphic relations according to the characteristics of relation extraction models. These metamorphic relations focus on four aspects: 1) the model's capability to understand and handle entities; 2) the model's performance with respect to gender fairness;
3) the model's capability to identify the direction of relationships; 4) the model's performance on overlapping entity relation extraction. Based on these relations, we implemented an automatic metamorphic testing framework and carried out extensive experiments. Experimental results show a large number of prediction failures in the tested relation extraction models (a total of 37,976 violations of metamorphic relations were detected). Further analysis of the results reveals some common characteristics and potential problems of relation extraction models. These results show that metamorphic testing can effectively evaluate the capabilities and characteristics of different aspects of relation extraction models, and the whole quality assessment process no longer relies on ground truth information.

The main contributions of this paper are summarized as follows: (1) We propose applying metamorphic testing to evaluate relation extraction models. According to their different capabilities and characteristics, we define 12 metamorphic relations and implement an automatic framework for quality evaluation of relation extraction models with the help of natural language processing tools such as Stanford NER, the GeoPy API, NeuralCoref, and BiasFinder. (2) We conduct extensive experiments on three mainstream, advanced relation extraction models. The BERTEM+MTB, LUKE, and NCB models were selected as experimental subjects, demonstrating the feasibility and effectiveness of metamorphic testing in evaluating relation extraction models. (3) We perform a full experimental analysis. We summarize the overall performance of the different relation extraction models and analyze the advantages and disadvantages of each model on the different metamorphic relations. In addition, the experimental results effectively reveal typical problems of relation extraction models, such as over-reliance on entity type information, which in turn provides
guidance and reference for model understanding, improvement, and repair.
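The core idea behind such metamorphic relations can be sketched in a few lines: transform the input in a way whose effect on the expected output is known, then check whether the model's prediction respects that expectation, with no ground-truth label required. The sketch below illustrates one plausible entity-substitution relation (replacing an entity with another of the same type should leave the predicted relation unchanged). The `predict_relation` stub and all names here are hypothetical stand-ins, not the paper's actual framework or models.

```python
# Sketch of an entity-substitution metamorphic relation (MR) for a
# relation extraction model. The toy model below is a rule-based
# stand-in for real models such as BERTEM+MTB, LUKE, or NCB, used
# only so the MR check is runnable end to end.

def predict_relation(sentence, head, tail):
    """Hypothetical model interface: returns a relation label for the
    (head, tail) entity pair in the given sentence."""
    if " founded " in sentence:
        return "org:founded_by"
    return "no_relation"

def mr_entity_substitution(model, sentence, head, tail, replacement):
    """MR: replacing the head entity with another entity of the same
    type (here, a person name) should not change the predicted
    relation. Returns True if the MR holds, False on a violation."""
    original = model(sentence, head, tail)
    mutated_sentence = sentence.replace(head, replacement)
    follow_up = model(mutated_sentence, replacement, tail)
    return original == follow_up

source = "Larry Page founded Google in 1998."
holds = mr_entity_substitution(predict_relation, source,
                               "Larry Page", "Google", "Ada Lovelace")
print(holds)  # True: the toy model's prediction is unchanged
```

A violation of this relation (the prediction flipping after a same-type substitution) would suggest the model depends on the surface form of the entity rather than the sentence context, which is exactly the kind of over-reliance on entity information the experimental analysis aims to surface.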