| As an important means of selecting talent and measuring educational standards, the national education examination depends on authority and fairness. The quality of national examination questions determines not only whether candidates' ability can be judged effectively but also bears on social fairness, and a prerequisite for that quality is that new questions do not duplicate questions that have appeared before. However, with the development of education and the Internet, the number of test questions from schools at all levels, out-of-school training institutions, and teaching materials has grown year by year. The existing stock of questions, together with the new ones that appear every year, makes the question pool in each subject so large that manual checking is difficult and inefficient. Establishing an effective duplicate-checking system for test questions, and thereby protecting the authority and fairness of the national examinations, has therefore become an urgent problem for the education examination department in China. In this paper, we evaluate methods for computing test question similarity for duplicate checking in national education examination question setting, and address the following problems in the current duplicate-checking task: first, some duplicate test questions differ greatly in their textual form; second, test questions differ substantially from ordinary text, since they contain not only text but also multimodal information such as images; finally, it is unclear how to combine the semantic embedding representation of test questions with the task of duplicate checking for national examination questions. Therefore, knowledge point information is incorporated into the semantic embedding representation model of the test
questions, because existing methods struggle to retrieve duplicate test questions that are expressed differently. For the multimodal information contained in test questions, a multimodal semantic embedding representation model is proposed. Finally, the semantic embedding representation model is applied to the task of duplicate checking for national examination questions. In summary, the main research contents and contributions of this paper are: 1. Duplicate test questions in physics were collected, and annotation specifications were established by analyzing the relationship between similar test questions and knowledge points. A duplicate test question dataset was constructed through a combination of automatic annotation and manual correction, and is used to train the semantic embedding representation model. 2. A semantic embedding representation model of test questions fused with knowledge point information is proposed. The model uses a dual-encoder structure to extract the semantic information of the test questions, obtains their semantic embedding representation by mean pooling, and measures the similarity between test questions by cosine similarity. 3. A multimodal semantic embedding representation model of test questions is proposed, which uses the multimodal data in the test questions to learn their semantic information. A convolutional neural network first learns a feature representation of the image attached to a test question; the pre-trained language model BERT then produces semantic representations of the text and the knowledge points; finally, a Transformer fuses the semantic information of the text, the image, and the knowledge points into the final semantic representation of the test question. Finally, the effectiveness of the proposed semantic embedding representation model is verified by experiments on the manually annotated test question corpus. |
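
The abstract's knowledge-point-fused model obtains a test question embedding by mean pooling and compares two questions by cosine similarity. The following is a minimal sketch of those two steps only, not the paper's implementation. The function names `mean_pool` and `cosine_similarity`, the padding mask, and the toy token vectors are assumptions made for illustration; in the described model the token vectors would come from the encoder.

```python
import math


def mean_pool(token_vectors, mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Toy token vectors standing in for encoder outputs (made-up numbers).
question_a = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]  # last row is padding
mask_a = [1, 1, 0]
question_b = [[2.0, 0.0], [0.0, 2.0]]
mask_b = [1, 1]

emb_a = mean_pool(question_a, mask_a)
emb_b = mean_pool(question_b, mask_b)
similarity = cosine_similarity(emb_a, emb_b)
```

In a duplicate-checking setting, a pair whose similarity exceeds a chosen threshold would be flagged for review; the threshold itself is not stated in the abstract and would need to be tuned on annotated data.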