| Explaining natural phenomena is one of the most important purposes of science. In scientific research, scientists are committed to providing reasonable explanations for natural phenomena. In science education, educators are likewise committed to developing students’ ability to explain natural and everyday phenomena. Educational policy documents around the world regard scientific explanation as an essential practical ability for students. As a complex ability, scientific explanation is difficult to assess with closed questions such as multiple-choice and true-false questions, and its assessment often depends on open-ended questions. However, scoring open-ended questions consumes considerable human effort and time, which prevents timely feedback and makes teachers reluctant to use such questions in daily teaching. Machine learning is a computing technology that can process large amounts of data to find hidden patterns, and it can be used for the analysis and prediction of educational big data. At present, accelerating the digital transformation of education has become an important part of the construction of "Digital China". Automatic evaluation is a core component of this transformation and a basis for intelligent, accurate, and personalized education and teaching. In recent years, many researchers abroad have applied machine learning to educational evaluation and demonstrated its effectiveness and feasibility, while domestic research is still in its infancy. Different languages have their own structural and grammatical features. Although conclusions drawn in the English context offer some insight, they cannot be transferred directly to the Chinese context. Therefore, building on existing research at home and abroad, this study is devoted to obtaining a method for automatically grading students’ scientific explanation ability using machine learning in the Chinese context, exploring 
the feasibility of the method, and putting forward countermeasures and suggestions to improve grading accuracy according to the results. First, based on existing literature, a method for automatically scoring open-ended questions using machine learning was summarized, and its feasibility and effectiveness were empirically verified by applying it to the questions and student responses in the scientific explanation question bank. Second, data exploration was conducted from the perspectives of sample characteristics, question types, and scoring rules, focusing on the question "What factors affect the accuracy of machine learning scoring?" The results indicate that, in terms of sample characteristics, the accuracy and human-machine consistency of machine scoring improve as the sample size increases, and the minimum sample size required differs across questions; in terms of question types, questions involving multiple theories and multiple explanatory paths pose more difficulties for automatic scoring, and scoring accuracy on them is relatively low; in terms of scoring rules, the machine performs slightly better under analytic scoring rules than under holistic scoring rules, and slightly better under binary scoring than under multi-level scoring. Finally, based on the exploratory analysis of influencing factors and existing literature, strategies to improve scoring accuracy were proposed from the perspective of teaching applications. In terms of sample characteristics, the sample used to train the scoring model should number at least on the order of a thousand, with enough samples at each score level. In terms of question types, the wording of the questions should be clear and easy to understand and should, as far as possible, involve a single theory and a single causal chain; if multiple theories and causal chains are involved, the sample size should be increased. In terms of scoring rules, analytic scoring rules and 
binary scoring should be adopted as much as possible. If the questions involve multiple theories and multiple explanatory paths, the scoring rules can be further refined and the multi-level scoring system can be transformed into multi-level binary scoring, in order to reduce the difficulties encountered in automatic scoring. |
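
The recommendation to transform a multi-level scoring system into multi-level binary scoring, and the notion of human-machine consistency, can be illustrated with a minimal sketch. It rests on two assumptions that the abstract does not state: that each score level can be read as a cumulative yes/no threshold ("did the response reach at least level k?"), and that agreement is measured with unweighted Cohen's kappa. The function names and the sample scores are hypothetical and serve only as illustration; this is not the study's own implementation.

```python
from collections import Counter


def to_binary_items(score, max_score):
    """Split an ordinal score in 0..max_score into cumulative yes/no items.

    Assumption: level k is treated as the threshold "score >= k".
    """
    if not 0 <= score <= max_score:
        raise ValueError("score must lie in 0..max_score")
    return [1 if score >= level else 0 for level in range(1, max_score + 1)]


def from_binary_items(items):
    """Recover the ordinal score from cumulative binary items."""
    return sum(items)


def cohen_kappa(human, machine):
    """Unweighted Cohen's kappa between two equally long label lists."""
    if len(human) != len(machine) or not human:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(human)
    # Proportion of responses on which the two raters agree.
    observed = sum(h == m for h, m in zip(human, machine)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    h_counts, m_counts = Counter(human), Counter(machine)
    expected = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Made-up scores on a 0..3 scale, for illustration only.
    human_scores = [0, 1, 2, 3, 2, 1]
    machine_scores = [0, 1, 2, 2, 2, 1]
    max_score = 3

    human_bits = [to_binary_items(s, max_score) for s in human_scores]
    machine_bits = [to_binary_items(s, max_score) for s in machine_scores]

    # One binary agreement figure per threshold level.
    for level in range(max_score):
        h = [bits[level] for bits in human_bits]
        m = [bits[level] for bits in machine_bits]
        print(f"level >= {level + 1}: kappa = {cohen_kappa(h, m):.2f}")
```

Under this reading, each threshold becomes its own binary scoring task, so a separate model can be trained and checked per level, and the per-level agreement shows at which level human and machine scores diverge.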