Diversification of teaching modes in recent years has promoted technology-enhanced learning, such as Massive Open Online Courses (MOOCs), Small Private Online Courses (SPOCs), blended learning, and the flipped classroom. These changes not only enable students to learn online outside of class, but also allow them to do homework and take tests online. Computer network technologies can automatically process the large amount of data resulting from students' learning, but most exercises that require subjective responses still involve human grading, creating a heavy workload for teachers. Without a solution, the range of exercise types would remain limited. This research therefore probes whether data-mining technologies can be used to establish a machine scoring model and to evaluate the quality of subjective exercises through automatic scoring.

In this paper, data-mining technology is applied to the automatic scoring of short answer questions (SAQs) in online English listening, to examine the feasibility of applying data mining to the automatic scoring of subjective questions. The data came from student responses to short answer questions on online listening, embedded in the online learning system of the course "Advanced English Reading and Writing", which adopts a flipped classroom teaching mode at a domestic university. There were three SAQs for one unit. Each SAQ received a total of 650 answers, 200 of which were selected randomly as samples, giving a total of 600 samples across the three SAQs.

To obtain the match percentage between machine scoring and human scoring, an expert scoring sheet was designed. The experts scored students' answers against the reference answer, and marked the keywords and evaluation standards for each question. The expert scores were then imported into Weka, a machine learning tool, for clustering analysis so as to identify answer types within each cluster.
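The study performed clustering in Weka. As a rough, stdlib-only illustration of how answers can be grouped into "answer types" by content (a toy grouping criterion, not Weka's clustering algorithm), one can partition answers by which expert-marked keywords they contain; the keywords and answers below are hypothetical:

```python
from collections import defaultdict

# Hypothetical expert-marked keywords for one listening SAQ.
KEYWORDS = {"london", "job", "moved"}

# Hypothetical student answers.
answers = [
    "She moved to London for a new job",
    "He moved to London because of his job",
    "She went to London",
    "He likes music",
]

# Group answers by the set of keywords they contain; each group stands
# in for one "answer type" found by cluster analysis.
clusters = defaultdict(list)
for ans in answers:
    matched = frozenset(KEYWORDS & set(ans.lower().split()))
    clusters[matched].append(ans)

for matched, group in clusters.items():
    print(sorted(matched), "->", group)
```

Here the first two answers share all three keywords and fall into one group, the third matches only one keyword, and the last matches none, yielding three groups; as the paper notes for Weka, the number of clusters can likewise be tuned to the research needs.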
Next, the weight of each keyword was assigned and calculated. A decision tree was produced by Weka to display the results, and a machine scoring model for SAQs was thus established. Finally, students' answers were scored by the model, and the results were compared with those of the expert scoring. The whole process aims to determine whether data mining is feasible for scoring subjective questions.

The results show that both rater reliability and inter-rater reliability are high, indicating that the expert evaluation results are reliable. Clustering analysis can effectively cluster the students' answers, and the number of clusters can be set according to research needs. The decision tree can build scoring rules with the help of human scoring, and can be used to automatically score the online listening SAQs. Comparing data-mining-based scoring with expert scoring shows that automatic scoring is most accurate on high-level answers, followed by medium-level and then low-level answers. The difference arises because the top roots (i.e., the keywords with the maximum weight) of high-level answers are more easily recognized by the scoring rules, and such answers are complete in meaning, whereas medium- and low-level answers either lack the keywords or rarely present complete meaning. Nevertheless, the trends of expert scoring and automatic scoring are consistent. These results fill, to some extent, the gap in research on automatic grading of English subjective questions, and provide an important reference for future study in the field.
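The pipeline above can be sketched in miniature: decision-tree-style rules test the top root (the maximum-weight keyword) first and lower-weight keywords after it, and the resulting machine scores are compared against expert scores to obtain a match percentage. The keywords, weights, rules, answers, and expert levels below are all hypothetical, not the study's induced tree or data:

```python
# Assumed keyword weights; the rule order below follows them, with the
# maximum-weight keyword ("london") acting as the top root of the tree.
WEIGHTS = {"london": 3, "job": 2, "moved": 1}

def rule_score(answer: str) -> str:
    """Toy rule set mimicking a decision tree over keyword presence."""
    tokens = set(answer.lower().split())
    if "london" in tokens:       # top root: maximum-weight keyword
        if "job" in tokens:      # next-highest-weight keyword
            return "high"
        return "medium"
    return "low"

# Hypothetical student answers and the (also hypothetical) expert levels.
machine = [rule_score(a) for a in [
    "She moved to London for a new job",
    "She went to London",
    "He likes music",
]]
expert = ["high", "medium", "low"]

# Match percentage between machine scoring and expert scoring.
matches = sum(m == e for m, e in zip(machine, expert))
print(f"match percentage: {100 * matches / len(expert):.0f}%")  # → 100%
```

A high-level answer here is recognized easily because it contains the top root with complete meaning, while answers missing the keywords fall to the lower levels, mirroring the accuracy pattern reported above.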