| Information technology is globalizing, and the informatization of education is steadily advancing. Online learning is an important means of realizing education informatization and lays the foundation for the development of personalized education. It provides users with a large number of learning resources and a variety of online teaching services, and it removes the time and space limitations of traditional lectures. The online education system is the main platform for delivering online education. It records a large amount of data on users’ online behavior, which provides a basis for further research on the characteristics of users’ online learning behavior and promotes the development of online learning. This paper studies a large educational dataset, EdNet, collected from the tutoring system Santa, which provides an online teaching service for students preparing for the TOEIC exam. Based on EdNet, this paper makes the following contributions. First, extensive data pre-processing was performed on EdNet. Although EdNet contains a large number of records, the information in these records is scattered and no aggregate information has been extracted. Therefore, a series of pre-processing steps, including data cleaning and feature extraction, was carried out on the dataset at an early stage, and information such as answer correctness and question difficulty was extracted for use in subsequent work. Second, using the pre-processed data, this paper investigated the questions that students answered. Question difficulty was found to have a significant impact on students’ learning outcomes. Answering questions whose difficulty exceeds a student’s knowledge level may be detrimental to the student’s final performance. The relationship between question difficulty levels and students’ learning effectiveness was also analyzed, and some suggestions are given to help users use the online education platform more
efficiently. Third, this paper extracts further features from the pre-processed data and constructs three data tables of student behavior information, consisting of question difficulty, time elapsed while answering, answer correctness, and correlation information between questions. On these three data tables, this paper carried out a prediction task on students’ answer performance using a machine learning model and analyzed the effects of several extracted learning behavior features on the experimental results. Then, this paper constructs an answer prediction model, LSTM_43, based on a long short-term memory (LSTM) network, which was also used for the prediction task. A second model, LSTM_44, improves on LSTM_43 by adding coarse-grained and fine-grained LSTM models horizontally, in order to capture students’ future learning performance in both the short term and the long term. Parametric analysis experiments were also conducted to explore the effect of the number of hidden layer units on the prediction results. Finally, to extend the research beyond predicting the answer to a single next question, this paper expands the number of questions to be answered to ten and proposes the knowledge tracing models LSTM_53 and LSTM_54, which examine students’ future learning performance in terms of pass rate. The experimental results show that the knowledge tracing models combining coarse-grained and fine-grained long short-term memory networks achieve better prediction performance. The effect of different pass rate settings on model accuracy is also explored and analyzed, which suggests some follow-up ideas and extensions worthy of further research. |
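
To make the ten-question pass-rate target more concrete, the sketch below shows one way such labels could be derived from a student's sequence of right/wrong answers. It is only an illustration under stated assumptions: the function name `pass_labels`, the parameter names, the default threshold of 0.6, and the rule "pass if the share of correct answers among the next ten questions reaches the threshold" are all hypothetical, since the abstract does not give the exact definition used for LSTM_53 and LSTM_54.

```python
from typing import List


def pass_labels(correct: List[int], horizon: int = 10, pass_rate: float = 0.6) -> List[int]:
    """Label each position by whether the student 'passes' the next `horizon` questions.

    `correct` is a chronological list of 0/1 answer outcomes for one student.
    The labelling rule and the default threshold are assumptions for illustration,
    not the definition used in the paper.
    """
    labels = []
    # Only positions followed by a full window of `horizon` answers get a label.
    for t in range(len(correct) - horizon):
        window = correct[t + 1 : t + 1 + horizon]
        rate = sum(window) / horizon  # share of correct answers in the window
        labels.append(1 if rate >= pass_rate else 0)
    return labels


# Example: 12 answers give two labelled positions when horizon is 10.
answers = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
print(pass_labels(answers))                  # threshold 0.6
print(pass_labels(answers, pass_rate=0.8))   # stricter threshold
```

Varying `pass_rate` in a sketch like this changes the balance between positive and negative labels, which is one plausible reason the choice of pass rate setting can affect measured model accuracy.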