| With the rapid development of the Internet, Massive Open Online Courses (MOOCs) have gradually become a mainstream way of learning online worldwide. However, most students drop out before completing a course, so the MOOC dropout rate remains persistently high, which seriously hinders the development of MOOCs. To address this issue, it is crucial to predict students’ withdrawal behavior and to intervene with timely reminders. MOOC dropout prediction enables the platform to intervene promptly and guide students, help them formulate more rational learning plans and objectives, and facilitate better academic progress. It also offers timely feedback to teachers, allowing them to adjust their teaching strategies and content and to enhance the quality and appeal of their courses. Current research on dropout prediction mainly analyzes the behavior data that students generate while taking courses. However, it has the following problems. First, the word vector model cannot distinguish between course text vectors. Second, latent differences in course difficulty can interfere with the dropout prediction results. Finally, a single machine learning model focuses on analyzing learning performance in online courses while ignoring staged student behavior, and it has low accuracy and poor generalization ability when dealing with high-dimensional dropout features. In view of these problems, this thesis proposes a student dropout prediction method for MOOC learning. The main contributions are as follows: (1) Attention-based Doc2vec model (A-Doc2vec): The A-Doc2vec model is proposed to automatically learn the course and video sequences watched by students and to extract sequence attribute features. It introduces an attention mechanism that computes a weight vector over the original feature vectors in order to locate the specific positions of attribute features. The model can thereby extract attribute differences and relationships between courses. Compared with existing word vector models such as Doc2vec, A-Doc2vec not only parses the semantic information of course and video sequences but also captures the relational features between them. (2) Feature learning of course difficulty based on a stacking meta-learning strategy: A feature learning method is proposed to represent course difficulty dynamically. Because latent differences in course difficulty interfere with the dropout prediction results, the course difficulty feature is calculated with a two-layer stacking architecture to weaken this effect. In the first layer, high-dimensional features are reduced to low-dimensional features by nonlinear operations; in the second layer, meta-learners map the low-dimensional features to a one-dimensional course difficulty feature by linear operations. This hierarchical feature learning method effectively improves the representation ability of the features. (3) Weighted soft voting ensemble with heterogeneous classifiers (WSV-HC): A weighted soft voting heterogeneous classification model is proposed that integrates Boosting models (XGBoost, LightGBM and CatBoost), a Bagging model (Random Forest), CNN and LSTM. First, localized learning behavior features and time series features are extracted by the CNN and LSTM models. Second, the heterogeneous Boosting and Bagging models are combined to maximize the integration effect of weighted soft voting and to further improve the dropout prediction accuracy of WSV-HC. In summary, the dropout prediction method proposed in this thesis makes full use of the learner’s current time series information to predict the state of the learner’s subsequent learning process, and it improves the accuracy of dropout prediction by optimizing both the feature extraction and the classification model dimensions. |
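
The weighted soft voting step in contribution (3) can be illustrated with a minimal sketch. It is not the WSV-HC implementation: the abstract does not say how the voting weights are obtained, so the weights, the function name `weighted_soft_vote`, and the toy probabilities below are assumptions chosen purely for illustration. Each base classifier is assumed to output per-class probabilities, and the ensemble takes their weighted average before picking the most probable class.

```python
import numpy as np


def weighted_soft_vote(probas, weights):
    """Weighted average of per-model class probabilities (soft voting).

    probas:  list of array-likes, each of shape (n_samples, n_classes),
             one per base classifier.
    weights: one non-negative weight per base classifier
             (hypothetical values; the source does not specify them).
    Returns the averaged probabilities and the predicted class indices.
    """
    # Shape: (n_models, n_samples, n_classes)
    stacked = np.stack([np.asarray(p, dtype=float) for p in probas])
    w = np.asarray(weights, dtype=float)
    if w.shape != (stacked.shape[0],):
        raise ValueError("expected exactly one weight per model")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()  # normalize so the weights sum to 1
    # Contract the model axis: result has shape (n_samples, n_classes)
    avg = np.tensordot(w, stacked, axes=1)
    return avg, avg.argmax(axis=1)


# Toy example: three models, two students, classes = [stay, dropout].
p_a = [[0.6, 0.4], [0.3, 0.7]]
p_b = [[0.4, 0.6], [0.2, 0.8]]
p_c = [[0.7, 0.3], [0.6, 0.4]]

avg, labels = weighted_soft_vote([p_a, p_b, p_c], weights=[2.0, 1.0, 1.0])
print(avg)     # averaged class probabilities per student
print(labels)  # index of the most probable class per student
```

With these assumed weights, the first model counts twice as much as each of the other two. In practice the weights might be derived from each classifier's validation performance, but that choice is an assumption here as well.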