Since MOOCs were widely popularized in 2012, the ways in which people acquire knowledge have expanded greatly. However, due to their openness, and despite the large number of users, the MOOC dropout rate has remained high, reaching up to 90%, which makes MOOC dropout prediction a difficult but valuable topic. Research on this topic aims to help teachers understand the real-time state of their courses so that they can adjust their teaching strategies when necessary. In addition, computational models of learner behavior provide MOOC platforms with valuable learner information. Precise dropout prediction helps monitor the state of MOOC users, and modeling MOOC users greatly reduces manpower costs and benefits the development of MOOC platforms.

Based on the KDD CUP 2015 competition data and the existing work produced during the competition, this thesis carries out further work on feature engineering, classifier training and prediction, ensemble learning, and the application of deep learning algorithms. A new set of powerful features is extracted and improvements to the gradient boosting decision tree are applied. After these improvements, the model's ROC AUC increases from 0.887 to 0.9014. The final performance of the model differs from the first-place result in KDD CUP 2015 by only 0.6%, placing it at roughly tenth among all 821 teams.

The main contributions of this thesis are the following:

(1) A study of the key aspects of feature engineering in MOOC dropout prediction. The thesis explores a wide range of features in detail and lists not only the effective features but also the ineffective ones, together with the corresponding reasons. In addition, a qualitative analysis of why the χ2 test and F-score fail to determine feature-selection thresholds is conducted.

(2) A new model fusion method called Ada-Gradient Boosting. This method removes the step of manually splitting the training data during model fusion, which avoids overfitting, and it also avoids the tedious step of manually collecting the base classifiers' predictions in order to train the ensemble model again. By leveraging the AdaBoost algorithm, the whole training process can be automated and the data utilization rate improved, and the best performance of the ensemble model is thus obtained (see the sketch after this abstract).

(3) A novel loss function for the forward stagewise additive model. While trying to explain the effect of the integrated Ada-Gradient Boosting model, and borrowing ideas from multi-task learning, the thesis puts forward a new combination loss function (Combination Loss), which yields the best performance of a single model (an illustrative form is given after this abstract).

(4) New features that keep the information in the user logs as complete as possible. To enable the application of deep learning algorithms to this topic, the thesis re-extracts quantitative features from the user logs while keeping the information loss as small as possible (see the feature-extraction sketch after this abstract).
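
To make contribution (2) concrete, the following is a minimal sketch of what Ada-Gradient Boosting could look like, assuming it amounts to AdaBoost-style sample re-weighting over gradient-boosted base learners; the exact procedure used in the thesis may differ, and the data here is synthetic.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the KDD CUP 2015 feature matrix and dropout labels.
X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Gradient-boosted trees as the base learner inside an AdaBoost wrapper.
base = GradientBoostingClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
# AdaBoost re-weights the full training set between rounds, so no manual
# data segmentation or second-stage stacking model is required.
ensemble = AdaBoostClassifier(estimator=base, n_estimators=5, learning_rate=0.5)  # base_estimator= on scikit-learn < 1.2
ensemble.fit(X_train, y_train)

print("ROC AUC:", roc_auc_score(y_val, ensemble.predict_proba(X_val)[:, 1]))

Because the re-weighting happens over the full training set, every sample contributes to every round, which is consistent with the improved data utilization claimed in contribution (2).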
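
The exact definition of the Combination Loss in contribution (3) is not stated above; purely to illustrate the multi-task flavor of the idea, one hypothetical form mixes the exponential loss used by AdaBoost with the logistic loss used by gradient boosting for a forward stagewise additive model F(x), with labels y ∈ {−1, +1} and a hypothetical weighting coefficient α ∈ [0, 1]:

    L_comb(y, F(x)) = α · exp(−y·F(x)) + (1 − α) · log(1 + exp(−2·y·F(x)))

The thesis's actual loss may weight or combine the terms differently; this display only conveys the idea of optimizing two related objectives within a single additive model.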
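
Contribution (4) hinges on how quantitative features are re-extracted from the raw user logs. The snippet below is an illustrative sketch only, assuming the usual KDD CUP 2015 log schema (enrollment_id, time, source, event, object) and a hypothetical file name; the feature set actually used in the thesis is considerably richer.

import pandas as pd

# "log_train.csv" is a hypothetical path; columns assumed per the competition schema.
log = pd.read_csv("log_train.csv", parse_dates=["time"])

# Per-enrollment aggregate features: activity volume, active days, distinct objects.
features = (
    log.groupby("enrollment_id")
       .agg(n_events=("event", "size"),
            n_active_days=("time", lambda t: t.dt.date.nunique()),
            n_objects=("object", "nunique"),
            last_activity=("time", "max"))
       .reset_index()
)

# Per-event-type counts (video, problem, wiki, ...) as additional columns.
event_counts = pd.crosstab(log["enrollment_id"], log["event"]).add_prefix("cnt_").reset_index()
features = features.merge(event_counts, on="enrollment_id", how="left")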