
Research On Dropout Rate Prediction Of Massive Open Online Courses

Posted on: 2019-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Zhan
GTID: 2417330548966989
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of Internet education, MOOCs have become increasingly popular. There are many MOOC platforms in China and abroad, and their registered users number in the millions. However, survey statistics show that course completion rates are generally low, so reducing the dropout rate of MOOC users has become an urgent research problem. Although many scholars have studied this problem through theoretical analysis and predictive models, most of this research is based on small-sample data and rarely considers the timeliness required in a big data environment. This thesis studies student dropout on the CCNU MOOC platform, carries out a series of studies on constructing user behavior features and building a dropout prediction model, and proposes a new approach to predicting MOOC user dropout in a big data environment. The main content covers the following three aspects:

1) Distributed weighted SVM prediction model. Starting from a distributed environment and the SVM classifier, a distributed SVM is designed, and a special weight matrix is added to optimize the data supplied to the model, improving the recognizability of unbalanced data and further improving the model's accuracy and efficiency. This core algorithm is referred to in this thesis as PW-SVM. Given the large data volume and real-time character of the MOOC platform, and after analyzing the characteristics of the university's user behavior data, the ELK framework (Elasticsearch, Logstash, Kibana) was used to design the platform's data collection mechanism. Both Logstash and Elasticsearch are highly extensible, so the data collection pipeline can grow as the platform expands. Logstash can be configured with custom components that clean the data during collection, and Elasticsearch together with Kibana stores the data and computes statistics over it efficiently. Based on the current state of research and on the behavioral characteristics of CCNU MOOC platform users, the thesis analyzes users' behavioral attributes, computes statistics over their behavior data, and derives the user behavior feature matrix.

2) User feature weight model and PW-SVM implementation. Because the data are unbalanced, the Analytic Hierarchy Process (AHP) is used to build a user feature weight matrix that separates the sample points more fully, improving training efficiency and accuracy, and a distributed SVM algorithm is then used to train the model. This distributed feature-weighted SVM training method is PW-SVM. Support vector machines (SVMs) perform well on nonlinear, high-dimensional, small-sample data, but in a big data environment the computational efficiency of a traditional SVM drops sharply. To address this, many studies have proposed distributed SVM algorithms, such as the SVM provided by the Spark framework. However, that algorithm is a linear classifier and handles nonlinear sample data poorly. This thesis therefore builds on the distributed P-packSVM algorithm to implement a nonlinear SVM on Spark. In theory, the algorithm's efficiency increases as more nodes are added.

3) Experimental analysis. The verification is divided into two parts, in which LibSVM, MLlib SVM, and PW-SVM are compared on small-sample and large-sample data. Beforehand, an experiment was performed to determine the best parameters for the PW-SVM algorithm. The first group of results shows that PW-SVM takes a long time to train on small samples but still reaches a certain level of accuracy, indicating that the model is usable. The second group of comparative experiments used large samples of hundreds of thousands of records and found that PW-SVM trains faster than LibSVM and is more accurate than the linear training method in MLlib SVM, indicating that PW-SVM performs better in a big data environment. Finally, PW-SVM was used to train a model on the course behavior data of the CCNU MOOC platform. Because this data set is unbalanced, adding the weight matrix was found to make training converge faster and to raise the accuracy rate: once the unbalanced data are processed through the weight matrix, SVM training gives better results. This shows that the method has reference value for predicting the dropout rate of MOOC users and can provide data support for related teaching decisions.
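The abstract describes counting each user's behavior records to derive a behavior feature matrix. The sketch below shows that counting step under stated assumptions: the logs are assumed to have already been exported (for example, from Elasticsearch) as `(user_id, event_type)` pairs, and the event type names are hypothetical, since the abstract does not list the thesis's actual behavior attributes.

```python
from collections import Counter, defaultdict

# Hypothetical event types; the thesis's real feature set is not given in the abstract.
EVENT_TYPES = ["video", "problem", "forum", "page_view"]

def build_feature_matrix(events, event_types=EVENT_TYPES):
    """Count each user's events per type and return (user_ids, matrix).

    events: iterable of (user_id, event_type) pairs, e.g. parsed log records.
    Each matrix row holds one user's counts in event_types order. Events whose
    type is not in event_types are ignored, so a user with only such events
    does not get a row.
    """
    counts = defaultdict(Counter)
    for user_id, event_type in events:
        if event_type in event_types:
            counts[user_id][event_type] += 1
    user_ids = sorted(counts)
    matrix = [[counts[u][e] for e in event_types] for u in user_ids]
    return user_ids, matrix

# Usage example.
ids, X = build_feature_matrix(
    [("u1", "video"), ("u1", "video"), ("u2", "forum"), ("u1", "problem")]
)
# ids == ["u1", "u2"]; X == [[2, 1, 0, 0], [0, 0, 1, 0]]
```

Raw counts like these would typically be normalized before SVM training; the abstract does not say which scaling, if any, the thesis applies.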
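The thesis uses the Analytic Hierarchy Process to build its user feature weight matrix. The abstract does not say how the pairwise judgments were made or how the resulting weights enter the matrix. As an assumption-labeled sketch, the code below computes standard AHP priority weights (the principal eigenvector of a reciprocal pairwise comparison matrix, found by power iteration) and Saaty's consistency ratio. Such weights could, for example, form the diagonal of a feature weight matrix.

```python
# Saaty's random consistency index (RI) for matrix orders 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_weights(pairwise, iterations=100):
    """Return (weights, consistency_ratio) for a reciprocal pairwise comparison matrix.

    pairwise[i][j] says how much more important criterion i is than criterion j
    (Saaty's 1-9 scale), with pairwise[j][i] == 1 / pairwise[i][j].
    Weights are the principal eigenvector, normalized to sum to 1.
    """
    n = len(pairwise)
    if n not in RANDOM_INDEX:
        raise ValueError("the RI table here only covers 1 to 10 criteria")
    w = [1.0 / n] * n
    for _ in range(iterations):  # power iteration toward the principal eigenvector
        v = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate lambda_max from A w ~= lambda_max * w.
    aw = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr

# Usage example with a perfectly consistent 3x3 matrix.
weights, cr = ahp_weights([[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]])
# weights ~= [0.571, 0.286, 0.143]; cr ~= 0
```

By Saaty's usual rule of thumb, a consistency ratio below 0.1 is treated as acceptable. Otherwise, the pairwise judgments are revised before their weights are used.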
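P-packSVM, which the thesis builds on, parallelizes stochastic subgradient descent on the primal kernel SVM objective. The sketch below is a single-machine kernelized Pegasos trainer in that family. It is not the thesis's distributed Spark implementation, and the way it applies weights is an assumption, because the abstract does not specify how PW-SVM uses its weight matrix. Here, a diagonal feature weight vector scales each feature inside an RBF kernel, and per-class loss weights make errors on the minority class (for example, dropouts) cost more.

```python
import math
import random

def weighted_rbf(x, z, feature_weights, gamma):
    """RBF kernel on feature-weighted inputs: exp(-gamma * sum_k (w_k * (x_k - z_k))**2)."""
    return math.exp(-gamma * sum((w * (a - b)) ** 2
                                 for w, a, b in zip(feature_weights, x, z)))

def train_kernel_pegasos(X, y, feature_weights, class_weights,
                         lam=0.01, gamma=1.0, iterations=2000, seed=0):
    """Kernelized Pegasos with a class-weighted hinge loss.

    y holds labels in {-1, +1}; class_weights maps each label to its loss weight.
    Returns alpha, where alpha[j] accumulates class_weights[y[j]] every time
    sample j violated the margin.
    """
    rng = random.Random(seed)
    m = len(X)
    alpha = [0.0] * m
    for t in range(1, iterations + 1):
        i = rng.randrange(m)
        # Current decision value at X[i], scaled by the Pegasos factor 1 / (lam * t).
        score = sum(alpha[j] * y[j] * weighted_rbf(X[j], X[i], feature_weights, gamma)
                    for j in range(m) if alpha[j] != 0.0) / (lam * t)
        if y[i] * score < 1:  # margin violated: take a subgradient step on sample i
            alpha[i] += class_weights[y[i]]
    return alpha

def predict(x, X, y, alpha, feature_weights, gamma=1.0):
    """Return +1 or -1; the positive factor 1 / (lam * T) is omitted since it cannot change the sign."""
    score = sum(a * label * weighted_rbf(xj, x, feature_weights, gamma)
                for a, label, xj in zip(alpha, y, X) if a != 0.0)
    return 1 if score >= 0 else -1

# Usage example: a small, unbalanced toy set (label +1 = hypothetical "dropout").
X = [(0.0, 0.0), (0.3, 0.2), (3.0, 3.0), (3.2, 2.9),
     (2.8, 3.1), (3.1, 3.3), (2.9, 2.7), (3.3, 3.1)]
y = [1, 1, -1, -1, -1, -1, -1, -1]
fw = [1.0, 0.5]  # hypothetical feature weights, e.g. from an AHP step
alpha = train_kernel_pegasos(X, y, fw, class_weights={1: 3.0, -1: 1.0})
```

Raising `class_weights[1]` makes each missed positive cost more, which is one common way to counter class imbalance. Shrinking a feature's entry in `feature_weights` toward 0 reduces that feature's influence on the kernel distance.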
Keywords: MOOC, dropout rate, distributed, support vector machine, weight matrix, ELK