Predicting Students' Academic Performance Based On Educational Data

Posted on: 2019-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: Obsie Efrem Yohannes
Full Text: PDF
GTID: 2417330545472126
Subject: Computer technology
Abstract/Summary:
This thesis presents a case study in educational data mining that shows the potential of data mining in higher education. The study set out to predict students' performance and to identify the first-, second- and third-year courses that are effective predictors of students' final cumulative grade point average (CGPA). The purpose is to enable strategic intervention before students reach the later semesters, including the final semester before graduation.

The advent of data mining techniques has encouraged researchers to apply them in the educational sector to discover knowledge from student data. The availability of educational data has been growing rapidly, and the need to analyze the large amounts of data generated by this educational ecosystem has given rise to the field of Educational Data Mining (EDM). A significant problem in higher education institutions is the poor results of students after admission. Many students leave university for a variety of reasons: poor background knowledge in the field of study, very low grades, inability to pass examinations, lack of financial resources, and others. Predicting students' results is therefore a serious concern for university management that wants to prevent students from leaving early.

The current Ethiopian government recognizes the importance of education for national development. Its policies mainly aim at expanding the education sector, improving quality, and ensuring that educational content is aligned with the country's economic needs. To achieve these goals, the quality of education needs to be on the right path. Although the quality of education has many facets and participants, the main participants are students, and quality can be judged by their success or failure. Hence, identifying the factors behind student success can support timely managerial decisions that reduce failure.

The aim of this study was, therefore, to develop an approach for predicting students' final CGPA from educational databases by carefully analyzing students' scores alone, without socio-economic or psychological data. If a reasonable prediction can be reached from scores only, implementing a Student Performance Prediction System (SPPS) in a university becomes easier. The dataset was gathered from the Student Information System (SIS) of Hawassa University with authorization from the registrar and alumni directorate. The university uses a web-based system to support the teaching and learning process. The dataset comprised records of 134 undergraduate students of the Department of Computer Science who graduated in 2015, 2016 and 2017. The raw data extracted from the system was not suitable for analysis, so it was cleaned, retaining the attributes that bear on the measurement of students' academic performance. Before the predictive models were built, the data was preprocessed by applying normalization procedures.

After preprocessing, the study conducted three experiments with different data mining algorithms, each using a different combination of predictors (Scenario 1 for Experiment 1, Scenario 2 for Experiment 2 and Scenario 3 for Experiment 3) and the same output. Scenario 1 predicts the final CGPA from the students' pre-university score and the scores of university courses completed during the first 2 years of study. Scenario 2 predicts the final CGPA from the pre-university score and the course scores from the first 3 years of coursework. Scenario 3 uses the students' semester GPA at the end of each semester during the first 3 years of study. All experiments were evaluated with 10-fold cross-validation. Under this scheme, each training set contained approximately 121 samples and each test set about 13. These values were not fixed: 134 samples do not divide evenly into 10 folds, so a fold contained either 13 or 14 samples.

To achieve its objective, the study used three prediction methods to predict the students' final CGPA: Neural Network (NN), Support Vector Regression (SVR) and Linear Regression (LR). The experiments were performed in the WEKA environment, where the NN was built with the Multilayer Perceptron (MLP) algorithm. Model performance was measured using the correlation coefficient (R) and the Root Mean Square Error (RMSE). The Statistical Package for the Social Sciences (SPSS) was also used to examine which combinations of first-, second- and third-year courses best predict students' performance, i.e. CGPA.

First, statistical analysis was performed to explore the features of the dataset. Thereafter, the three experiments were conducted. In the first experiment, SVR was effective at minimizing the RMSE between predicted and target values and predicted the final CGPA with a correlation coefficient (R) of 0.9305. In the second experiment, LR outperformed the other two methods, SVR and NN: it was more effective at minimizing the RMSE and predicted the final CGPA with an R of 0.9758. In the final experiment, LR again showed a slight improvement in RMSE and reached an R of 0.9805, an increase of 0.0047 over the second experiment. Overall, the least accurate predictions in all experiments were obtained with the NN method.

The SPSS results show that most of the variables are significantly correlated with the target variable, except for UEER, which was only weakly correlated in some cases. However, to produce a general predictive model that covers as many cases as possible, UEER was kept as a predictor variable. A further analysis in SPSS confirmed that stepwise regression could identify the courses with the greatest predictive power for the final CGPA. The study shows that students' graduation performance, measured by CGPA, can be predicted using only the pre-university score and the scores of first-, second- and third-year courses, with no socio-economic or demographic features.

Overall, the study has verified that data mining techniques can be used to predict students' academic performance in higher education institutions. All the experiments gave valid results and can be used to predict graduation CGPA. The experiments were nevertheless compared to determine which approaches perform better than others; in general, SVR and LR performed better than NN. The study therefore recommends adopting the SVR and LR methods to predict final CGPA, and the resulting models can be used to implement a student performance prediction system in a university. Accordingly, the SVR and LR models were used to design an application that performs the prediction task.
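
The evaluation setup described in the abstract can be illustrated with a short sketch. It is not the thesis's WEKA workflow: the function names are hypothetical, the fold-size rule (the first n mod k folds receive one extra sample) is a common convention assumed here rather than confirmed for WEKA, and the CGPA values at the end are made up for illustration and are not data from the study.

```python
import math


def fold_sizes(n_samples, k):
    """Sizes of k near-equal folds (assumed rule: first n % k folds get one extra)."""
    base, extra = divmod(n_samples, k)
    return [base + 1 if i < extra else base for i in range(k)]


def rmse(actual, predicted):
    """Root Mean Square Error between target and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(actual, predicted):
    """Pearson correlation coefficient (R) between target and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# 134 samples split into 10 folds: test folds of 13 or 14 samples,
# leaving 120 or 121 samples for training in each round.
test_sizes = fold_sizes(134, 10)
train_sizes = [134 - size for size in test_sizes]
print(test_sizes)
print(train_sizes)

# Made-up target and predicted CGPA values, for illustration only.
actual_cgpa = [2.5, 3.0, 3.5, 4.0]
predicted_cgpa = [2.6, 2.9, 3.6, 3.9]
print(rmse(actual_cgpa, predicted_cgpa))
print(pearson_r(actual_cgpa, predicted_cgpa))
```

A lower RMSE and an R closer to 1 both indicate predictions that track the target more closely, which is how the three methods are ranked in the experiments.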
Keywords/Search Tags: Educational Data Mining, Linear Regression, Neural Network, Support Vector Regression, Student Performance Prediction System