| In response to China’s "2035" plan, universities and organizations are making strategic arrangements for data mining in teaching and strengthening decision-making based on education data. The analysis of student performance has been one of the important research topics in educational data mining. Society and universities are actively trying to combine education with cutting-edge data mining techniques to find valuable information that serves teachers and to bring scientific methods into precision teaching. Students’ test scores are a key link in teaching activities: they reflect the quality of teaching and learning and affect students’ future development paths. In current research, many scholars have discussed the relationship between students’ psychology and behavioral habits and their performance. However, students’ mastery of the learning content and their professional assistance are directly related to the performance of college students. This thesis analyzes the relationship between these two characteristics of college students and their test scores and identifies the key factors affecting test scores, so that scientific data analysis methods can help teachers and students analyze academic performance effectively. Based on these objectives, this thesis uses data preprocessing and machine learning techniques to construct a reasonable student feature portrait. A combination of machine learning algorithms is designed to analyze student characteristics in depth, and a prediction model is then constructed to provide academic early warning. First, multi-source data consisting of students’ test scores and background information is preprocessed, and a reasonable experimental data set is constructed based on the correlation between student characteristics and the total score. To find the key factors that affect student performance, an improved K-means clustering algorithm is designed to mine and analyze the experimental data. Using visualizations of the clustering results, the characteristics of students in different clusters are analyzed in detail, and the key factors affecting the total score are determined. Then, three prediction models are designed, based respectively on the improved random forest, random forest, and multiple linear regression algorithms, and the key attributes that affect student performance are used to predict that performance. In the improved random forest algorithm, the set of decision trees generated by the traditional random forest is reduced, and high-quality decision trees are selected to form a high-precision, high-diversity sub-forest for subsequent experiments. Accuracy, recall, F value, and mean squared error are used to evaluate the experimental results. Finally, based on a comparative analysis of the experimental results of the prediction models, the model based on the improved random forest algorithm, which achieves the best accuracy, is selected to support academic early warning; its accuracy reaches 93.06%. Mining and analyzing student data in this way can help teachers and students perceive academic risk in advance, make reasonable and scientific changes to learning and teaching plans, and improve the efficiency of learning and work. |
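
The evaluation measures named in the abstract (accuracy, recall, F value, and mean squared error) can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a binary early-warning label (1 = at risk, 0 = not at risk) and reads "F value" as the F1 score, the harmonic mean of precision and recall. The function names `classification_metrics` and `mean_squared_error` and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall, and F1 for binary labels (assumed: 1 = at risk, 0 = not at risk)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Count the four outcomes of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall (assumed meaning of "F value").
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of squared differences between actual and predicted scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    labels_true = [1, 0, 1, 1, 0, 0, 1, 0]
    labels_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(classification_metrics(labels_true, labels_pred))
    print(mean_squared_error([80.0, 65.0, 90.0], [78.0, 70.0, 90.0]))
```

Recall matters most for early warning because a missed at-risk student (a false negative) is the costliest error. Mean squared error applies where the model predicts a numeric score rather than a class label.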