| Big data plays an important role in many industries and is used in hospitals, universities, restaurants, banks, and elsewhere. It is very important for Internet systems in the era of modern artificial intelligence, especially in education. Universities at home and abroad now generate a large amount of data related to the daily life of college students, staff, and teachers, and the rational application of these data helps the management of university education. The big data used in this study comes from Lanzhou University of Technology. It mainly covers library data, student card consumption on campus, student grades, and student enrollment information. By processing and analyzing these data, we conduct a comprehensive study of student behavior. After cleaning and merging all the data, we divide the student data into four aspects: grades, number of borrowed books, student majors, and annual consumption data. In this work, we first use the FP-Growth mining tool to obtain the data used in this research, then use the RapidMiner tool and the Python language (Pandas package) to obtain the associations in the data, and select the most valuable features to predict student behavior. Next, the K-means algorithm is used to cluster the student data, with the elbow method determining the optimal number of clusters. From the clustering results, we mine the relationship between students’ academic performance, book borrowing data, and campus card consumption data, as well as the behavioral differences among students. Afterward, the students’ existing grades are used to predict their grades for the next year. In this step, predictions are made using Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), and Neural Networks (NN). NN reaches 77% accuracy, NB 76%, LR 77%, and RF 76%. Combining the characteristics of LR, RF, NB, and NN, a new classification model is proposed to predict student performance based on student behavior. The new multi-classification model proposed in this paper achieves an accuracy of 78%, the best among the classification models compared. Finally, we analyze the prediction results and calculate the ranking of feature importance. The experimental results show that the features rank in importance as follows: the number of books borrowed by students, the student’s major, the grades of the last academic year, and the card consumption amount, which has a negative importance value. Results based on the feature importance approach suggest that it is helpful to encourage students to borrow books from the library and to choose majors that match their interests and strengths. In addition, it would be useful to educate students about the potential negative impact of excessive spending on academic performance. |
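
The clustering step described above (K-means with the number of clusters chosen by the elbow method) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic two-dimensional points rather than the university records, all function names are hypothetical, the farthest-first initialization is chosen only to make the run reproducible, and locating the elbow by the largest second difference of the inertia curve is one common heuristic among several.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic init (an assumption of this sketch): start at points[0],
    then repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's K-means and return the within-cluster sum of squares."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def elbow_k(inertia_by_k):
    """Pick the k with the largest second difference of the inertia curve.
    Expects inertias for consecutive values of k; the endpoints are skipped."""
    ks = sorted(inertia_by_k)
    return max(
        ks[1:-1],
        key=lambda k: inertia_by_k[k - 1] - 2 * inertia_by_k[k] + inertia_by_k[k + 1],
    )


def make_demo_points(seed=42, per_blob=20):
    """Synthetic stand-in data: three well-separated 2-D blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
    return [
        (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
        for cx, cy in centers
        for _ in range(per_blob)
    ]


points = make_demo_points()
inertia_by_k = {k: kmeans_inertia(points, k) for k in range(1, 7)}
print("suggested number of clusters:", elbow_k(inertia_by_k))
```

On real student records the features would first need to be scaled to comparable ranges, since grades, borrowing counts, and consumption amounts are measured in different units and K-means is sensitive to scale.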