
Design And Implementation Of Student Behavior Trend Mining System Based On Big Data

Posted on: 2019-03-24 | Degree: Master | Type: Thesis
Country: China | Candidate: X He
GTID: 2348330569995782 | Subject: Engineering
Abstract/Summary:
In recent years, with the rise of big data and the wide application of machine learning and deep learning, mining latent information from massive data has spread to all walks of life, and researchers have begun to explore how machine learning and deep learning can be applied in education. However, current data mining of school students tacitly assumes that student behavior is stable over the long term and ignores the influence of time on behavior. Exploring whether student behavior really is unchanging, and whether rules mined in the past still apply to current data, is a new direction for student behavior mining. This thesis takes multi-source student data as its research object, mines the trends in student behavior within it, and implements a system based on the Hadoop distributed framework that can quickly and efficiently analyze the behavior of students in each college.

To meet these requirements, we define a new representation for behavior data and create views over multiple data sources to re-extract and simplify redundant data. A weighted random forest classifier is used to screen the important behavioral features, and the top 20 features are then analyzed with a MapReduce-based FP-Growth algorithm. Finally, the two algorithms are combined and rerun periodically on student behavior data. The Hadoop distributed framework improves the efficiency of the system, and Flask and Bootstrap are used to build an interactive interface. The system mines and analyzes changes in the importance, confidence, and lift of each feature, including bursaries, loans, scholarships, outstanding-student awards, and increases or decreases in academic achievement, and extracts a large amount of valuable information from them.

The behavior data representation proposed in the thesis aggregates student behavior on a weekly basis and describes it through behavior types, extreme values, means, medians, proportions, and time slices. MapReduce is used to parallelize both data cleaning and the algorithms. The thesis proposes a weighted self-fitting algorithm for random forests, a MapReduce-based self-fitting algorithm for decision tree parameters, and a MapReduce-based FP-Growth algorithm. The outputs are the feature importances together with the frequent 2-itemsets and larger frequent itemsets that have high confidence; for each college, the most recent 20 weeks are selected for training and the results are saved. Trend charts are generated from the historical data, and the relationship between feature values and behavior is analyzed according to these trends.

The innovations of the thesis are as follows:
1. A translational iterative training method along the time axis is proposed specifically for behavioral trend analysis. It uses 20 weeks of data as the training set and shifts forward by one week each week, bringing in the newest week of data.
2. A new self-fitting algorithm that iteratively fits the class weights of a random forest is proposed. It can reach the optimal weighting in very few iterations even when the candidate weights span a range of tens of thousands.
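The weekly behavior representation described in the abstract (behavior types, extreme values, means, medians, proportions, time slices) can be sketched as a plain aggregation step. This is a minimal illustration, not the thesis's implementation: the record layout `(student_id, week, behavior_type, value, hour)` and the four hour-of-day slices are assumptions, since the abstract defines neither.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical time slices (hour-of-day ranges); the abstract does not define them.
TIME_SLICES = {
    "night": range(0, 6),
    "morning": range(6, 12),
    "afternoon": range(12, 18),
    "evening": range(18, 24),
}

def weekly_features(records):
    """Aggregate raw behavior events into weekly statistics.

    Each record is assumed to be (student_id, week, behavior_type, value, hour).
    Returns {(student_id, week, behavior_type): feature dict}.
    """
    groups = defaultdict(list)
    for student, week, btype, value, hour in records:
        groups[(student, week, btype)].append((value, hour))

    features = {}
    for key, events in groups.items():
        values = [v for v, _ in events]
        hours = [h for _, h in events]
        row = {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "median": median(values),
        }
        # Share of the week's events that fall in each time slice.
        for name, slice_hours in TIME_SLICES.items():
            row[f"share_{name}"] = sum(h in slice_hours for h in hours) / len(hours)
        features[key] = row
    return features
```

For example, three canteen purchases by one student in week 1 collapse into a single row holding their count, minimum, maximum, mean, median, and the fraction made in each time slice. In the system this aggregation would run as a distributed MapReduce job rather than in one process.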
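The translational training scheme (a 20-week training set that moves forward one week at a time) is essentially a sliding window over weekly batches. The sketch below assumes that training starts only once a full 20-week window is available and that the oldest week is dropped whenever a new one arrives. `train` is a placeholder for whatever model fitting the system performs on each window.

```python
from collections import deque

WINDOW_WEEKS = 20  # window length given in the abstract

def sliding_training(weekly_batches, train, window=WINDOW_WEEKS):
    """Retrain once per week on the most recent `window` weeks of data.

    weekly_batches: iterable of (week_number, rows) in chronological order.
    train: placeholder callable that fits a model on a list of weekly row lists.
    Returns {week_number: result of training on the window ending at that week}.
    """
    buffer = deque(maxlen=window)  # appending a new week evicts the oldest one
    results = {}
    for week, rows in weekly_batches:
        buffer.append(rows)
        if len(buffer) == window:  # assumption: train only once a full window exists
            results[week] = train(list(buffer))
    return results
```

With 24 weeks of input, this produces five trained results for the windows ending at weeks 20 through 24. Comparing those per-week results, such as feature importances, is what exposes a trend over time.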
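The abstract states that the self-fitting algorithm reaches the optimal class weighting in very few iterations across a span of tens of thousands of weights, but it does not describe the procedure. One way such a claim can hold is a coarse-to-fine search, sketched below. It is hypothetical and is not the thesis's algorithm. `score` stands in for an evaluation such as the validation accuracy of a random forest trained with a given minority-class weight. The sketch assumes the score is unimodal in the weight, which real model scores need not be.

```python
def coarse_to_fine_weight(score, low=1.0, high=10000.0, points=11, rounds=6):
    """Hypothetical coarse-to-fine search for a single class weight.

    Each round evaluates `points` evenly spaced weights in [low, high], then
    shrinks the interval to one step on either side of the best candidate.
    Returns the best weight found in the final round.
    """
    best = low
    for _ in range(rounds):
        step = (high - low) / (points - 1)
        candidates = [low + i * step for i in range(points)]
        best = max(candidates, key=score)
        low, high = max(low, best - step), min(high, best + step)
    return best
```

With the defaults, each round shrinks the interval to one fifth of its previous width. Six rounds use 66 score evaluations instead of the roughly 10,000 that an exhaustive integer grid over the same range would require.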
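MapReduce-based FP-Growth begins with a parallel count of how often each item (here, each discretized behavior feature) occurs. The map, shuffle, and reduce stages of that counting pass are simulated in-process below to show the data flow. On a real Hadoop cluster, the mapper and reducer would run as separate tasks, for example through Hadoop Streaming, and the framework would perform the shuffle. Item labels such as `"loan"` are illustrative only.

```python
from collections import defaultdict
from itertools import chain

def mapper(transaction):
    # Emit (item, 1) once for each distinct item in one transaction.
    for item in set(transaction):
        yield item, 1

def shuffle(pairs):
    # Group emitted values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Sum the partial counts for one item.
    return key, sum(values)

def count_frequent_items(transactions, min_support):
    """Return {item: count} for items appearing in at least `min_support` transactions."""
    mapped = chain.from_iterable(mapper(t) for t in transactions)
    counts = dict(reducer(k, vs) for k, vs in shuffle(mapped).items())
    return {item: c for item, c in counts.items() if c >= min_support}
```

Because addition is associative, a combiner running the same logic as `reducer` could pre-sum counts on each mapper node and reduce shuffle traffic.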
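The frequent itemsets reported in the abstract (2-itemsets and larger) come from FP-Growth. Below is a compact single-machine version of the standard algorithm: it builds an FP-tree ordered by item support, then mines conditional pattern bases recursively. It does not reproduce the thesis's MapReduce variant, which distributes this work across nodes. Transactions are given as `(items, count)` pairs so that the same function can mine conditional bases.

```python
from collections import defaultdict

class Node:
    """One node of an FP-tree: an item, its count, and links to parent/children."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns (frequent item supports, header table mapping item -> list of its nodes).
    """
    freq = defaultdict(int)
    for items, cnt in transactions:
        for item in set(items):
            freq[item] += cnt
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)
    for items, cnt in transactions:
        # Keep frequent items only, in descending support order (ties broken by name).
        path = sorted((i for i in set(items) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += cnt
            node = child
    return freq, header

def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    freq, header = build_tree(transactions, min_support)
    result = {}
    for item, support in freq.items():
        itemset = suffix | {item}
        result[itemset] = support
        # Conditional pattern base: the prefix path above each node holding `item`.
        base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Mine the conditional tree, growing the suffix by `item`.
        result.update(fp_growth(base, min_support, itemset))
    return result
```

Usage: `fp_growth([(t, 1) for t in weekly_transactions], min_support=2)`, where each transaction lists the behavior items one student showed in one week (a hypothetical encoding).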
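The confidence and lift that the system tracks per feature follow from the support counts of frequent itemsets. For a rule X → Y over N transactions, confidence = count(X ∪ Y) / count(X) and lift = confidence / (count(Y) / N). A lift above 1 means the two behaviors co-occur more often than they would if they were independent, and a lift below 1 means they co-occur less often. The sketch below derives all rules above a confidence threshold from a support dictionary. The input format is an assumption.

```python
from itertools import combinations

def association_rules(supports, n_transactions, min_confidence):
    """Derive rules X -> Y from {frozenset(itemset): support count}.

    confidence = count(X | Y) / count(X)
    lift       = confidence / (count(Y) / n_transactions)
    Returns a list of (X, Y, confidence, lift) tuples.
    """
    rules = []
    for itemset, count in supports.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                x = frozenset(lhs)
                y = itemset - x
                # Every subset of a frequent itemset is frequent, so both lookups succeed.
                confidence = count / supports[x]
                lift = confidence / (supports[y] / n_transactions)
                if confidence >= min_confidence:
                    rules.append((x, y, confidence, lift))
    return rules
```

Recomputing these values for every weekly window and plotting them gives the kind of confidence and lift trend curves the abstract describes.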
Keywords/Search Tags: trend mining, random forest, FP-Growth, self-fitting, MapReduce