The Graduates' Income Forecast And Analysis Research Based On Machine Learning

Posted on: 2018-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Zhu
Full Text: PDF
GTID: 2348330515496681
Subject: Engineering
Abstract/Summary:
With the advent of the information age, information technology has had a tremendous impact on society and on the development and reform of education. However, the rapid development of information technology has also made databases grow ever larger. Faced with massive data, the education sector urgently needs efficient information technology to process that data and extract information that supports management decisions, and data mining provides a solution. Against this background, this paper uses machine learning algorithms to analyze the data set published on the Score Card website for American universities, and establishes regression and classification models that take school characteristics as input and the average income of the school's graduates as the target. With these models, the average income of a university's graduates can be reasonably predicted from the university's characteristics, which provides a useful reference for the effective allocation of funds such as education grants and for the establishment of private schools.

The main work of this paper is as follows:

1. A univariate linear regression model is fitted between each university-level feature and the target value. The influence of each individual feature on graduates' average income is analyzed and interpreted.

2. A multivariate regression model and a KNN regression model are used to predict graduates' average income. Based on the characteristics of the data set, the traditional regression algorithms are improved and a KNN polynomial regression algorithm is proposed. This algorithm outperforms both the multivariate regression algorithm and the KNN algorithm, but its training time is relatively long. The longer training time is acceptable because graduates' average income is not a data item that changes frequently. The time complexity of the algorithm is the sum of the time complexities of the two basic algorithms, and its advantage on this regression problem is clear.

3. Four methods are used to classify graduates' average income: logistic regression, decision tree, KNN and Adaboost. Of these four algorithms, Adaboost has the highest classification accuracy and KNN the lowest, with KNN performing worse than random prediction. Logistic regression exhibits the special case of a 100% recall rate.

4. Based on the experimental results of item 3, traditional logistic regression is modified and a logistic regression algorithm based on recall rate is proposed. If the recall rate of the trained logistic regression model is too high on both the validation set and the training set, the training set can be partitioned according to the item with the high recall, and a sub-model can be trained on the resulting partition. The original single-layer model thus becomes a two-layer model, whose actual accuracy must be checked on the validation set. The procedure can be applied recursively until the accuracy on the validation set begins to decrease as the depth of the model increases.
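
The 100% recall case reported for logistic regression can be illustrated with a small sketch. It shows one common way such a result arises, namely a classifier that labels every school as positive; the abstract does not state that this is what happened in the experiments. The labels, the predictions, and the helper names `recall` and `accuracy` are hypothetical and chosen only for illustration.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all samples whose prediction matches the label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical labels: 1 = high average graduate income, 0 = low.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]

# A degenerate classifier that predicts "high income" for every school.
y_pred_all_positive = [1] * len(y_true)

print(recall(y_true, y_pred_all_positive))    # 1.0
print(accuracy(y_true, y_pred_all_positive))  # 0.375
```

Under these assumed labels, recall is perfect while accuracy is only 3 out of 8, which is why a 100% recall rate on its own is a signal to inspect the model rather than evidence of a good classifier.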
Keywords/Search Tags:Education information, Data mining, Machine learning, Regression model, Classification model