| To fully implement the Rural Revitalization Strategy, accelerate the integration of urban and rural development, and achieve the development of rural industries and the revitalization of rural talent as soon as possible, and under the "mass entrepreneurship and innovation" keynote set by the Party Central Committee at the 19th CPC Central Committee, returning home to start a business in rural areas has surged and is expected to intensify in the foreseeable future. The Fifth Plenary Session of the 19th CPC Central Committee emphasized promoting the modernization of agriculture and rural areas and realizing the general requirements of industrial prosperity, ecological livability, rural civilization, effective governance and affluent life. Returning home to start a business has become key to raising the employment rate in rural areas and improving the well-being of the rural population, and even an important means of maintaining social security and stable development. Whether viewed from the background of the times or from the theoretical basis, classifying and identifying successful entrepreneurs in rural areas has practical and theoretical significance. This paper uses web crawler technology to capture the characteristics of the entrepreneurial population in rural areas and then preprocesses these entrepreneurial characteristics. Next, three machine learning models (support vector machine, BP neural network and random forest) are built in the R language environment. Based on the characteristic variable system, the models are trained and tested on the identification of successful entrepreneurs in rural areas. Finally, a unified set of evaluation indices is adopted to evaluate the recognition 
effect, including accuracy, kappa coefficient, specificity, recall, precision, AUC value and P-R curve. In the empirical process, this paper first preprocesses the data and carries out feature engineering to establish the characteristic index system for identifying successful entrepreneurs in rural areas, including ten characteristic variables: whether they receive technical support, entrepreneurial form, whether they employ poor people, marital status, whether they receive financial support, age, region, entrepreneurial motivation, whether they receive policy support, gender and entrepreneurial industry. The importance of the variables is estimated using the random forest model. The empirical process shows that the characteristic index system yields more accurate classification and recognition results for the machine learning models, and institutions providing financial support can refer to the index system to avoid risks. Moreover, the characteristic index system helps the relevant institutions intervene early with the regional entrepreneurial population, meet its entrepreneurial needs, cultivate the characteristics of entrepreneurial talent, and build up the regional reserve of entrepreneurial talent. Second, from the perspective of identifying as many successful entrepreneurs as possible and formulating entrepreneurship policies targeted to the characteristics of such people, the main purpose of this paper is to identify, from the characteristics of entrepreneurs, the people who are likely to succeed in entrepreneurship, and then provide corresponding support. Under this requirement, the first concern is the accuracy of identifying successful entrepreneurs, and here the polynomial kernel SVM performs better. Finally, from the perspective of overall recognition accuracy and stability of prediction 
results, the random forest model recognizes successful entrepreneurs in rural areas more effectively. The random forest model achieves high accuracy on the test set of unseen data, and its kappa value is higher on imbalanced data, which suggests that the random forest model copes better with data imbalance and can find the nonlinear patterns in unseen data needed to classify it. |
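
The threshold-based evaluation indices named above (accuracy, kappa coefficient, specificity, recall and precision) can all be derived from a binary confusion matrix. The following is a minimal Python sketch of those standard definitions, not the paper's R implementation. The names `ConfusionMatrix` and `evaluate` are hypothetical, and the counts in the usage example are invented for illustration rather than taken from the paper's data.

```python
from dataclasses import dataclass
from math import nan


@dataclass
class ConfusionMatrix:
    """Binary confusion-matrix counts; 'positive' means a successful entrepreneur."""
    tp: int  # successful, predicted successful
    fp: int  # unsuccessful, predicted successful
    fn: int  # successful, predicted unsuccessful
    tn: int  # unsuccessful, predicted unsuccessful


def _ratio(num: float, den: float) -> float:
    """Return num / den, or NaN when the metric is undefined (den == 0)."""
    return num / den if den else nan


def evaluate(cm: ConfusionMatrix) -> dict:
    """Compute the threshold-based indices from the confusion-matrix counts."""
    n = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = _ratio(cm.tp + cm.tn, n)
    recall = _ratio(cm.tp, cm.tp + cm.fn)        # true positive rate
    specificity = _ratio(cm.tn, cm.tn + cm.fp)   # true negative rate
    precision = _ratio(cm.tp, cm.tp + cm.fp)

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the row and column marginals.
    p_pos = _ratio(cm.tp + cm.fp, n) * _ratio(cm.tp + cm.fn, n)
    p_neg = _ratio(cm.tn + cm.fn, n) * _ratio(cm.tn + cm.fp, n)
    p_e = p_pos + p_neg
    kappa = _ratio(accuracy - p_e, 1 - p_e)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "specificity": specificity,
        "recall": recall,
        "precision": precision,
    }


if __name__ == "__main__":
    # Hypothetical imbalanced sample: 10 successful vs. 90 unsuccessful.
    metrics = evaluate(ConfusionMatrix(tp=5, fp=5, fn=5, tn=85))
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

With these invented counts, accuracy is 0.90 while kappa is only about 0.44, because most of the agreement comes from the majority class. This is why kappa is reported alongside accuracy when the classes are imbalanced. AUC and the P-R curve are not covered here, since they need predicted scores rather than a single confusion matrix.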