With the continued development of Internet and machine learning technology, more and more companies use data mining to look for business opportunities and operational direction in their historical data. Because users are a company's main consumers, mining user data has become a primary research direction for major companies. For an online e-commerce platform, it is especially important to derive from user data methods for retaining user groups and managing user relationships. As an online maternal and infant retail platform, the Kidswant app must put its data to work and extract value and revenue from it in order to gain a competitive advantage. In this paper, the data set is first cleaned and its missing values are filled; categorical features are then encoded, and continuous features are processed with principal component analysis, logarithmic transformation and standardization. On this basis, logistic regression, XGBoost and CatBoost models are fitted, with feature screening, under four methods for handling imbalanced data (oversampling, SMOTE oversampling, undersampling and cost-sensitive learning), and the accuracy, AUC, recall and F1 score of each model are reported. The XGBoost model with undersampling and feature screening performs better than the alternatives, reaching a recall of 86.18% on the test set; that is, 86.18% of the positive samples are predicted correctly, which is in line with the goal of improving the model's ability to predict positive samples. The paper then applies the four imbalance-handling methods with XGBoost and CatBoost to data sets for different commodities and fits the corresponding models. For each commodity data set, the important features of the best-performing XGBoost and CatBoost models are compared and analyzed. Finally, the important features identified by the two algorithms are combined and analyzed to give suggestions for the data operations of the Kidswant app.
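To make the undersampling step and the recall metric concrete, the following is a minimal sketch in plain Python. It assumes random undersampling of the majority class to a 1:1 ratio, which the text above does not specify; the function names `undersample` and `recall` and the toy data are hypothetical and are not taken from the paper's implementation.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Assumption: binary labels (0/1) and a 1:1 target ratio.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def recall(y_true, y_pred):
    """Share of actual positives predicted as positive: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy imbalanced data: 5 positive samples and 95 negative samples.
labels = [1] * 5 + [0] * 95
rows = [[float(i)] for i in range(100)]
X_bal, y_bal = undersample(rows, labels)
```

Under this reading, a recall of 86.18% means that `recall` returns 0.8618 when given the test-set labels and the model's predictions. Resampling would normally be applied to the training split only, so that the test set keeps the original class distribution.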