Research On User Activity Of Short Video Social Media Based On Machine Learning

Posted on: 2021-05-17 | Degree: Master | Type: Thesis
Country: China | Candidate: F W Zeng
GTID: 2428330629952663 | Subject: Computer system architecture
Abstract/Summary:
With the rapid development of Internet technology and the dramatic improvement in computing performance, artificial intelligence is pushing Internet technology into a new era. Driven by Internet of Things technology, users place greater demands on data acquisition and sharing, which has led to explosive growth in data volume. With the spread of mobile smart devices, short video social media has become popular worldwide, and the continued use of the platform by active users is a necessary and sufficient condition for its success. Predicting user activity directly guides subsequent user churn warning. Unlike traditional batch data, the data used for user activity prediction arrives as streaming data, which imposes stricter real-time processing requirements and whose statistical distribution changes over time. Because the positive and negative samples in the experimental data set are balanced, and after considering evaluation metrics such as accuracy, recall, F1, and AUC, this paper adopts AUC as the evaluation standard for the model.

In constructing the user activity prediction framework, this paper first introduces the data used in the experiments. Because the original data has no labels, this paper then defines the problem to be solved. Since the data is time-dependent, sliding windows are used to divide it; this increases the number of samples and, through the window mechanism, forgets data from before the start of the window. Based on the problem definition and the sliding windows, the original data is labeled. With the wide application of machine learning, data reliability has received growing attention from researchers. In the exploratory analysis of the data, this paper identifies abnormal data through visualization, cleans the data, and gives experimental evidence.

Feature engineering is the process of extracting features from raw data and converting them into a format suitable for machine learning models. During feature engineering, this paper mines features from the data. The features are then selected based on feature importance and the Pearson correlation coefficient between features, and the experiments show that this feature selection method greatly speeds up model training. Five machine learning algorithms, LightGBM, Support Vector Machine (SVM), KNN, Decision Tree, and Naive Bayes, are compared experimentally using the full feature set. The best single model under the AUC standard is LightGBM, which reaches an AUC of 0.9158.

Model fusion is an important means of improving algorithm performance. This paper proposes a model fusion algorithm based on grid search, whose basic idea is to use grid search to find the weight of each sub-model in a weighted fusion. Because it relies on grid search, the algorithm is suited to applications with a small number of sub-models. For the five machine learning sub-models used in the experiments, the algorithm computes the weights of the five sub-models and uses the resulting weight combination for prediction. The final AUC after fusion reaches 0.9377, about 2.4% higher than the best single-model AUC (a relative gain; 0.0219 in absolute terms).
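The grid-search fusion idea can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes that the fused score is a weighted average of sub-model scores, that the weights are non-negative and sum to 1, and that candidates are scored by AUC on held-out validation data. The function names, the grid spacing `steps`, and the toy data are all hypothetical. With five sub-models and a spacing of 0.1, this grid already contains 1001 weight combinations, which is consistent with the remark that the approach fits a small number of sub-models.

```python
from itertools import product


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weight_grid(n_models, steps):
    """Yield every weight vector with entries k/steps that sums to 1 (assumed constraint)."""
    for combo in product(range(steps + 1), repeat=n_models - 1):
        if sum(combo) <= steps:
            # The last weight is whatever remains so that the vector sums to 1.
            yield tuple(c / steps for c in combo) + ((steps - sum(combo)) / steps,)


def fuse(weights, model_scores):
    """Weighted average of the sub-models' scores, one fused score per sample."""
    return [sum(w * s for w, s in zip(weights, row)) for row in zip(*model_scores)]


def grid_search_fusion(labels, model_scores, steps=10):
    """Return (best_weights, best_auc) found by exhaustive search over the weight grid."""
    best_w, best_auc = None, -1.0
    for w in weight_grid(len(model_scores), steps):
        score = auc(labels, fuse(w, model_scores))
        if score > best_auc:
            best_w, best_auc = w, score
    return best_w, best_auc


# Toy validation data (hypothetical, for illustration only).
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.2, 0.4, 0.5, 0.8, 0.1]
model_b = [0.6, 0.7, 0.9, 0.2, 0.5, 0.3]

weights, fused_auc = grid_search_fusion(labels, [model_a, model_b], steps=10)
print("weights:", weights, "fused AUC:", fused_auc)
```

Because the grid includes the corner points where a single sub-model gets weight 1, the fused AUC on the validation data can never be lower than that of the best single sub-model. Whether the gain carries over to unseen data has to be checked on a separate test set.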
Keywords/Search Tags: Short video social media, machine learning, sliding window, LightGBM, model fusion