
The Data Mining Of Reading Interests And Behavior Of Toutiao Video

Posted on: 2022-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Jiang
GTID: 2518306764495544
Subject: Enterprise Economy
Abstract/Summary:
The spread of smartphones and the internet has changed daily life across society, and with it people's personal interests and hobbies. As the pace of life quickens, many people spend much of their spare time online: on social networking apps (WeChat, Weibo, etc.), short-video apps (Douyin, Kuaishou, etc.), and video platforms such as Youku. This creates major opportunities and challenges for comprehensive content apps such as Toutiao, which has a large user base. How to attract large volumes of traffic to the platform, and how to draw more attention to individual accounts, are pressing practical problems.

This dissertation focuses on Toutiao and uses machine learning algorithms to analyze and explore the variables related to traffic. First, a web crawler is used to collect information on many videos, such as view counts, comments, and title keywords, and the irregular data are cleaned. Second, natural language processing is applied to the video titles: word segmentation and an LDA topic model are used to extract their keywords and obtain the core word categories. Next, Doc2Vec is used to find smaller topic words that are highly similar to these core categories, so that users' current reading interests can be analyzed and mined. Based on these results, the dissertation offers views and suggestions for the platform and for individual users. Finally, to predict how many views a publisher's content will receive, XGBoost and LightGBM models are built, and they are compared under the same evaluation criteria with another model built on the standardized data. The results show that the models predict the view counts of published content well from the selected variables. These findings support practical suggestions for increasing the platform's traffic and the attention users receive.
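The abstract notes that the crawled video data were irregular and had to be processed before analysis. As a minimal sketch, assuming the scraped view and comment counts arrive as display strings such as `1.2万次播放` or `8,765` (Chinese-language pages commonly abbreviate 10,000 as 万 and 100,000,000 as 亿), one cleaning step could normalize them to integers. The function name `parse_count` and the assumed input formats are hypothetical and are not taken from the thesis.

```python
import re
from typing import Optional

# Assumed unit suffixes on Chinese-language pages: 万 = 10^4, 亿 = 10^8.
_UNITS = {"": 1, "万": 10_000, "亿": 100_000_000}

# Digits (commas allowed, optional decimal part), then an optional unit suffix.
_COUNT_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(万|亿)?")


def parse_count(raw: str) -> Optional[int]:
    """Convert a scraped count such as '1.2万次播放' or '8,765' to an int.

    Returns None when the string holds no number (e.g. a placeholder like '暂无'),
    so the caller can decide whether to drop or impute that record.
    """
    match = _COUNT_RE.search(raw)
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    unit = match.group(2) or ""
    # round() absorbs floating-point noise such as 1.2 * 10000.
    return round(number * _UNITS[unit])
```

For example, `[parse_count(s) for s in ["1.2万次播放", "8,765", "暂无"]]` yields `[12000, 8765, None]`. Returning `None` rather than `0` keeps missing values distinguishable from genuinely unwatched videos.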
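The Doc2Vec step ranks candidate topic words by how close they are to a core word category. A sketch of that ranking, assuming the vectors have already been produced by a trained Doc2Vec model (for example with gensim's `Doc2Vec.infer_vector`), is cosine similarity followed by a cutoff. The helper names, the 0.8 threshold, the `top_k` limit, and the toy three-dimensional vectors below are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_topic_words(
    core_vector: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    threshold: float = 0.8,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank candidate words by cosine similarity to a core-category vector.

    Keeps at most `top_k` words whose similarity is at least `threshold`,
    ordered from most to least similar.
    """
    scored = [(word, cosine(core_vector, vec)) for word, vec in candidates.items()]
    scored = [pair for pair in scored if pair[1] >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With a core vector `[1.0, 0.0, 0.5]` and hypothetical candidates `"football": [0.9, 0.1, 0.4]`, `"basketball": [0.8, 0.0, 0.6]`, and `"recipe": [0.0, 1.0, 0.0]`, the function keeps `football` and then `basketball` and drops `recipe`, whose similarity is 0.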
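XGBoost and LightGBM are both implementations of gradient-boosted decision trees (GBDT). The core idea can be sketched in plain Python. Start from the mean prediction, then repeatedly fit a small tree to the current residuals and add a shrunken copy of it. This sketch uses depth-1 trees (stumps) and squared loss only. The real libraries add second-order gradient information, regularization, and histogram-based split finding, so treat it as an illustration of the technique rather than either library's algorithm. The features and targets in the usage section are made up.

```python
import math
from typing import List, Sequence, Tuple

# (feature index, split threshold, left-leaf value, right-leaf value)
Stump = Tuple[int, float, float, float]


def fit_stump(X: Sequence[Sequence[float]], residuals: Sequence[float]) -> Stump:
    """Return the one-split tree that minimizes squared error on `residuals`."""
    best: Stump = (0, math.inf, 0.0, 0.0)
    best_err = math.inf
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must send rows both ways
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best_err, best = err, (j, threshold, lm, rm)
    if best_err == math.inf:  # no valid split: fall back to a constant leaf
        mean = sum(residuals) / len(residuals)
        best = (0, math.inf, mean, mean)
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbdt(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_rounds: int = 100,
    learning_rate: float = 0.1,
) -> Tuple[float, float, List[Stump]]:
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)  # start from the mean prediction
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply y - prediction.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict_gbdt(model: Tuple[float, float, List[Stump]], row: Sequence[float]) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Made-up toy data: [followers (thousands), title length] -> log10(views)
    X = [[1, 10], [2, 12], [3, 9], [4, 15], [5, 11], [6, 14]]
    y = [2.0, 2.5, 3.1, 3.9, 4.2, 5.0]
    model = fit_gbdt(X, y)
    preds = [predict_gbdt(model, row) for row in X]
    baseline = [sum(y) / len(y)] * len(y)
    print("GBDT RMSE:", rmse(y, preds), "mean-baseline RMSE:", rmse(y, baseline))
```

RMSE against a mean-only baseline is used here only as an example metric. The abstract does not name the evaluation criteria applied in the thesis.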
Keywords/Search Tags: Web Crawler, Toutiao, Text Analysis, GBDT Algorithms