Movie Box Office Data Prediction Based On GBRT-Stacking Ensemble Learning Algorithm

Posted on: 2024-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Fang
GTID: 2555307049485964
Subject: Applied statistics
Abstract/Summary:
With rising consumer spending and continuing advances in film and television technology, the film industry has developed rapidly, and moviegoing has become an increasingly popular form of consumption. At the same time, the emergence of social networks has made people's viewing choices increasingly dependent on reviews of films. Therefore, in addition to traditional variables such as directors, actors, and themes, this dissertation introduces film reviews as an indicator: it constructs a film sentiment dictionary, performs sentiment analysis on reviews to obtain sentiment scores, explores the impact of reviews on box office, and analyzes the factors influencing box office more comprehensively. To address the difficulty of quantifying box office factors and the limited accuracy of existing prediction models, this dissertation selects 89 high-grossing movies from 2021 to 2022 on Douban Movies, collects 400 reviews for each movie (or all available reviews where fewer than 400 exist), and proposes a Stacking ensemble learning model to predict total box office. The Stacking ensemble combines GBDT, XGBoost, and LightGBM, single models that each have high prediction accuracy, to improve accuracy further. The final results show that the root mean squared error (RMSE) of the Stacking fusion model is 1.27, smaller than the RMSE of the single models, indicating that the Stacking fusion model predicts more accurately.

In addition, current research on box office prediction focuses mainly on prediction after a movie's release, and research on prediction before release is lacking. Alongside post-release prediction, this dissertation therefore constructs a pre-release prediction model, which aims to predict box office before a film is released, improve prediction accuracy, and help investors estimate the approximate box office so that they can make more reasonable investment decisions. Because few variables are available before release and they are difficult to obtain, this dissertation selects data on 1172 movies from Maoyan over the past ten years and, focusing on the prediction model itself, proposes a box office prediction model based on GBRT. It uses the robustness of GBRT and its strength in handling anomalous values to compensate for the shortcomings of the pre-release input variables and improve pre-release prediction accuracy. This work provides a reference for analyzing box office factors and a complete prediction system covering both pre-release and post-release prediction.
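To make the GBRT idea and the RMSE metric mentioned above concrete, here is a minimal sketch in plain Python. It is not the dissertation's implementation: it assumes squared-error loss, uses one-split regression stumps as base learners, and runs on invented toy numbers rather than Douban or Maoyan data. All function names (`fit_stump`, `fit_gbrt`, `gbrt_predict`, `rmse`) are hypothetical.

```python
import math

def fit_stump(xs, residuals):
    """Pick the one-threshold split on xs that minimises squared error on residuals."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2  # candidate threshold between two neighbouring values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm

def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def fit_gbrt(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each round fits a stump to the residuals."""
    base = sum(ys) / len(ys)  # initial prediction is the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def gbrt_predict(model, x):
    base, stumps, lr = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)

def rmse(y_true, y_pred):
    """Root mean squared error, the metric reported in the abstract."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# Invented toy data: one feature and a made-up box office target.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.5, 1.2, 3.8, 4.1, 4.0, 7.9, 8.2]

mean_y = sum(ys) / len(ys)
rmse_baseline = rmse(ys, [mean_y] * len(ys))

model_few = fit_gbrt(xs, ys, n_rounds=5)
model_many = fit_gbrt(xs, ys, n_rounds=100)
rmse_few = rmse(ys, [gbrt_predict(model_few, x) for x in xs])
rmse_many = rmse(ys, [gbrt_predict(model_many, x) for x in xs])
```

On this toy set the training RMSE falls as boosting rounds are added, since each stump corrects part of the remaining residual. A Stacking model as described in the abstract would go one step further, feeding out-of-fold predictions from several such base models into a second-level learner; that step is not shown here.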
Keywords/Search Tags: movie box office data analysis, movie box office forecasting, Stacking ensemble learning, GBRT, sentiment analysis