
Research On Prediction Method Of Movie Box Office Based On Machine Learning

Posted on: 2021-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Chen
GTID: 2505306245981919
Subject: Applied Statistics
Abstract/Summary:
With the growth of the national economy in recent years, living standards in China have improved greatly. Once basic material needs are largely met, movies, as a form of cultural entertainment, go a long way toward meeting people's spiritual needs. At the same time, the rise of social networking platforms and the spread of mobile payment have increased movie publicity and made ticket purchasing more convenient. As a result, China's movie industry has developed rapidly over the past decade. From 2006 to 2018, the total box office of Chinese films increased year by year, growing fastest from 2009 to 2010, and in 2018 it exceeded 60 billion yuan. Movies also help promote the spread of culture and economic development. This paper compares movie box office prediction methods in order to obtain a more accurate prediction model.

Based on previous studies, this paper selects 23 variables in three groups to build the prediction model: the movie's own attributes, its pre-release publicity, and one week of audience feedback. First, a Python crawler was used to collect basic attribute information and box office data for 2,008 movies released in China from 2015 to September 2019, and the 383 movies with a box office of over 100 million yuan were selected for analysis. For these movies, the average Baidu index in the week before release was obtained from the Baidu Index platform. The average attendance rate and the average number of scheduled screenings in the week before release were then obtained from Maoyan Professional Edition (Cat's Eye). At the same time, review data from before release, including the week before release, were crawled from Douban. The collected reviews were labeled by polarity to create a new corpus. Using this corpus, a Python program trains a new naive Bayes model with the SnowNLP library. The model computes the positive sentiment tendency of every review of each movie, and the mean over a movie's reviews is taken as that movie's positive emotional inclination. Movie type, star influence, director influence and other influencing factors are then quantified.

After all the data are obtained, they are normalized and preprocessed, and three models (random forest, BP neural network and LSTM neural network) are each trained to predict movie box office. A modified blending method is also used to fuse the three models into a fourth prediction model, so that the prediction results of all four models can be compared. The random forest model is additionally used to measure the influence of each variable on box office.

The conclusions are as follows. In terms of variable importance in the random forest model, the top five factors are the average number of scheduled screenings in a week, the average attendance rate in a week, the positive emotional inclination, the director's influence, and the Baidu index. Positive emotional inclination thus ranks third and has a substantial influence on box office. To evaluate how well each model fits, this paper uses root mean square error (RMSE), mean absolute percentage error (MAPE) and the coefficient of determination (R-squared). Among the single models, the ranking from best to worst is random forest, LSTM neural network, BP neural network. The model that fuses the three has the largest coefficient of determination, and its RMSE and MAPE are smaller than those of the three single models, so the fused model has the highest prediction accuracy and the best fit. Finally, suggestions are put forward based on the results of the study.
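The three evaluation measures named in the abstract can be written out in a few lines. The following is a minimal sketch in plain Python, not the thesis's own code; the function names and the sample numbers are hypothetical and serve only to make the definitions concrete.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero.
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical box office values (illustrative only, not data from the thesis).
actual_box_office = [100.0, 200.0, 400.0]
predicted_box_office = [110.0, 190.0, 400.0]

print(rmse(actual_box_office, predicted_box_office))
print(mape(actual_box_office, predicted_box_office))
print(r_squared(actual_box_office, predicted_box_office))
```

Lower RMSE and MAPE and a higher R-squared indicate a better fit, which is the sense in which the abstract ranks the four models.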
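The abstract does not say how its "modified blending method" differs from ordinary blending, so the sketch below shows only the standard idea, under stated assumptions: each base model predicts on a holdout set, and a linear meta-learner (least squares, no intercept) learns how to weight those predictions. All function names and numbers are hypothetical, and the base models are represented only by their prediction lists.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are not modified.
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: base predictions are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_blend_weights(holdout_preds, holdout_actual):
    """Least-squares weights for the base models via the normal equations.

    holdout_preds: one list of holdout predictions per base model.
    """
    k = len(holdout_preds)
    gram = [[sum(a * b for a, b in zip(holdout_preds[i], holdout_preds[j]))
             for j in range(k)] for i in range(k)]
    target = [sum(p * y for p, y in zip(holdout_preds[i], holdout_actual))
              for i in range(k)]
    return solve_linear_system(gram, target)


def blend(preds, weights):
    """Weighted sum of the base-model predictions for each sample."""
    return [sum(w * p[i] for w, p in zip(weights, preds))
            for i in range(len(preds[0]))]


# Hypothetical holdout predictions standing in for random forest, BP and LSTM.
holdout_actual = [1.0, 2.0, 3.0, 4.0]
holdout_preds = [
    [1.1, 1.9, 3.2, 3.9],
    [0.5, 2.5, 2.5, 4.5],
    [2.0, 2.0, 3.0, 3.0],
]
weights = fit_blend_weights(holdout_preds, holdout_actual)
print(weights)
print(blend(holdout_preds, weights))
```

Because the weights are fitted by least squares, the blended squared error on the holdout set cannot exceed that of any single base model. Whether the gain carries over to unseen movies has to be checked on a separate test set.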
Keywords/Search Tags: Film box office, Positive emotional inclination, Model comparison, Model fusion