| With the development of Internet technology, the variety of textual data in society and the amount of information on the network keep growing. Because of information overload and increasingly noisy data, it is harder and harder for people to discover new works efficiently, so recommendation algorithms personalized for each user have gained wide recognition. At the same time, the Chinese movie market is developing rapidly, and watching movies has become a common choice of entertainment. When choosing a movie, people often consult film review websites, and after watching they like to post their thoughts and opinions online. Douban is a relatively authoritative film review website where many people post reviews and consult ratings, so it has accumulated a growing body of authentic review data. Therefore, this paper uses Douban film review data to study personalized movie recommendation algorithms. Conventional personalized recommendation algorithms use the cosine similarity formula to compute the similarity between the features of a candidate movie and the features of the user's preferences, and take this value as the user's degree of interest in that movie. This paper proposes an improved movie recommendation algorithm based on user interests. Web crawler technology is used to collect the data needed for the study from Douban: 22,104 movie ratings from 258 users and 17,271 movie review texts covering 1,798 movies. The data are first cleaned and preprocessed. Before the review texts are segmented, the original stop-word dictionary and word segmentation dictionary are rebuilt and expanded. The stop-word dictionary is expanded through part-of-speech tagging, so that only nouns are retained as feature words for topic analysis, and through the addition of some popular high-frequency words; this reduces the size of the feature vocabulary and improves topic recognition. The segmentation dictionary is loaded with historical and celebrity names that improve recognition of the topics of film reviews. On this basis, Jieba is used to segment the review texts. LDA topic modeling is then applied to the segmented reviews to obtain the topic categories and contents of Douban movies and the topic distribution of each review text, from which the feature topic distribution of each movie is derived. The user's preferred topic distribution is then obtained from the user's movie rating list and the feature topic distribution of each movie the user has rated. Next, this paper analyzes the user's actual level of interest in each topic. Based on the user's preferred topic distribution and the average topic distribution of all movies, a new algorithm computes the user's actual level of interest in each topic and, from it, the user's degree of interest in each movie. The improved recommendation algorithm then computes the user's predicted rating for each candidate movie, and the result is compared with the conventional movie recommendation method. The experimental results show that the improved, interest-based movie recommendation algorithm proposed in this paper achieves a better recommendation effect. To further improve the model, this paper adds a film praise index and a film heat decay index to construct the final recommendation algorithm, and selects the value of the attenuation coefficient according to the results, which further improves the overall recommendation effect of the model. In summary, the improved movie recommendation algorithm proposed in this paper improves the effect of the movie recommendation system, effectively improves the user experience of the website, and has practical guiding significance for website operators. |
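
As an illustration of the conventional baseline described above, the following minimal Python sketch derives a user's preferred topic distribution from the movies the user has rated and scores a candidate movie by cosine similarity. The function names, the toy topic distributions, and the choice to weight each rated movie by its raw rating are assumptions made for illustration only; the abstract does not specify them, and the improved interest-based algorithm is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector is all zeros, so the score stays defined.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def user_topic_preference(ratings, movie_topics):
    """Rating-weighted mean of the topic distributions of the rated movies.

    ASSUMPTION: weighting by the raw rating is one plausible way to combine
    the rating list with per-movie topic distributions; the source does not
    state the exact formula.
    """
    total = sum(ratings.values())
    if total <= 0:
        raise ValueError("the user needs at least one positive rating")
    num_topics = len(next(iter(movie_topics.values())))
    preference = [0.0] * num_topics
    for movie, rating in ratings.items():
        for k, p in enumerate(movie_topics[movie]):
            preference[k] += rating * p
    return [x / total for x in preference]


# Hypothetical per-movie topic distributions, as an LDA model might produce.
movie_topics = {
    "movie_a": [0.7, 0.2, 0.1],
    "movie_b": [0.1, 0.8, 0.1],
    "movie_c": [0.6, 0.3, 0.1],
}
# Hypothetical ratings from one user (movie_c is the unrated candidate).
ratings = {"movie_a": 5, "movie_b": 1}

preference = user_topic_preference(ratings, movie_topics)
baseline_interest = cosine_similarity(preference, movie_topics["movie_c"])
print(preference, baseline_interest)
```

In this toy example the highly rated `movie_a` dominates the preference vector, so a candidate whose topic mix resembles it receives a high baseline interest score.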