With the rapid development of technologies such as the Internet and big data, the movie market on major online video platforms continues to expand, offering Internet users a rich choice of content. However, faced with such a large catalog, users are also flooded with irrelevant information and struggle to choose. A movie recommendation system can effectively address this "information overload" problem, but it faces several challenges of its own: inefficient algorithm execution on large-scale data, cold start for new users, and a sparse rating matrix, all of which lead to unsatisfactory recommendations. After studying the recommendation algorithms used in current movie recommendation systems and related big data technologies, this paper proposes a new movie recommendation algorithm (MRA-IDSSM) and a Spark cache management strategy. Together they improve execution efficiency on large-scale data, improve recommendation accuracy, and address the cold-start problem for new users.

To address the low recommendation accuracy of current movie recommendation algorithms and the cold start of new users, this paper proposes a movie recommendation algorithm based on an improved deep structured semantic model (MRA-IDSSM), which mainly improves the structure of the DSSM model and constructs a fusion algorithm of the two models. First, a model is used to learn the user’s subject preference, and the user’s recent viewing records are then weighted and averaged according to that preference. At the same time, the explicit features of users and movies are combined with the implicit interaction information between them to extract the latent semantic relationship between users and movies and to find movies that match the user’s interests. Simulation results show that this algorithm can effectively improve the accuracy of
recommendation and can alleviate the cold-start and matrix-sparsity problems.

Because current Spark cache management strategies fail to make full use of memory resources, this paper proposes a Spark cache management strategy that works across multiple executors. It uses an eviction policy based on next reference distance (NRD), uses the dependency information of the different RDDs in the DAG to optimize the eviction of data blocks, and considers the relative distance between the jobs and stages that reference each RDD in the application workflow. It can thus evict the cached data with the farthest reference distance, which is the least likely to be used. Simulation results show that this strategy can improve the overall cache hit rate, shorten the execution time of Spark tasks, and improve the execution efficiency of the recommendation algorithm.

Finally, based on the proposed recommendation algorithm and Spark cache management strategy, mainstream big data, front-end, and back-end technologies are selected to implement a movie recommendation system based on Spark machine learning, and the main functions of the system are tested. The system demonstrates the practicability of the proposed recommendation algorithm and Spark cache management strategy.
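
The idea of removing cached data by next reference distance (NRD) can be illustrated with a small sketch. This is not the paper's implementation: it assumes that the DAG has already been flattened into a single sequence of stage indices and that, for each cached RDD, the list of stages that read it is known. The names `next_reference_distance`, `choose_victim`, and `reference_stages` are hypothetical, and the sketch does not use any Spark API.

```python
from typing import Dict, List, Optional

INF = float("inf")


def next_reference_distance(refs: List[int], current_stage: int) -> float:
    """Distance from the current stage to the next stage that reads the RDD.

    Returns infinity when no later stage references the RDD (assumption:
    such data is never needed again and is the best candidate to remove).
    """
    future = [s - current_stage for s in refs if s > current_stage]
    return min(future) if future else INF


def choose_victim(cached: Dict[str, List[int]], current_stage: int) -> Optional[str]:
    """Pick the cached RDD whose next reference is farthest away."""
    if not cached:
        return None
    return max(
        cached,
        key=lambda rdd: next_reference_distance(cached[rdd], current_stage),
    )


# Hypothetical workload: RDD name -> indices of the stages that read it.
reference_stages = {
    "rdd_a": [1, 2, 6],
    "rdd_b": [1, 3],
    "rdd_c": [2],
}

victim = choose_victim(reference_stages, current_stage=2)
print(victim)  # rdd_c: no stage after stage 2 reads it
```

In this toy workload, at stage 2 the block `rdd_c` has no future reference, so it is removed first; if it were not cached, `rdd_a` (next read four stages away) would go before `rdd_b` (next read one stage away). A recency-based policy such as LRU cannot make this distinction, because it looks only at past accesses.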