Research On Video Recommendation Technology Based On Multimodal Fusion

Posted on:2024-07-22

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X Q Zhuang

Full Text:PDF

GTID:1528307058473024

Subject:Network and network resource management

Abstract/Summary:

With the development of the Internet and mobile technology, video has become a very popular form of media. Because of video information overload and redundant content, it is difficult for people to find the information they need quickly. Video recommendation systems emerged to solve these problems, but they still face several challenges, mainly in the following aspects:

(1) Data sparsity and cold start. Data sparsity refers to the limited interaction data between users and videos, which makes it difficult to understand users’ interests and preferences accurately. The cold start problem refers to the lack of sufficient personalized data for accurate recommendations in the initial stage of a recommendation system or when facing new users.

(2) Feature mining. Mining effective features helps recommender systems understand video content and user interests more accurately, thus improving the accuracy and personalization of recommendations. Some feature interactions are easy to detect, and manual feature engineering can be used to mine the associations between features. Most feature interactions, however, are hidden in the data and difficult to mine, so modeling feature interactions effectively becomes a challenge.

(3) Diversity. Traditional recommendation systems tend to recommend items that are similar to users’ historical interests, which produces too many similar results and a lack of diversity. This may limit users’ discovery of novel and diverse content and reduce their satisfaction and their ability to explore new interests.

In response to these problems, researchers in China and abroad have proposed many solutions. Traditional video recommendation techniques are based on a single data source, for example using only the image content of a video, and this limitation leaves the recommendation system unable to fully understand the interests and needs of users. With the continuous progress of computer technology, it has become easier to acquire and process multimodal information such as audio and text. The accuracy and personalization of recommendations can be improved by considering the images, audio, and text of videos together and combining them with deep learning techniques. It is therefore of great significance and application value to study video recommendation based on multimodal fusion.

This thesis addresses the problems of the existing research described above and conducts an in-depth study of multimodal fusion-based video recommendation algorithms. The study includes: using a knowledge graph to fuse multimodal data and capture nonlinear user-item relationships, which addresses data sparsity and cold start in recommendation; introducing a deep bidirectional self-attention mechanism to model user behavior sequences, which distinguishes the importance of features and captures the interactions between them; and constructing heterogeneous graph networks to mine strong association rules between items and improve the diversity of recommendations. The main work of this thesis is as follows:

(1) A video recommendation algorithm based on multimodal fusion and knowledge embedding is proposed. Video recommendation currently faces problems such as sparse data and cold start, and traditional recommendation methods often fail to capture users’ interests and personalized needs accurately. To solve this problem, this thesis proposes a video recommendation algorithm called the knowledge-aware interest fusion gating network (KIFGN), which models user interests and produces personalized recommendations by incorporating a knowledge graph and introducing interest fusion gating units. The KIFGN model first uses a knowledge graph, a structured knowledge representation that contains rich entity relationships and attributes, to enrich the feature information available for video recommendation. It then introduces a user gating unit and an interest gating unit. The user gating unit learns the interest cluster features of users by modeling their historical behaviors. The interest gating unit learns the interest cluster features of videos at different granularities by modeling the attributes and knowledge properties of videos. The interest fusion gating unit dynamically updates and fuses the interest cluster features of users and videos through the gating mechanism, thus realizing personalized modeling and recommendation of user interests. To evaluate the KIFGN model, this thesis conducts extensive experiments on a real video recommendation dataset. The comparative results show that the KIFGN model outperforms the baseline models on all evaluation metrics, captures user interests and video features more accurately, and provides personalized video recommendation results.

(2) A video recommendation algorithm based on multimodal fusion and BERT is proposed. With the popularity of social media and video sharing platforms, video recommendation has become an important task for improving user experience and platform activity. However, traditional video recommendation methods often suffer from inadequate feature mining and cannot fully utilize the multimodal information of videos and the personalized interests of users. To solve this problem, this thesis proposes a video recommendation algorithm called crossmodal attention-aware BERT (CCA-BERT), which aims to achieve more accurate and personalized video recommendations by applying multimodal fusion and BERT models. First, three independent BERT models encode the multimodal features of videos, and the intrinsic correlations among them are modeled using multi-head self-attention. In addition, a crossmodal attention-aware network for candidate videos is proposed to model users’ interest in candidate videos accurately. Experiments demonstrate that fusing multimodal video information effectively improves video recommendation.

(3) A video recommendation algorithm based on multimodal fusion and graph convolutional networks is proposed. With the rapid development of social media and video sharing platforms, diverse video recommendation has become a key task for providing a personalized user experience and increasing user engagement. However, traditional recommendation methods are mainly based on user behavior and content features and ignore the multimodal nature of videos, which limits the diversity and performance of recommendation systems. To solve this problem, this thesis proposes a graph neural network model called the multimodal heterogeneous feature aggregation network (MHFAN), which aims to mine the global information of multimodal user-video heterogeneous graphs. The MHFAN model fuses and mines multimodal data by representing the multimodal features of users and videos as nodes of heterogeneous graphs and using the representation learning capability of graph neural networks. Specifically, the MHFAN model designs a heterogeneous graph construction strategy that employs a multilayer graph convolutional network to aggregate information between different types of nodes. By incorporating recurrent neural networks, the MHFAN model captures the complex associations and dependencies between multimodal data in heterogeneous graphs. To evaluate the MHFAN model, this thesis conducts extensive experiments on the TikTok and MovieLens datasets. Compared with traditional recommendation algorithms and other recommendation models based on heterogeneous graph structures (e.g., CMF, GRU4Rec, H-RNN, GCE-GNN, and A-PGNN), the MHFAN model achieves significant improvements on all evaluation metrics. The experimental results show that the MHFAN model can better mine and utilize the multimodal information of videos and improve the accuracy and personalization of the recommendation system.

In summary, this thesis addresses the problems of data sparsity, cold start, feature mining, and diversity in video recommendation systems. To solve these problems, multimodal data are processed and fused using deep learning techniques, and several recommendation algorithms are proposed on that basis to improve video recommendation. The experimental results and data analysis demonstrate the effectiveness of the algorithms proposed in this thesis.
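
The abstract describes the interest fusion gating unit only at a high level and does not give its equations. As a rough illustration of what "fusing user and video features through a gating mechanism" can look like, the sketch below uses a common element-wise sigmoid gate. This is an assumed, generic formulation, not the KIFGN model itself; the function names (`gated_fusion`, `matvec`, `random_matrix`) and the parameters `w_user`, `w_video`, and `bias` are hypothetical and chosen for illustration only.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function; maps any real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gated_fusion(user_vec, video_vec, w_user, w_video, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed formulation (not taken from the thesis):
        gate  = sigmoid(w_user @ user_vec + w_video @ video_vec + bias)
        fused = gate * user_vec + (1 - gate) * video_vec
    Returns the pair (fused, gate).
    """
    pre_activation = [
        a + b + c
        for a, b, c in zip(matvec(w_user, user_vec), matvec(w_video, video_vec), bias)
    ]
    gate = [sigmoid(p) for p in pre_activation]
    fused = [g * u + (1.0 - g) * v for g, u, v in zip(gate, user_vec, video_vec)]
    return fused, gate


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights standing in for parameters a model would learn."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    user_vec = [0.9, -0.2, 0.4, 0.0]   # toy user interest features
    video_vec = [0.1, 0.8, -0.5, 0.3]  # toy video interest features
    w_user = random_matrix(dim, dim, rng)
    w_video = random_matrix(dim, dim, rng)
    bias = [0.0] * dim
    fused, gate = gated_fusion(user_vec, video_vec, w_user, w_video, bias)
    print("gate: ", [round(g, 3) for g in gate])
    print("fused:", [round(f, 3) for f in fused])
```

Each gate value lies between 0 and 1 and decides, per dimension, how much of the user feature versus the video feature is kept. In a trained model the weights would be learned from interaction data rather than sampled at random as they are here.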

Keywords/Search Tags:

Multimodal Fusion, Video Recommendation, Heterogeneous Graphs, Knowledge Graphs

Related items

1	Research On The Construction And Application Of Temporal Academic Knowledge Graph
2	Research On Multimodal Knowledge Tracing Methods Based On Heterogeneous Graphs
3	Research On Entity Alignment For Multimodal Knowledge Graphs
4	Design And Construction Of Vertical-oriented Knowledge Graphs
5	The Research Of Distributed Storage And Indexing Scheme Of Large Scale RDF Knowledge Graphs
6	Research On Correlation Of Intelligence Based On Knowledge Graphs
7	Research And Application Of Learning Methods For Knowledge Graphs
8	Research On Completion Algorithm Of Temporal Knowledge Graphs Based On Representation Learning
9	Research On The Distributed Representation Learning Of Knowledge Graphs
10	The Research Of Distributed Regular Path Queries Over Large RDF Knowledge Graphs