Research on Recommender Systems Using Multi-Modal Data for User Modeling

Posted on: 2024-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: A B Li
Full Text: PDF
GTID: 2568307124960079
Subject: Electronic information
Abstract/Summary:
Recommender systems address information overload by selecting, from a large volume of information, the content that matches a user's interests. With the rapid development of streaming media, user information has become more diverse, evolving from single structured text to multimodal information (text, image, audio, and video). Data sparsity, however, remains unavoidable in recommender systems, whether the data is single-modal or multimodal. This thesis targets three weaknesses of existing recommender systems: underused multi-view textual information, visual features extracted from images that are weakly expressive and transfer poorly, and limited semantic exchange between modalities. It studies three corresponding topics: multi-view text feature extraction, image feature extraction, and multimodal knowledge sharing. Finally, it combines the text and image modalities to build a personalized movie recommendation system based on multimodal data. The main work is as follows.

First, the thesis proposes a multi-view personalized text recommendation model that combines a CNN with multi-head self-attention. The model addresses, in both text representation and user representation, the sparsity caused by insufficient use of multi-view information during text feature extraction. In the multi-view text feature extraction module (ECMSA), the model draws features from multiple components of the text to capture as much of the information it carries as possible. Embedding reduces the dimensionality of the multi-view text vectors, which are then fed to a CNN that extracts features from the short-range context of words. The resulting feature vectors pass through a multi-head self-attention network, which extracts longer-range features and models the relationships among the multi-view text features. Finally, an additional attention layer selects features. In the user representation learning module, a multi-head self-attention network models the interactions within the user's historical click data to strengthen the user representation. A click predictor then produces the prediction. In experiments, the model achieves an AUC of 0.6383 and an nDCG@5 of 0.3586, outperforming the compared methods.

Second, the thesis proposes a personalized recommendation model that integrates text and image data, yielding a multimodal recommendation model for user profiling. The model consists of three parts. The ECMSA framework extracts text features. The proposed IEMSA framework addresses the weak expressiveness and poor transferability of image features: it extracts visual features with Inception V3, reduces their dimensionality through embedding, and learns the relationships among the visual features with multi-head self-attention. A multimodal knowledge sharing module then uses cross-modal attention to exchange information between the modalities. In addition, an additional attention layer in both the text and image feature extraction processes selects the feature vectors that carry more information. Finally, the prediction model outputs the result. Experimental results show that the proposed method achieves a lower loss (0.3301) and a higher AUC (0.8892) than the compared methods.

Third, the thesis integrates the work above into a single framework and designs and implements a prototype personalized movie recommendation system based on multimodal data, using the public MovieLens dataset. The system provides personalized recommendation through multimodal (text and image) user profiling, together with popular recommendation and similarity recommendation; these are its three main functions. Experiments show that the system performs well.
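For readers unfamiliar with the two ranking metrics quoted in the abstract, the following is a minimal sketch of how AUC and nDCG@5 are commonly computed for a single list of candidate items with binary click labels. It is not the thesis's evaluation code: the function names `auc`, `dcg_at_k`, and `ndcg_at_k`, the linear gain with a log2 discount, and the toy labels and scores are all assumptions made for illustration.

```python
import math


def auc(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dcg_at_k(labels_in_rank_order, k):
    """Discounted cumulative gain over the top-k positions (assumed: linear gain)."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position 1 divides by log2(2)
        for rank, rel in enumerate(labels_in_rank_order[:k])
    )


def ndcg_at_k(labels, scores, k=5):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical toy data: six candidates, two of which were clicked.
toy_labels = [1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
print(auc(toy_labels, toy_scores))
print(ndcg_at_k(toy_labels, toy_scores, k=5))
```

In news- and movie-recommendation evaluations these metrics are typically computed per user impression and then averaged; whether the thesis follows that convention is not stated in the abstract.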
Keywords/Search Tags:Recommender Systems, Deep Learning, Multimodal Data, Attention Mechanisms