
Research On Hybrid Video Recommendation Algorithm Based On Multi-Head Self Attention Mechanism

Posted on: 2022-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z J C Liu
GTID: 2518306347992919
Subject: Computer technology
Abstract/Summary:
In recent years, deep learning has made great achievements in the field of artificial intelligence, which also provides new development opportunities and technological innovation for recommendation systems. Thanks to its deep nonlinear network structure, deep learning can learn richer feature representations of users and items from massive user behavior data. Compared with traditional recommendation models, a deep learning model can automatically capture complex relationships within the data and mine user and item features to obtain more complex and abstract high-order interaction features. However, most current recommendation algorithms based on deep learning give every feature the same weight when building feature crosses, so they cannot extract key information, and they cannot model users' long-term and short-term preferences at the same time. The attention mechanism proposed in recent years lets a neural network focus on the important parts of the input features and assigns important features higher weights. A model with attention can not only capture the combinations and interactions of important features between users and items, but also visualize the weight of each feature, which gives the model good interpretability.

Therefore, to solve the above problems, this thesis proposes a recommendation model based on a Multi-head Self-attention network. The hybrid model combines the Multi-head Self-attention mechanism with two variants of the recurrent neural network (Long Short-Term Memory, LSTM, and Gated Recurrent Unit, GRU) to capture the interdependence and order of user feedback data for recommendation. The Multi-head Self-attention mechanism assigns different weights to different feedback data to capture key information and constructs high-order feature crosses, while the recurrent neural network models users' long-term and short-term preferences. This thesis studies the recommendation system from the following four aspects.

(1) How to integrate heterogeneous, multi-source data into the recommendation system. In building a commercial recommendation system, feature engineering plays a key role, because the information related to users and items often contains very important features. Integrating this information into a collaborative filtering model alleviates data sparsity and thereby improves the accuracy of the model. Such information is called auxiliary information, for example user profile statistics (age, gender, occupation, hobbies, education level) and item attributes (title, type, duration, cover). Many kinds of data are exchanged among users on the Internet, and the data sources are diverse. For example, people encounter news text, video cover images, and discrete data such as film ratings, all of which are heterogeneous and have different data structures. The key to integrating heterogeneous data into the recommendation model is therefore how to extract features from different data structures reasonably, and how to model these features jointly in the same model after extraction. Most existing research uses a multi-layer fully connected neural network to learn the features of auxiliary information. Because auxiliary information is heterogeneous, sparse, and unevenly distributed, this thesis instead adopts a feature extraction method for mixed heterogeneous data, using a different deep learning model for each data type. For discrete data, features are extracted by field embedding. For text data, features are extracted by a text convolutional neural network.

(2) How to construct high-order feature interactions more efficiently. Feature crossing is a method of synthesizing features that can be applied to multi-dimensional feature data. The FM model constructs feature interactions through inner products of the latent vectors of features. However, because of the computational complexity, only second-order feature interactions are typically used, which limits the higher-order combinations FM can capture. Researchers in industry and academia usually construct high-order feature interactions with a DNN, but this approach still has defects. For example, a fully connected neural network captures interactions among all attributes with every feature element weighted equally, so the model fails to extract key information. This thesis designs a Multi-head Self-attention layer that automatically computes an attention score for each element and automatically combines meaningful features. A single feature may also take part in several different combination features. Therefore, this thesis uses multiple heads to create different subspaces, learns different cross features in each, and finally obtains the combination features learned in all subspaces.

(3) How to take the long-term preference of users into consideration. To address users' long-term preference for videos, this thesis combines the Multi-head Self-attention mechanism with GRU. GRU and LSTM are two kinds of RNN that mitigate vanishing gradients and account for the short-term and long-term memory of users and items. Moreover, the structure of GRU is simpler than that of LSTM, so a GRU-based recommendation model converges considerably faster than an LSTM-based one, which suits short model training and iteration cycles.

(4) Finally, a hybrid video recommendation model based on the Multi-head Self-attention mechanism and GRU/LSTM is proposed, which combines heterogeneous data processing, high-order feature crossing, and a time-series network. Specifically, the model first uses field embedding to extract discrete data features and Word2vec to turn text data into word vectors, and then uses a text convolutional network to extract features from those word vectors. Next, the discrete and text features are mapped into the same low-dimensional space, and the resulting low-dimensional vectors are fed into the Multi-head Self-attention network in the interaction layer to generate higher-order combination features. The output of the attention network is concatenated with the output of the GRU network to form the final output of the interaction layer.

The research is conducted on three common datasets: MovieLens 100K, MovieLens 1M, and MovieLens 20M. On all three datasets, the proposed hybrid recommendation model MHA outperforms traditional recommendation models (LR, FM, AFM) and other recommendation models based on deep learning (NFM, Wide&Deep, DeepFM). According to the experiments, MHA-GRU achieves the best result on MovieLens 100K, with an AUC of 82.40%, and MHA-LSTM achieves the best result on MovieLens 1M, with an AUC of 83.50%. In terms of convergence time, MHA-GRU is about 1/3 faster than MHA-LSTM on the three datasets. Finally, three sets of hyperparameter experiments are designed, covering the length of the recommendation list K, the latent vector dimension N, and the number of self-attention sublayers M. In addition, the attention mechanism gives both MHA-LSTM and MHA-GRU good interpretability.
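
To make aspect (2) concrete, the following is a minimal sketch of the second-order FM term, in which each pair of features interacts through the inner product of their latent vectors. It is not taken from the thesis: the function names, the toy input x, and the latent vectors v are illustrative assumptions. The sketch compares the direct pairwise sum with the well-known linear-time reformulation of the same quantity.

```python
from itertools import combinations


def fm_pairwise_naive(x, v):
    """Sum over i < j of <v_i, v_j> * x_i * x_j, computed pair by pair."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(v[i], v[j]))
        total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """Same quantity via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    k = len(v[0])  # latent dimension
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


# Hypothetical toy data: three features, latent dimension two.
x = [1.0, 0.0, 2.0]
v = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

naive = fm_pairwise_naive(x, v)
fast = fm_pairwise_fast(x, v)
```

Both functions return the same value; the second avoids enumerating pairs, which is one reason second-order FM is cheap while explicit higher-order crosses quickly become expensive.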
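
The multi-head self-attention layer over field embeddings can be sketched in the same spirit. This is a generic scaled dot-product formulation in plain Python, not the thesis's implementation: the sizes (4 fields, embedding dimension 8, 2 heads of dimension 3), the random projection matrices, and all function names are assumptions chosen for illustration. Each head projects the embeddings into its own subspace, scores every field against every other field, and the head outputs are concatenated per field.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_head(e, w_q, w_k, w_v):
    """One head: project embeddings, score all field pairs, mix the values."""
    q, k, v = matmul(e, w_q), matmul(e, w_k), matmul(e, w_v)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_self_attention(e, heads):
    """Run every head in its own subspace and concatenate outputs per field."""
    outputs, all_weights = [], []
    for w_q, w_k, w_v in heads:
        out, w = self_attention_head(e, w_q, w_k, w_v)
        outputs.append(out)
        all_weights.append(w)
    combined = [sum((out[i] for out in outputs), []) for i in range(len(e))]
    return combined, all_weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes; untrained random weights stand in for learned ones.
rng = random.Random(0)
num_fields, embed_dim, head_dim, num_heads = 4, 8, 3, 2
embeddings = random_matrix(num_fields, embed_dim, rng)
heads = [
    tuple(random_matrix(embed_dim, head_dim, rng) for _ in range(3))
    for _ in range(num_heads)
]
combined, weights = multi_head_self_attention(embeddings, heads)
```

Each row of a head's weight matrix sums to one, so it can be read as how much one field attends to the others. Inspecting those rows is the kind of per-feature weight visualization that attention-based interpretability relies on.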
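
Because the results above are reported as AUC, a short sketch of how that metric can be computed may help in reading them. The function below uses the pairwise definition (the fraction of positive/negative pairs that the model ranks correctly, counting ties as half); the labels and scores are made-up toy values, not data from the thesis.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical click labels and predicted click probabilities.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
toy_auc = auc(toy_labels, toy_scores)
```

This quadratic-time form is only meant to show the definition; on datasets the size of MovieLens 20M a sort-based implementation from an established library would be the practical choice.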
Keywords/Search Tags: CTR prediction, Heterogeneous data, High-level feature interaction, Multi-head self-attention, GRU