
Research On Multimodal Data Processing Algorithm Based On Deep Learning

Posted on: 2020-08-28  Degree: Master  Type: Thesis
Country: China  Candidate: Y Yan  Full Text: PDF
GTID: 2518306518464994  Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of computer networks and mobile communication devices, social networks have become a vital part of people's daily lives as a medium of communication. They carry rich data in many modalities, such as text, images, and video, all of which serve as vehicles for transmitting information. More and more people turn to social networks to evaluate something, for example reading comments on a storefront in a group-buying application, or learning about a product from review videos. Processing multimodal data and extracting emotional information from it is therefore a direction well worth concentrating on, and it can be effectively applied in many settings, such as product recommendation, travel recommendation, and entertainment recommendation.

This paper first introduces the background of multimodal data processing and the related research results in this direction. Building on previous studies, deep learning is applied to process multimodal data from complex scenes and to recognize the emotional semantics of video. The main work of this thesis consists of two parts: (1) For multimodal information fusion, we build a multi-layer LSTM network that fuses the multimodal data and outputs utterance-level features, and then use a traditional LSTM model to extract video features and perform emotional semantic recognition. (2) On the basis of this emotion recognition, we construct a venue sentiment model: different emotion detectors are trained on a venue multimedia dataset collected from Twitter, the pre-trained detectors are applied to the multimodal state information of each venue to give a comprehensive assessment of its emotional state, and prediction results for the venue's emotional label are obtained from the different modalities.

The evaluation databases in this paper fall into two categories according to the work performed. The dataset used for video sentiment analysis is the MOSI dataset, in which 93 people express their opinions on different subjects in English. The videos are divided into short segments, and each segment is labeled with an emotional score in the interval from -3 (most negative) to +3 (most positive). The MOUD dataset is also a sentiment analysis dataset; its video clips are in Spanish, so we use Google Translate to translate them into English, and its emotional labels are positive, neutral, and negative. Secondly, for the venue sentiment analysis built on video sentiment analysis, the dataset consists of text, image, and video information from Twitter about venues near Universal Studios in Singapore, where we chose venues that are frequently commented on or visited. The venue sentiment model proposed in the paper is used to predict the emotions of these popular venues, and users are invited to evaluate the venues as ground truth, thereby verifying the usability and superiority of our venue sentiment analysis based on multimodal data processing.
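The abstract gives no implementation details, but the two-stage fusion described in part (1) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the PyTorch framework, the feature dimensions, and the names `ModalityFusionLSTM` and `hidden` are hypothetical and are not taken from the thesis. Each modality is encoded by its own LSTM, the resulting utterance-level features are concatenated, and a second LSTM over the utterance sequence predicts a sentiment score on the -3 to +3 scale used by MOSI.

```python
# Illustrative sketch only: dimensions, names, and structure are assumptions,
# not the thesis's actual implementation.
import torch
import torch.nn as nn

class ModalityFusionLSTM(nn.Module):
    """Two-stage LSTM fusion: per-modality LSTMs encode each utterance,
    the utterance features are concatenated, and an utterance-level LSTM
    predicts one sentiment score per utterance."""

    def __init__(self, text_dim=300, audio_dim=74, visual_dim=35, hidden=128):
        super().__init__()
        # Stage 1: one LSTM per modality, run over the words/frames of an utterance.
        self.encoders = nn.ModuleDict({
            "text": nn.LSTM(text_dim, hidden, batch_first=True),
            "audio": nn.LSTM(audio_dim, hidden, batch_first=True),
            "visual": nn.LSTM(visual_dim, hidden, batch_first=True),
        })
        # Stage 2: LSTM over the sequence of fused utterance features.
        self.utterance_lstm = nn.LSTM(3 * hidden, hidden, batch_first=True)
        # Regression head mapping to a single score in the MOSI range [-3, +3].
        self.head = nn.Linear(hidden, 1)

    def forward(self, inputs):
        # inputs: dict of modality name -> tensor of shape
        # (batch, n_utterances, seq_len, feat_dim).
        feats = []
        for name in ("text", "audio", "visual"):
            x = inputs[name]
            b, n = x.shape[:2]
            x = x.flatten(0, 1)                  # (b*n, seq_len, feat_dim)
            _, (h, _) = self.encoders[name](x)   # final hidden state per utterance
            feats.append(h[-1].view(b, n, -1))   # (b, n, hidden)
        fused = torch.cat(feats, dim=-1)         # (b, n, 3*hidden)
        out, _ = self.utterance_lstm(fused)
        return self.head(out).squeeze(-1)        # (b, n) sentiment scores

# Example usage with random features in assumed dimensions.
model = ModalityFusionLSTM()
batch = {
    "text": torch.randn(2, 5, 20, 300),
    "audio": torch.randn(2, 5, 20, 74),
    "visual": torch.randn(2, 5, 20, 35),
}
scores = model(batch)  # shape (2, 5): one score per utterance
```

A model along these lines could be trained with a standard regression loss (e.g. mean squared error against the -3 to +3 MOSI scores); the MOUD labels, being categorical (positive, neutral, negative), would instead call for a classification head.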
Keywords/Search Tags:multimodal data, deep learning, LSTM, emotion recognition, feature fusion