
The Study Of Video's Affective Content Analysis Based On Generalized Protagonist And Spatiotemporal Information

Posted on: 2020-04-26  Degree: Master  Type: Thesis
Country: China  Candidate: M Tong  Full Text: PDF
GTID: 2428330599954711  Subject: Software engineering
Abstract/Summary:
In recent years, with the advent of the intelligent age, emotion recognition has become a hot research topic. Studying emotion recognition can not only improve the experience of human-computer interaction, but also has broad application prospects in personalized recommendation, health monitoring, interactive games, and other fields. Video, as a widely used information carrier, contains a great deal of emotional information, which has drawn some scholars to the problem of emotion recognition in video. The popularity of cameras and mobile devices has led to a sharp increase in the amount of video data, and early manual labeling of the emotional content of video can no longer keep pace with this growth, so automatically identifying the emotional content of video has become an urgent need. How to extract appropriate features from the audio and visual modalities of a video to characterize its emotional content is a major difficulty in the video affective content analysis task. To address this difficulty, two methods of video affective analysis are proposed in this dissertation.

(1) Combining manual features and deep learning, this dissertation proposes a generalized-protagonist-based method for video affective content analysis. We observe that different characters in a video play different roles, and that important characters contribute most to the emotional expression of the video. Unlike other work that considers all faces in the video, we select, according to certain criteria, the characters we believe play an important role in the emotional expression of the video; we call such an important character the generalized protagonist. Keyframes are extracted based on the generalized protagonist, and the corresponding optical flow images are obtained. Spatial and temporal features are then extracted from the keyframes and optical flow images, respectively, using convolutional neural networks. At the same time, manual audio-visual features commonly used for video, such as zero-crossing rate, Mel-frequency cepstral coefficients (MFCCs), and color features, are extracted as a supplement. Finally, the spatial and temporal features learned by the networks are combined with the manual features and mapped to video emotions, so as to analyze the emotional information contained in the video.

(2) In addition, based on deep learning, this dissertation builds a multi-modal hybrid neural network framework that uses spatio-temporal domain information to capture the emotional content of video. The framework consists of two modalities, a visual modality and an audio modality. For the visual modality, an R(2+1)D network extracts the temporal and spatial information of the video for affective analysis. For the audio modality, the one-dimensional audio signal is converted into a two-dimensional log-Mel spectrogram and fed into a visual geometry group (VGG) network to extract features carrying temporal domain information. Finally, the features of the two modalities are fused through a deep belief network (DBN) to map to the emotions in the video. Experiments show that the proposed framework can effectively extract the audio-visual spatio-temporal domain information in video and achieves good performance on the video emotion recognition task.
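To make the first method concrete, the sketch below illustrates one plausible realization of the generalized-protagonist pipeline in Python with OpenCV and librosa. The abstract does not give the exact selection criteria, so scoring frames by the area of the largest detected face is an assumption here, as are the function names, the number of keyframes, and the optical flow parameters; this is a minimal sketch, not the dissertation's implementation.

```python
# Hypothetical sketch: generalized-protagonist keyframe selection, optical flow,
# and handcrafted audio features (zero-crossing rate + MFCCs).
import cv2
import numpy as np
import librosa

def protagonist_keyframes(video_path, num_keyframes=16):
    """Pick keyframes where the most prominent face dominates (assumed criterion)."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    frames, scores = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, 1.1, 5)
        # Score a frame by the area of its largest face, a proxy for protagonist presence.
        score = max((w * h for (_, _, w, h) in faces), default=0)
        frames.append(frame)
        scores.append(score)
    cap.release()
    idx = np.argsort(scores)[-num_keyframes:]
    keyframes = [frames[i] for i in sorted(idx)]
    # Dense optical flow between consecutive keyframes supplies the temporal stream.
    flows = []
    for a, b in zip(keyframes[:-1], keyframes[1:]):
        g1 = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
        g2 = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)
        flows.append(cv2.calcOpticalFlowFarneback(
            g1, g2, None, 0.5, 3, 15, 3, 5, 1.2, 0))
    return keyframes, flows

def handcrafted_audio_features(audio_path):
    """Zero-crossing rate and MFCCs, averaged over time, as the manual audio descriptor."""
    y, sr = librosa.load(audio_path, sr=None)
    zcr = librosa.feature.zero_crossing_rate(y).mean()
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    return np.concatenate([[zcr], mfcc])
```

In this sketch the returned keyframes and optical flow images would be passed to spatial and temporal convolutional networks, and the handcrafted audio vector concatenated with the learned features before the final emotion classifier.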
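The second framework can likewise be sketched with standard components. The minimal sketch below, assuming PyTorch, torchvision, and torchaudio, pairs an R(2+1)D visual branch with a log-Mel spectrogram fed to VGG. A deep belief network is not available in these libraries, so a fully connected fusion head stands in for the DBN, and the emotion class count, sample rate, and tensor shapes are illustrative assumptions rather than the dissertation's settings.

```python
# Minimal sketch of the two-branch audio-visual framework.
import torch
import torch.nn as nn
import torchaudio
from torchvision.models import vgg16
from torchvision.models.video import r2plus1d_18

class AudioVisualEmotionNet(nn.Module):
    def __init__(self, num_emotions=8):              # class count is an assumption
        super().__init__()
        # Visual branch: R(2+1)D backbone captures spatio-temporal information.
        self.visual = r2plus1d_18(weights=None)
        self.visual.fc = nn.Identity()                # yields a 512-d clip feature
        # Audio branch: log-Mel spectrogram treated as an image and fed to VGG.
        self.melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)
        self.to_db = torchaudio.transforms.AmplitudeToDB()
        self.audio = vgg16(weights=None)
        self.audio.classifier[-1] = nn.Linear(4096, 512)
        # Fully connected fusion head standing in for the DBN in the original design.
        self.fusion = nn.Sequential(
            nn.Linear(512 + 512, 256), nn.ReLU(),
            nn.Linear(256, num_emotions))

    def forward(self, clip, waveform):
        # clip: (B, 3, T, H, W) RGB frames; waveform: (B, samples) mono audio.
        v = self.visual(clip)
        spec = self.to_db(self.melspec(waveform)).unsqueeze(1)   # (B, 1, n_mels, frames)
        a = self.audio(spec.repeat(1, 3, 1, 1))                  # VGG expects 3 channels
        return self.fusion(torch.cat([v, a], dim=1))

# Usage sketch with random tensors; shapes are only illustrative.
model = AudioVisualEmotionNet()
clip = torch.randn(2, 3, 16, 112, 112)
wave = torch.randn(2, 16000)
logits = model(clip, wave)
```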
Keywords/Search Tags:Emotion Recognition, Generalized Protagonist, Spatio-temporal Information, Deep Learning, Multimodality Fusion