User-generated videos are videos that users shoot in their daily lives with mobile devices and upload to the internet to share their lives or express their opinions on events. With the popularization of smart devices and the rapid development of internet technology, social media and video-sharing platforms such as YouTube, Bilibili, TikTok, and Kwai are gradually changing people's lives. More and more people share their own videos on these platforms, producing large-scale user-generated video data. Sentiment analysis of such data not only helps predict extreme emotions of the people appearing in videos so that unexpected incidents can be prevented, but also helps governments and platforms monitor online public opinion and understand the public's values and emotional state. This paper focuses on sentiment analysis of user-generated videos. The main research contents and contributions are as follows:

(1) Research on user-generated video sentiment analysis combining scene information and attention mechanisms. Existing research on user-generated video sentiment analysis rarely exploits the scene information in videos and mostly ignores the correlations and interactions among their spatial, channel, and temporal dimensions. To address these issues, a user-generated video sentiment analysis model based on scene information and attention mechanisms is proposed. The model extracts rich scene semantic features to strengthen the expression of emotional features in videos, enabling a more comprehensive and accurate understanding of the sentiment they contain. A spatial-channel residual attention module is proposed to focus the model on how different spatial regions affect channel information, and a channel-temporal residual attention module is proposed to focus the model on how different channels affect temporal frames; a minimal sketch of both modules is given below. Experimental results on two public datasets show that the proposed model effectively improves the accuracy of user-generated video sentiment analysis.
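The following PyTorch sketch illustrates one plausible form of the two residual attention modules named above. The abstract only states that attention is computed over the spatial-channel and channel-temporal dimensions and applied through a residual path, so the layer sizes, the reduction ratio, and the exact fusion scheme here are assumptions for illustration, not the thesis's implementation.

```python
# Hypothetical sketch of the two residual attention blocks; layer shapes
# and the "x * (1 + attention)" residual form are assumptions.
import torch
import torch.nn as nn

class SpatialChannelResidualAttention(nn.Module):
    """Re-weights channel information per spatial region, with a residual path."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):            # x: (batch * frames, C, H, W)
        attn = self.mlp(x)           # per-position, per-channel weights in (0, 1)
        return x * (1.0 + attn)      # residual attention: identity plus re-weighted map

class ChannelTemporalResidualAttention(nn.Module):
    """Re-weights temporal frames from channel statistics, with a residual path."""
    def __init__(self, channels: int, frames: int):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, frames),
            nn.Sigmoid(),
        )

    def forward(self, x):            # x: (batch, T, C) pooled frame features
        # Summarize channels over time, then predict one weight per frame.
        w = self.fc(x.mean(dim=1))   # (batch, T)
        return x * (1.0 + w.unsqueeze(-1))
```

The residual form keeps the identity signal intact, so the attention weights only modulate rather than replace the original features, which is the usual motivation for residual attention.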
(2) Research on user-generated video sentiment analysis based on automatic speech recognition (ASR) transcription. Most existing research on user-generated video sentiment analysis relies on manually transcribed ground-truth text. However, with the massive growth in the number of videos, manual transcription demands too much labor and time to meet practical needs. ASR can automatically convert the audio in a video into text, but because of complex speech signals, noise interference, and speaker pronunciation, the transcribed text is not accurate enough, and using it directly degrades the accuracy of sentiment analysis models. To address these issues, a multi-task sentiment analysis model for user-generated videos based on ASR transcription is proposed. The model converts the audio in videos into text through an ASR transcription module, improving the efficiency of obtaining text data; at the same time, a multi-task learning method is proposed that combines the video sentiment analysis task with an ASR-text sentiment analysis task, so that the model learns more generalizable features and better adapts to ASR transcripts containing noise and errors, thereby improving the accuracy of sentiment analysis. Experimental results on two public datasets show that the proposed model effectively achieves higher accuracy in user-generated video sentiment analysis using ASR-transcribed text.
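The sketch below shows one way the described multi-task objective could be wired: the ASR-text branch is shared between the main video sentiment head and an auxiliary text sentiment head, and the two losses are summed with a weight. The encoder dimensions, fusion by concatenation, and the auxiliary loss weight are assumptions, since the abstract does not specify them.

```python
# Hedged sketch of the multi-task setup: a shared ASR-text branch feeds both
# the fused video sentiment head and an auxiliary text sentiment head.
import torch
import torch.nn as nn

class MultiTaskSentimentModel(nn.Module):
    def __init__(self, text_dim: int, video_dim: int, num_classes: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, 256)      # shared ASR-text branch
        self.video_proj = nn.Linear(video_dim, 256)    # visual/audio branch
        self.video_head = nn.Linear(512, num_classes)  # main task: fused video sentiment
        self.text_head = nn.Linear(256, num_classes)   # auxiliary task: ASR-text sentiment

    def forward(self, text_feat, video_feat):
        t = torch.relu(self.text_proj(text_feat))
        v = torch.relu(self.video_proj(video_feat))
        video_logits = self.video_head(torch.cat([t, v], dim=-1))
        text_logits = self.text_head(t)
        return video_logits, text_logits

def multitask_loss(video_logits, text_logits, labels, aux_weight=0.5):
    # Joint objective: main video sentiment loss plus a weighted ASR-text loss.
    # aux_weight is a hypothetical hyperparameter, not a value from the thesis.
    ce = nn.CrossEntropyLoss()
    return ce(video_logits, labels) + aux_weight * ce(text_logits, labels)
```

Training the shared text branch on both tasks acts as a regularizer: the text features must support sentiment prediction on their own, which is one standard rationale for why multi-task learning helps the model tolerate noisy ASR transcripts.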