| With the advent of the big data era and the explosive growth of multimedia data, video event analysis has become an active research topic. It has broad application prospects and considerable economic value in fields such as smart cities and security. Video event analysis is extremely challenging because of complex backgrounds, diverse viewpoints, and large intra-class variation in content. In recent years, deep learning has achieved great success in multimedia, but deep learning models require large amounts of high-quality labeled data, a requirement that existing models and methods struggle to meet. Existing video event analysis methods rely on labeled data and fail to exploit the semantic concept information carried by the video itself. To address this problem, we propose two video event analysis methods: one that combines semantic concepts with a two-stream feature model, and one that combines semantic concepts with a spatio-temporal attention model. The main contributions of this thesis are as follows: (1) We propose a video event analysis method that combines semantic concepts with a two-stream feature model. First, a task-specific method for generating a preferred concept subset is proposed and used to construct a video event detection model based on semantic concepts. Meanwhile, a two-stream video event detection model, built from convolutional neural networks and LSTMs, is constructed for the optical flow images and the spatial-stream sequences, and the results of the two streams are fused and classified. Finally, the event classification results based on semantic concepts and on two-stream features are fused at the decision level to detect video events. Experimental results show that the method effectively uses the semantic concept information contained in the video to improve the accuracy of video event analysis. (2) We propose a video event analysis method that combines semantic concepts with a spatio-temporal attention model. In order to
reduce the influence of irrelevant information on video event analysis, the two-stream feature model is improved within the original framework. First, a CNN processes RGB images and consecutive optical flow images to obtain spatial and temporal features, which are fed into a spatial attention network and a temporal attention network for spatial and temporal attention weighting. Then, the weighted two-stream features are fused and classified to obtain event classification results based on spatio-temporal attention. Finally, the event classification results based on semantic concepts and those based on spatio-temporal attention are fused to give the final detection results. Experimental results show that the proposed method further improves the accuracy of video event analysis. (3) We design and implement a prototype video event analysis system that combines semantic concepts with the spatio-temporal attention model. Following a modular design, the system's algorithms are implemented in Python with TensorFlow and NumPy, and a simple GUI is built with PyQt. The system comprises four functional modules: data preprocessing, semantic concept extraction, model training, and video detection. It demonstrates the practical usability of the video event analysis methods proposed in this thesis. |
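
The abstract does not state the exact attention or fusion rules, so the following is only a minimal sketch of the two ideas it describes: attention weighting of per-frame features, and fusion of the class scores from the semantic-concept branch and the attention branch. It assumes softmax-normalized attention weights and a weighted average of class probabilities; the function names and the mixing weight `alpha` are hypothetical and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, frame_scores):
    """Temporal attention (assumed form): weighted average of frame features.

    frame_features: list of equal-length feature vectors, one per frame.
    frame_scores:   one raw relevance score per frame.
    """
    if len(frame_features) != len(frame_scores):
        raise ValueError("need exactly one score per frame")
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]


def late_fusion(concept_probs, attention_probs, alpha=0.5):
    """Decision-level fusion (assumed form): weighted average of class probabilities.

    alpha is a hypothetical mixing weight for the semantic-concept branch.
    """
    if len(concept_probs) != len(attention_probs):
        raise ValueError("both branches must score the same classes")
    return [
        alpha * c + (1.0 - alpha) * a
        for c, a in zip(concept_probs, attention_probs)
    ]


def predict(fused_probs):
    """Return the index of the highest-scoring event class."""
    return max(range(len(fused_probs)), key=fused_probs.__getitem__)
```

With equal frame scores, `attention_pool` reduces to a plain mean over frames; with `alpha=0.5`, `late_fusion` gives both branches equal say. A learned fusion layer or a different attention form would be equally consistent with the description above.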