
Video Semantic Concept Analysis Based On Manifold Embedding Two-stream Convolutional Neural Network

Posted on: 2021-05-18 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Xia
GTID: 2428330623979535 | Subject: Computer Science and Technology
Abstract/Summary:
With the development of multimedia technology, the deepening of smart city construction, and the spread of portable smart terminal devices, video has become an indispensable data carrier in daily life. As the number of videos grows, their complex and diverse content puts great pressure on video retrieval, analysis, and storage. This huge volume of video data drives analysis at the semantic level: assigning semantic concept labels to video data makes fast and effective video retrieval and management possible. How to extract video features effectively and detect the semantic concepts in videos has therefore become a hot topic in video surveillance and retrieval.

Drawing on a large body of Chinese and international literature, this thesis first introduces the research background and significance of video semantic concept analysis and briefly reviews the current state of research. It then introduces several deep learning models and research applications of deep learning to video semantic concept analysis and retrieval. After analyzing the shortcomings of existing research, the thesis proposes a manifold-embedded convolutional neural network (CNN) model and a video semantic concept analysis model based on manifold embedding and an optical-flow-attention two-stream CNN. To verify the usability of the proposed models, a prototype video semantic concept detection system was designed and developed. The main research work of this thesis is as follows:

(1) An image feature learning method based on a manifold-embedded CNN. Traditional CNN-based image and video feature learning does not learn the neighborhood relationships and correlation features among images, and CNN training suffers from internal covariate shift, slow convergence, and general training difficulty. The proposed method introduces manifold constraints into the CNN by embedding the manifold of each layer into the convolution operation of the next layer, so that every layer effectively preserves the manifold structure of the layer before it. The resulting video image features reflect the neighborhood relationships and correlations among images.

(2) A video semantic concept detection method based on manifold embedding and an optical-flow-attention two-stream CNN. Because the spatial-stream and optical-flow features of a video are highly complementary, a two-stream feature fusion model is built first, and the manifold is embedded into the spatial-stream CNN to mine the neighborhood relationships and correlations among features. Next, an optical flow attention layer is introduced from the temporal-stream network into the spatial-stream network. It guides the spatial stream to focus on the foreground region of the human body and reduces the influence of background noise, so that changes and differences between spatiotemporal features are captured better. The features produced by the two-stream CNN are then fed into an LSTM in temporal order to learn temporal features. Finally, the confidence scores of the two streams' classifiers are fused, which more effectively improves both the accuracy of video semantic concept detection and the discriminability of the learned features.

(3) A prototype video semantic concept detection system based on the manifold-embedding, optical-flow-attention CNN, designed and implemented with Python, PyQt, and other libraries. The system has three modules: video data preprocessing, model training, and video semantic concept detection. It provides a simple visual interface that is easy to operate, a complete functional module design, and good interactivity and usability.
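The abstract does not give the exact form of the manifold constraint in contribution (1). One common way to make a layer's output preserve the neighbourhood structure of its input is a graph Laplacian penalty, swapped in here purely for illustration: build a k-nearest-neighbour heat-kernel graph W on the previous layer's features, form L = D - W, and penalize tr(Y^T L Y) = 1/2 * sum_ij W_ij ||y_i - y_j||^2 on the next layer's features Y. The NumPy sketch below assumes this formulation; the function names, k and sigma are hypothetical and are not taken from the thesis.

```python
import numpy as np


def knn_heat_kernel_graph(x_prev, k=3, sigma=1.0):
    """Symmetric k-NN affinity matrix W on previous-layer features (n, d).

    Assumption (not from the thesis): heat-kernel weights
    exp(-||x_i - x_j||^2 / (2 sigma^2)) kept only for each sample's
    k nearest neighbours, as in Laplacian-eigenmaps-style graphs.
    """
    n = x_prev.shape[0]
    sq = np.sum(x_prev ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x_prev @ x_prev.T, 0.0)
    search = d2.copy()
    np.fill_diagonal(search, np.inf)  # a sample is not its own neighbour
    w = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(search[i])[:k]
        w[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    # Symmetrize: keep an edge if either sample lists the other as a neighbour.
    return np.maximum(w, w.T)


def laplacian_penalty(y_next, w):
    """tr(Y^T L Y) with L = D - W; small when graph neighbours stay close in Y."""
    lap = np.diag(w.sum(axis=1)) - w
    return float(np.trace(y_next.T @ lap @ y_next))


rng = np.random.default_rng(0)
x_prev = rng.normal(size=(8, 16))  # 8 samples, previous-layer features
y_next = rng.normal(size=(8, 4))   # the same samples, next-layer features
w = knn_heat_kernel_graph(x_prev, k=3, sigma=4.0)
penalty = laplacian_penalty(y_next, w)

# Sanity check: the trace form equals half the weighted sum of pairwise distances.
pairwise = 0.5 * sum(w[i, j] * np.sum((y_next[i] - y_next[j]) ** 2)
                     for i in range(8) for j in range(8))
```

During training such a penalty would typically be computed per mini-batch and added to the classification loss with a weighting factor. The thesis instead describes embedding the manifold directly into the next layer's convolution, which this sketch does not attempt to reproduce.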
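Contribution (2) routes an optical flow attention layer from the temporal stream into the spatial stream without specifying its form. The sketch below assumes one simple design: the optical-flow magnitude is average-pooled to the spatial feature map's resolution, turned into a spatial softmax, and rescaled to mean 1 before multiplying the spatial-stream features, so static background is attenuated and moving foreground is amplified. The function names and the temperature parameter are hypothetical.

```python
import numpy as np


def flow_attention_map(flow, out_hw, temperature=1.0):
    """Spatial attention from a dense optical-flow field (assumed design).

    flow: (H, W, 2) array of (dx, dy) displacements; H and W are assumed to be
    integer multiples of the feature-map size out_hw = (h, w).
    Returns an (h, w) map that is non-negative and sums to 1.
    """
    h, w = out_hw
    H, W, _ = flow.shape
    mag = np.linalg.norm(flow, axis=-1)  # motion magnitude per pixel
    # Average-pool the magnitude down to the feature-map resolution.
    pooled = mag.reshape(h, H // h, w, W // w).mean(axis=(1, 3))
    logits = pooled.ravel() / temperature
    logits = logits - logits.max()  # numerical stability for exp
    att = np.exp(logits)
    return (att / att.sum()).reshape(h, w)


def apply_flow_attention(spatial_feat, att):
    """Reweight spatial-stream features of shape (C, h, w).

    Scaling by h*w gives the map mean 1: a uniform flow field leaves the
    features unchanged, moving regions are amplified, static ones attenuated.
    """
    h, w = att.shape
    return spatial_feat * (att * h * w)[None, :, :]


flow = np.zeros((56, 56, 2))
flow[16:40, 20:36] = [1.5, -0.5]  # a moving "foreground" patch
feat = np.ones((64, 7, 7))        # spatial-stream feature map (C, h, w)
att = flow_attention_map(flow, (7, 7))
out = apply_flow_attention(feat, att)
```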
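Contribution (2) then feeds the two-stream CNN features into an LSTM in temporal order. The NumPy sketch below runs the standard LSTM gate equations over a sequence of per-frame feature vectors. It assumes, as one possible reading of the abstract, that each stream has its own LSTM; the dimensions are arbitrary and the weights are random stand-ins for trained parameters, so it illustrates only the data flow.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(seq, w_x, w_h, b):
    """Single-layer LSTM over seq of shape (T, d_in).

    Stacked gate order: input, forget, cell candidate, output.
    w_x: (d_in, 4*d_h), w_h: (d_h, 4*d_h), b: (4*d_h,).
    Returns all hidden states, shape (T, d_h).
    """
    d_h = w_h.shape[0]
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    outputs = []
    for x_t in seq:  # frames in temporal order
        z = x_t @ w_x + h @ w_h + b
        i = sigmoid(z[:d_h])
        f = sigmoid(z[d_h:2 * d_h])
        g = np.tanh(z[2 * d_h:3 * d_h])
        o = sigmoid(z[3 * d_h:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def init_lstm(rng, d_in, d_h, scale=0.05):
    """Random, untrained weights standing in for learned parameters."""
    return (rng.normal(scale=scale, size=(d_in, 4 * d_h)),
            rng.normal(scale=scale, size=(d_h, 4 * d_h)),
            np.zeros(4 * d_h))


rng = np.random.default_rng(0)
T, d_feat, d_h = 16, 256, 32
spatial_feats = rng.normal(size=(T, d_feat))   # per-frame spatial-stream features
temporal_feats = rng.normal(size=(T, d_feat))  # per-frame optical-flow-stream features
h_spatial = lstm_forward(spatial_feats, *init_lstm(rng, d_feat, d_h))
h_temporal = lstm_forward(temporal_feats, *init_lstm(rng, d_feat, d_h))
```

The last hidden state of each sequence (for example `h_spatial[-1]`) would then serve as that stream's clip-level representation for its classifier.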
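The final step of contribution (2) fuses the confidences of the two streams' classifiers. The abstract does not state the fusion rule; the sketch below assumes a weighted average of each stream's softmax probabilities (late, score-level fusion). The weight `w_spatial`, the logits and the concept names are made-up placeholders.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fuse_confidences(spatial_logits, temporal_logits, w_spatial=0.5):
    """Assumed late fusion: weighted average of per-stream class probabilities."""
    p_s = softmax(spatial_logits)
    p_t = softmax(temporal_logits)
    return w_spatial * p_s + (1.0 - w_spatial) * p_t


concepts = ["walking", "running", "waving"]  # hypothetical concept labels
spatial_logits = np.array([2.0, 1.8, 0.1])   # spatial stream slightly prefers "walking"
temporal_logits = np.array([0.5, 2.5, 0.2])  # motion stream clearly prefers "running"
fused = fuse_confidences(spatial_logits, temporal_logits)
print(concepts[int(np.argmax(fused))])       # prints: running
```

With equal weights, the motion stream's confident vote for "running" outweighs the spatial stream's weak preference for "walking"; setting `w_spatial=1.0` reduces to the spatial stream alone.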
Keywords/Search Tags: Deep Learning, Manifold Embedding, Convolutional Neural Network, Video Semantics, Feature Fusion