Font Size: a A A

Research On Video Semantic Concept Detection Method Based On Deep Learning With Topographic Sparse Encode

Posted on:2019-11-23Degree:MasterType:Thesis
Country:ChinaCandidate:X Y ChengFull Text:PDF
GTID:2428330566472823Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
In recent years,with the rapid development of Internet and multimedia technology,the transmission of video data has become more convenient with increasingly diverse sources,which led to a rapid increase of video data on the Internet.As an important data source on the Internet,video contains more complex and abundant information than other types of data.It is so rich video data that people need to analyze it in the semantic level and establish a semantic concept label of the video to achieve more efficient management and retrieval of data.It has become a hot issue in the field of video management and retrieval to study how to realize video semantic concept detection and modeling by efficiently learning and using video data features.Based on a large number of domestic and foreign research papers,this thesis first introduces the research background,significance and current situation of video semantic concept detection.Then it briefly introduces several deep learning models and the application of video semantics concept detection based on deep learning.This thesis analyzes the shortcomings in previous studies and proposes video semantic analysis based on topographic sparse pre-training convolutional neural network,video semantic analysis based on graph regularization optimized deep neural network and a video semantic concept detection prototype system.The main contents of this thesis are as follows:(1)A video semantic analysis based on topographic sparse pre-training convolutional neural network is proposed.In the previous studies,the video feature learning model based on convolutional neural network does not consider the problem of learning topographic information of video images.This method proposed in this thesis adds a topographic constraint to sparse autoencoder and presents a new topographic sparse encoder for neural network pre-training,so that the characteristic expression of video images can reflect the topographic information of the images,then the model constructs the fully connected layer for video feature learning.Video label information is used for fine-tuning networks in fully connected layers of image feature learning and video feature learning separately to obtain more reasonable and effective video features.(2)A video semantic analysis based on graph regularized deep neural network optimization is proposed.For the video semantic analysis based on deep neural networks,the fully connected layers rely on logistic regression to learn discriminative features which leads to the lack of consideration and underutilization of discriminability.It is necessary to consider how to utilize the characteristics of input data to improve the discriminability of features extracted by deep neural network further.In this thesis,a video semantic analysis method based on graph regularization optimization deep neural network is proposed.This method proposes an improvement to tradition graph regularized autoencoder with locality-preserving property.The single graph regularization term constructed by considering the neighborhood relationship only is improved to build two graph regularization terms constructed with consideration of both the category relationship and the neighbor relationship.A graph regularized autoencoder with discriminative loss is presented and used to obtain the video discriminative feature by dimensionality reduction to optimize the feature,which come from fully connected layers of the deep convolutional neural network after logistic regression fine-tuning.This method will further improve the accuracy of video semantics concept detection.(3)A prototype system for video semantics detection based on topographic sparse encode deep learning is designed and implemented using python and related additional library with object-oriented design philosophy.The system includes three modules: video data preprocessing,model training and semantic detection.The prototype system provides compact interface,operation convenience and verifies the feasibility of our method.
Keywords/Search Tags:Video Semantic, Convolutional Neural Network, Deep Learning, New Topographic Sparse Encoder, Improved Graph Regularization
PDF Full Text Request
Related items