
Video Semantic Concept Detection Based on Multi-Time-Scale Two-Stream CNN and Metric Learning

Posted on: 2021-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 2428330629486913
Subject: Computer technology
Abstract/Summary:
With the rapid development of the self-media era brought about by the rise of intelligent portable devices, recording, watching and sharing videos on the Internet has become one of the indispensable ways for people to express and convey their emotions in daily life. On the one hand, this ever-present video data brings convenience; on the other hand, without strict supervision, the unchecked spread of harmful video content can have a negative impact on the public, especially on young people. Faced with the growing amount of multimedia data such as video and images on the Internet, how to recognize the content of video sequences and model video semantic concepts so as to classify videos reasonably has become one of the hot research topics in computer vision. The technology is widely used in both civil and military fields and has attracted the attention of many researchers at home and abroad.

Building on an extensive review of the domestic and foreign literature, this thesis first introduces the research background, significance and current state of video semantic concept detection, then presents several deep learning network models and briefly reviews the knowledge related to video semantic concept detection technology. Targeting the problems of existing detection techniques, the thesis focuses on a video action semantic detection method based on a multi-time-scale two-stream CNN with confidence fusion, and a video semantic concept detection model based on a multi-time-scale two-stream CNN with metric learning. To verify the practicality of the proposed methods in video semantic analysis tasks, a prototype system for video semantic concept detection is designed and implemented. The main contents of this thesis are as follows:

(1) To address the over-dependence on background and appearance features and the limited ability to learn long-sequence features caused by restrictions on video length, while taking into account variations in video sampling, the different motion speeds of target subjects, and the different confidence levels of multiple action classifiers, a video action semantic detection method based on a multi-time-scale two-stream CNN and confidence fusion is proposed. The method uses a two-stream neural network to learn and extract contextual features between video frames with different time spans at multiple time scales, and an LSTM to predict action categories from each kind of feature. For the classifier of each scale and modality, a confidence level for its category decision is then established that considers the overall difference and distinctiveness of category semantics between the sample and the other categories. Finally, the confidence levels and category scores of all classifiers are fused to obtain the action semantic detection result. Experimental results show that the proposed method effectively improves the accuracy of video action semantic detection.
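As an illustration of the confidence-fusion step in (1), here is a minimal sketch in Python, assuming that each combination of time scale and modality (RGB and optical flow) has already produced a softmax score vector from its LSTM classifier. The function name fuse_predictions, the use of the peak softmax score as a stand-in for the confidence value, and the toy data are assumptions for illustration; the thesis derives the confidence from the semantic differences between categories.

```python
import numpy as np

def fuse_predictions(class_scores, confidences):
    """Fuse per-classifier class scores using per-classifier confidence weights.

    class_scores: dict mapping (time_scale, modality) -> np.ndarray of shape (num_classes,)
                  holding the softmax scores produced by that classifier.
    confidences:  dict with the same keys -> scalar confidence of that classifier's decision.
    Returns the index of the predicted action class.
    """
    keys = list(class_scores.keys())
    # Normalize the confidences so the fusion weights sum to one.
    weights = np.array([confidences[k] for k in keys], dtype=np.float64)
    weights = weights / weights.sum()
    # Confidence-weighted sum of the per-classifier score vectors.
    fused = sum(w * class_scores[k] for w, k in zip(weights, keys))
    return int(np.argmax(fused))

# Hypothetical example: two time scales x two modalities, 5 action classes.
rng = np.random.default_rng(0)
scores = {(scale, mod): rng.dirichlet(np.ones(5))
          for scale in (1, 4) for mod in ("rgb", "flow")}
conf = {k: float(s.max()) for k, s in scores.items()}  # peak score used here as a simple confidence proxy
print("fused prediction:", fuse_predictions(scores, conf))
```

Weighting each score vector by a normalized confidence lets the more reliable classifiers dominate the fused decision, which is the intent of the confidence-fusion step described above.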
(2) To address the large intra-class gaps and high inter-class similarity caused by the diversity of video content and by differing environments, a video semantic detection method based on a multi-time-scale two-stream CNN and metric learning is proposed on top of the original model framework, combined with a metric-based analysis of the differences between video semantic categories. The network model is trained by multi-task learning, optimizing two subtasks at the same time: similarity measurement and semantic classification detection. A deep network performs feature learning, while metric learning measures the similarity between features to constrain and classify them; the distance between the features of video samples is computed by metric learning as their degree of semantic difference, and the network updates its parameters by back-propagation according to this difference, so that it learns the semantic differences between samples. Experimental results on UCF101 show that introducing metric learning into the multi-time-scale two-stream CNN enhances the feature extraction ability of the network and further improves the accuracy of video semantic detection.
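To make the multi-task training in (2) concrete, the following sketch combines a cross-entropy classification term with a contrastive metric-learning term over pairs of batch features, which is one common way to realize such a joint objective. PyTorch, the margin and weighting values, and the joint_loss function are assumptions for illustration; the abstract does not specify the exact metric or loss used in the thesis.

```python
import torch
import torch.nn.functional as F

def joint_loss(features, logits, labels, margin=1.0, lam=0.5):
    """Multi-task loss: classification term + contrastive metric-learning term.

    features: (B, D) video-level feature vectors from the two-stream network.
    logits:   (B, C) class scores from the classification head.
    labels:   (B,)   ground-truth class indices.
    """
    # Sub-task 1: semantic classification.
    ce = F.cross_entropy(logits, labels)

    # Sub-task 2: similarity measurement over all feature pairs in the batch.
    dist = torch.cdist(features, features, p=2)                 # pairwise Euclidean distances
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()  # 1 if same class, else 0
    # Pull same-class pairs together, push different-class pairs beyond the margin.
    contrastive = same * dist.pow(2) + (1 - same) * F.relu(margin - dist).pow(2)
    # Ignore the diagonal (distance of a sample to itself).
    off_diag = 1 - torch.eye(len(labels), device=features.device)
    metric = (contrastive * off_diag).sum() / off_diag.sum()

    return ce + lam * metric

# Hypothetical usage with random tensors standing in for network outputs.
feats = torch.randn(8, 128, requires_grad=True)
logits = torch.randn(8, 101, requires_grad=True)   # e.g. 101 classes as in UCF101
labels = torch.randint(0, 101, (8,))
loss = joint_loss(feats, logits, labels)
loss.backward()                                    # gradients drive the parameter update
```

Minimizing the metric term pulls same-class features together and pushes different-class features apart, which is how the back-propagated semantic-difference signal described above constrains the learned features.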
(3) Using Python as the development language and PyQt for the graphical interface, a prototype system for video semantic concept detection is designed and implemented. The system consists of data preprocessing, model training, video semantic concept detection and other subsystems. Its interface is simple, friendly and easy to operate, which verifies the usability of the video semantic analysis methods proposed in this thesis.

Keywords/Search Tags: Deep Learning, Video Semantic Concept Detection, Multiple Time Scales, Confidence, Metric Learning