The Key Technology Research Of Short Video Multimodal Retrieval Based On Deep Learning Technology

Posted on: 2022-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: L J Wang
GTID: 2518306329498924
Subject: Computer technology
Abstract/Summary:
Deep learning, which has developed rapidly in fields such as language translation and image recognition, is now widely applied in daily life, for example in machine translation and face recognition. With the arrival of the 5G era, network speed constrains people less and less, and the ease with which anyone can produce and publish videos has driven massive growth in online video resources. How to use deep learning algorithms to learn feature representations of video and implement multimodal short video retrieval has long been a research topic of interest to industry. Unlike hand-crafted motion features, deep learning methods are good at learning image features automatically, which provides a new perspective and processing method for human action recognition.

In this thesis, RGB frame sequences and optical flow sequences, the inputs commonly used when deep learning is applied to action recognition, serve as the inputs of the network model. A 3D network learns the inter-frame information of the video: the RGB images and the optical flow are fed into the network as separate streams and then fused, so as to better represent the continuous spatio-temporal nature of an action. The main work in model analysis and training is as follows. First, the 13,320 videos of the UCF-101 dataset, used as training data, are split into frames and their optical flow is computed; in total, 3.7 million RGB and optical flow images are preprocessed to facilitate later model training. The data were divided into training and test sets according to split 1 of UCF-101, and the model was trained and fine-tuned to an accuracy of 93.47%. The video action recognition task is accomplished by training a Two-Stream Inflated 3D Convolution Network (I3D).

With a well-performing action recognition model in hand, this thesis implements a video retrieval web system using Python's Django framework. The test set is used to simulate users' video data: the action recognition network and an object detection network automatically annotate the data, which is then added to a MySQL database. The system implements multimodal short video retrieval using end-to-end deep learning representations. Its main functions are:
1) retrieving a cluster of short videos given a similar short video;
2) retrieving short videos from a text or keyword description, where each video's text annotation is obtained by combining object detection results with action recognition results;
3) collecting misrecognized videos through users' manual annotation, so that they can be used to improve and optimize model training accuracy in the future.
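The split-1 protocol mentioned in the abstract above can be sketched as a small helper. This assumes the commonly documented UCF-101 convention that clip names encode a group index (for example `v_ApplyEyeMakeup_g08_c01.avi`) and that split 1 holds out groups 1 to 7 for testing; the official `trainlist01.txt` and `testlist01.txt` files remain authoritative, and the function `split1_subset` is hypothetical.

```python
import re

# Hypothetical helper. Assumes names like "v_<Class>_gXX_cYY.avi" and that
# split 1 uses groups 1-7 for testing and the remaining groups for training.
_GROUP_RE = re.compile(r"_g(\d{2})_c\d{2}")

def split1_subset(clip_name: str) -> str:
    """Return "train" or "test" for a UCF-101 clip under split 1."""
    match = _GROUP_RE.search(clip_name)
    if match is None:
        raise ValueError(f"unrecognised UCF-101 clip name: {clip_name}")
    group = int(match.group(1))
    return "test" if 1 <= group <= 7 else "train"

print(split1_subset("v_ApplyEyeMakeup_g08_c01.avi"))  # train
print(split1_subset("v_Basketball_g03_c02.avi"))      # test
```

Splitting by group rather than by individual clip keeps clips cut from the same source video out of both sets at once.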
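Storing optical flow as ordinary pictures, as in the 3.7 million preprocessed images described in the abstract above, requires mapping each real-valued flow component to an 8-bit intensity. The sketch below assumes the common practice of clipping displacements to a fixed bound (here a hypothetical `bound=20.0` pixels) and rescaling linearly to 0 to 255; the thesis does not state which flow algorithm or bound it used, and both function names are illustrative.

```python
def flow_to_uint8(values, bound=20.0):
    """Clip each flow component to [-bound, bound] and map it linearly to 0..255."""
    out = []
    for v in values:
        clipped = max(-bound, min(bound, v))
        # (clipped + bound) / (2 * bound) lies in [0, 1]; scale and round to a byte.
        out.append(int(round((clipped + bound) / (2 * bound) * 255)))
    return out

def uint8_to_flow(pixels, bound=20.0):
    """Approximate inverse, used when the images are loaded back for training."""
    return [p / 255 * 2 * bound - bound for p in pixels]

print(flow_to_uint8([-25.0, -20.0, 0.0, 20.0, 3.5]))  # [0, 0, 128, 255, 150]
```

Horizontal and vertical components would each be stored this way as a separate grayscale image; the quantization error is at most half a step, about `bound / 255` pixels.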
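The "inflated" in I3D refers to building 3D convolution kernels from pretrained 2D ones. In the original I3D work, an N×N 2D kernel is repeated T times along a new time axis and divided by T, so that a "boring" video made of one repeated frame yields the same response the 2D network gives on that frame. The pure-Python sketch below illustrates this on a tiny kernel; the helper names are hypothetical, and a real implementation would operate on framework tensors.

```python
from typing import List

Kernel2D = List[List[float]]
Kernel3D = List[Kernel2D]

def inflate_kernel(kernel2d: Kernel2D, time_depth: int) -> Kernel3D:
    """Repeat a 2D kernel time_depth times along time and divide by time_depth."""
    return [[[w / time_depth for w in row] for row in kernel2d]
            for _ in range(time_depth)]

def conv2d_at(kernel: Kernel2D, patch: Kernel2D) -> float:
    """Response of a 2D kernel on one equally sized image patch (no padding)."""
    return sum(k * p for k_row, p_row in zip(kernel, patch)
               for k, p in zip(k_row, p_row))

def conv3d_at(kernel: Kernel3D, clip: Kernel3D) -> float:
    """Response of a 3D kernel on one equally sized spatio-temporal patch."""
    return sum(conv2d_at(k, p) for k, p in zip(kernel, clip))

k2d = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]  # Sobel-like edge filter
frame = [[3.0, 1.0, 0.0], [4.0, 2.0, 1.0], [5.0, 0.0, 2.0]]
k3d = inflate_kernel(k2d, time_depth=4)
boring_clip = [frame] * 4  # the same frame repeated over time

print(conv2d_at(k2d, frame), conv3d_at(k3d, boring_clip))  # 12.0 12.0
```

The division by T is what preserves the pretrained activations; without it the 3D response on the repeated-frame clip would be T times larger.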
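In a two-stream model such as the one described in the abstract above, the RGB and optical flow networks produce separate class scores that are combined at prediction time. The original I3D evaluation averages the predictions of the two streams; the sketch below assumes equal-weight averaging of softmax probabilities, and the logits and the three class names (a small subset of UCF-101 labels) are made up for illustration.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_two_stream(rgb_logits: List[float], flow_logits: List[float],
                    rgb_weight: float = 0.5) -> List[float]:
    """Late fusion: weighted average of the per-stream class probabilities."""
    rgb_p, flow_p = softmax(rgb_logits), softmax(flow_logits)
    return [rgb_weight * r + (1 - rgb_weight) * f for r, f in zip(rgb_p, flow_p)]

classes = ["Basketball", "Biking", "Diving"]  # illustrative subset of UCF-101 labels
fused = fuse_two_stream([2.0, 1.5, 0.1], [0.2, 2.5, 0.3])
print(classes[fused.index(max(fused))])  # Biking
```

Here the RGB stream alone would pick "Basketball", but the confident motion evidence from the flow stream tips the fused prediction to "Biking", which is the kind of complementarity two-stream fusion relies on.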
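The first two retrieval functions listed in the abstract above can be illustrated with a small in-memory index. This is a sketch under stated assumptions, not the thesis's Django/MySQL implementation: each video is assumed to carry a hypothetical feature vector (for example a pooled embedding from the recognition network) and a tag set formed from its action label plus detected object labels. Similar-video search ranks by cosine similarity, and keyword search matches tags; all names here are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class VideoRecord:
    """Hypothetical stand-in for a database row describing one short video."""
    video_id: str
    feature: List[float]  # e.g. a pooled embedding from the recognition model
    tags: Set[str] = field(default_factory=set)

def annotate(action_label: str, detected_objects: List[str]) -> Set[str]:
    """Text annotation = action recognition label plus object detection labels."""
    return {action_label.lower()} | {obj.lower() for obj in detected_objects}

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search_by_video(query: List[float], index: List[VideoRecord], k: int = 2) -> List[str]:
    """Return ids of the k videos whose features are most similar to the query."""
    ranked = sorted(index, key=lambda r: cosine(query, r.feature), reverse=True)
    return [r.video_id for r in ranked[:k]]

def search_by_keyword(keyword: str, index: List[VideoRecord]) -> List[str]:
    """Return ids of videos whose annotation contains the keyword."""
    kw = keyword.lower()
    return [r.video_id for r in index if kw in r.tags]

index = [
    VideoRecord("v1", [0.9, 0.1, 0.0], annotate("Biking", ["person", "bicycle"])),
    VideoRecord("v2", [0.8, 0.3, 0.1], annotate("Biking", ["person", "car"])),
    VideoRecord("v3", [0.0, 0.2, 0.9], annotate("Diving", ["person"])),
]
print(search_by_video([1.0, 0.2, 0.0], index))  # ['v1', 'v2']
print(search_by_keyword("bicycle", index))      # ['v1']
```

In a web system the linear scan would typically be replaced by database queries for tags and an approximate nearest-neighbour index for features, but the ranking logic stays the same.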
Keywords/Search Tags: Motion recognition, Two-Stream Inflated 3D Convolution Network (I3D), Django, MySQL, Optical flow