The Key Technology Research Of Short Video Multimodal Retrieval Based On Deep Learning Technology

Posted on:2022-03-20

Degree:Master

Type:Thesis

Country:China

Candidate:L J Wang

Full Text:PDF

GTID:2518306329498924

Subject:Computer technology

Abstract/Summary:

Deep learning, which has advanced rapidly in fields such as language translation and image recognition, is now widely applied in daily life, for example in machine translation and face recognition. With the arrival of the 5G era, network speed constrains users less and less, and the ease of producing and publishing videos has driven massive growth in online video resources. How to use deep learning algorithms to learn feature representations of video information and implement multimodal short video retrieval has long been a research area of interest to industry. Unlike hand-crafted motion features, deep learning methods perform well at learning image features automatically, which provides a new perspective and processing method for human motion recognition.

In this paper, RGB sequences and optical flow sequences serve as the inputs of the network model, since they are commonly used as input information when deep learning is applied to motion recognition. A 3D network learns the inter-frame information of the video: the image information and the optical flow information are fed into the network as separate inputs and then fused, so as to better express the spatio-temporal continuity of an action.

The main work of model analysis and training in this paper is as follows. First, the 13,320 videos of the UCF-101 data set are used as training data. They are split into frames, and optical flow is computed for them. In total, 3.7 million RGB and optical flow images are preprocessed to facilitate subsequent model training. The data were divided according to split 1 of the UCF-101 data set, and the model was trained and fine-tuned to an accuracy of 93.47%. The video action recognition task is accomplished by training a Two-Stream Inflated 3D Convolution Network (I3D).

After obtaining a well-performing video action recognition model, this paper implements a video retrieval web system with the Python Django framework. The test data set is used to simulate users' video data. The motion recognition network and an object detection network annotate the data automatically, and the results are stored in a MySQL database. The system implements multimodal short video retrieval using end-to-end deep learning representations. Its main functions are: 1) retrieving a cluster of similar short videos given a query short video; 2) retrieving short videos from a text or keyword description, where each video's text annotation is obtained by combining object detection results with motion recognition results; 3) collecting misrecognized videos through users' manual annotation, so that model training and accuracy can be improved in the future.
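
The abstract does not spell out the I3D internals. The sketch below is a minimal illustration in plain Python built on two assumptions. First, it assumes the inflation scheme of the original I3D design, in which a pretrained 2D filter is repeated along a new time axis and divided by the temporal depth. Second, it assumes late fusion by averaging the softmax probabilities of the RGB and optical flow streams. The function names inflate_kernel_2d and fuse_two_stream, the rgb_weight parameter, and all numbers are hypothetical. They are not the thesis code, and the fusion rule the thesis actually used may differ.

```python
import math


def inflate_kernel_2d(kernel_2d, time_depth):
    """Inflate a 2D kernel (H x W nested lists) into a 3D kernel (T x H x W).

    The 2D weights are repeated time_depth times along a new temporal axis and
    divided by time_depth, so a clip of identical frames yields the same
    response as the original 2D kernel applied to one frame.
    """
    if time_depth < 1:
        raise ValueError("time_depth must be at least 1")
    return [[[w / time_depth for w in row] for row in kernel_2d]
            for _ in range(time_depth)]


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(rgb_logits, flow_logits, rgb_weight=0.5):
    """Late fusion (assumed): weighted average of the two streams' class probabilities.

    Returns the index of the winning class and the fused probability list.
    """
    if len(rgb_logits) != len(flow_logits):
        raise ValueError("both streams must score the same classes")
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)
    fused = [rgb_weight * a + (1.0 - rgb_weight) * b for a, b in zip(p_rgb, p_flow)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused


# Usage with made-up numbers: a 2x2 kernel inflated to depth 4, and
# scores for three hypothetical action classes from each stream.
k3d = inflate_kernel_2d([[1.0, 2.0], [3.0, 4.0]], 4)
label, probs = fuse_two_stream([2.0, 1.0, 0.1], [0.5, 3.0, 0.2])
print(len(k3d), label, [round(p, 3) for p in probs])
```

Dividing by the temporal depth keeps the 3D filter's response to a clip of identical frames equal to the 2D filter's response on a single frame. This is why image-pretrained weights remain a reasonable starting point after inflation. In this toy example the flow stream's confident vote for class 1 outweighs the RGB stream's preference for class 0.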

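The two retrieval functions described above can be illustrated with a minimal, hypothetical sketch. It assumes that each stored video has a feature vector taken from the recognition network and a tag set built from its action label plus the names of detected objects. An in-memory dictionary stands in for the MySQL table. The names build_annotation, retrieve_similar and retrieve_by_keywords, the toy 3-dimensional features and the labels are all illustrative. Similar-video search ranks by cosine similarity, which is an assumed metric. Keyword search ranks by the number of query words matched.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_annotation(action_label, detected_objects):
    """Merge the recognized action and detected object names into one lowercase tag set."""
    return {action_label.lower()} | {name.lower() for name in detected_objects}


def retrieve_similar(query_feature, library, top_k=3):
    """Video-to-video search: the top_k stored videos most similar to the query feature."""
    scored = [(vid, cosine_similarity(query_feature, rec["feature"]))
              for vid, rec in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def retrieve_by_keywords(query, library):
    """Text search: videos whose tags share at least one query word, most matches first."""
    terms = set(query.lower().split())
    hits = [(vid, len(terms & rec["tags"])) for vid, rec in library.items()]
    hits = [h for h in hits if h[1] > 0]
    hits.sort(key=lambda item: (-item[1], item[0]))
    return [vid for vid, _ in hits]


# Hypothetical in-memory stand-in for the MySQL table: toy features and
# tags that combine an action label with detected object names.
library = {
    "v1": {"feature": [1.0, 0.0, 0.0],
           "tags": build_annotation("PlayingGuitar", ["person", "guitar"])},
    "v2": {"feature": [0.9, 0.1, 0.0],
           "tags": build_annotation("PlayingPiano", ["person", "piano"])},
    "v3": {"feature": [0.0, 1.0, 0.0],
           "tags": build_annotation("Biking", ["person", "bicycle"])},
}
print(retrieve_similar([1.0, 0.0, 0.0], library, top_k=2))
print(retrieve_by_keywords("person guitar", library))
```

The sketch uses exact whole-word matching to stay short. As a result, an action tag such as playingguitar matches only that exact token. A real system would likely need tokenization or synonym handling for action names.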
Keywords/Search Tags:

Motion recognition, Two-Stream Inflated 3D Convolution Network (I3D), Django, MySQL, Optical flow

Related items

1	Action Recognition Method Based On Motion Vector Prediction
2	Research On Convolution Neural Network Behavior Recognition Based On Optical Flow Characteristic
3	Research And Implementation Of Optical Flow Estimation And Motion Segmentation Technology Based On Asynchronous Event Stream And Traditional Image Fusion Through Deep Learning Network
4	Research On Human Motion Recognition Based On Image
5	Multi-branch Deep 3-Dimensial Convolution Neural Network For Human Action Recognition In Videos
6	Research On Micro-Expression Recognition Based On Convolutional Neural Network
7	The Recognition Of Hand Gesture Motion Behavior Based On Deep Learning
8	Research On Action Recognition Method Based On Multi-stream Network
9	Research For Action Recognition Based On Spatial-Temporal Stream Convolution Neural Networks
10	Calculated Typical Behavior Recognition Algorithm Based On Optical Flow