Short Video Understanding Based on Deep Learning

Posted on: 2020-10-11    Degree: Master    Type: Thesis
Country: China    Candidate: X Dong
GTID: 2518306464487054    Subject: Computer technology
Abstract/Summary:
With the rapid development of network communication technology and the widespread use of mobile devices such as smartphones and tablets, short-video platforms represented by TikTok and Kuaishou have emerged. Targeting the application scenarios of short video, this work focuses on three tasks: scene recognition, action recognition, and joint feature learning. The main work and achievements are as follows.

To address scene recognition in short videos, where scenes are often blurred, a deep fusion network based on VGGNet is proposed, in which VGGNet16 learns global features and VGGNet19 learns image details. To handle blur, the deep fusion network extracts blur features, the blurred image is up-sampled, and the similarity between the blurred image and the clear image is measured with a Euclidean distance loss, so that the network learns to reproduce the deblurring operation. The method achieves a top-3 accuracy of 92.2% on the 2017 AI Challenger scene classification dataset and 78.9% on the Charades short-video dataset, which shows that it is effective; because it can also recognize blurred images, it is more robust as well.

To address action recognition in short videos, a key frame extraction algorithm based on mutual information entropy is first proposed, which uses a sliding window to preserve the temporal information between frames. Building on the extracted key frames, a two-stream CNN based on Deform-GoogLeNet, a GoogLeNet variant with deformable convolutions, is proposed: one stream extracts RGB features and the other extracts optical flow features, and the two outputs are combined by weighted averaging to produce the action recognition result. On the Charades dataset, this method outperforms comparable fusion algorithms, which demonstrates its effectiveness.

To further improve action recognition in short videos, a joint feature learning method based on dictionary learning, which combines scene features with action features, is proposed. Dictionary learning and sparse representation help the model find the salient features that make action recognition more effective, with the scene features serving as contextual information. Experiments on the kitchen-related videos of the Charades dataset show that the proposed action recognition algorithm combined with scene information outperforms methods that use action recognition alone, confirming the effectiveness of the proposed method.
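The deblurring step compares an up-sampled version of the blurred input with the clear image through a Euclidean distance loss. The abstract gives no network details, so the following is only a minimal NumPy sketch of that loss, assuming nearest-neighbour up-sampling by an integer factor and the form E = 1/(2N) * sum_n ||pred_n - target_n||^2 (the convention of Caffe's EuclideanLoss layer). `upsample_nearest` and `euclidean_loss` are hypothetical helper names, not code from the thesis.

```python
import numpy as np


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour up-sampling of a batch shaped (N, H, W, C)."""
    # Repeat every pixel `factor` times along the height and width axes.
    return np.repeat(np.repeat(x, factor, axis=1), factor, axis=2)


def euclidean_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """E = 1/(2N) * sum_n ||pred_n - target_n||^2 over a batch of N images."""
    n = pred.shape[0]
    diff = (pred - target).reshape(n, -1)
    return float(np.sum(diff ** 2) / (2 * n))


# Hypothetical toy data: a 2-image batch of 4x4 RGB maps and 8x8 "clear" targets.
rng = np.random.default_rng(0)
low_res = rng.random((2, 4, 4, 3))
clear = rng.random((2, 8, 8, 3))
restored = upsample_nearest(low_res, 2)   # shape (2, 8, 8, 3)
print(restored.shape, euclidean_loss(restored, clear))
```

In a real network the up-sampling would usually be learned (e.g. a transposed convolution) and the loss back-propagated; the fixed nearest-neighbour version only makes the loss computation concrete.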
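The key frame extraction step scores frames by mutual information and uses a sliding window to keep their temporal order. The abstract does not state the exact selection criterion, so this is a minimal sketch under stated assumptions: grayscale uint8 frames, a 32-bin joint intensity histogram, non-overlapping windows (stride equal to the window size), and, in each window, selection of the frame that shares the least information with its predecessor, i.e. the largest content change. `mutual_information` and `select_key_frames` are hypothetical names.

```python
import numpy as np


def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """Mutual information (in bits) between two equal-sized grayscale frames."""
    joint, _, _ = np.histogram2d(
        a.ravel().astype(np.float64), b.ravel().astype(np.float64),
        bins=bins, range=[[0, 256], [0, 256]],
    )
    pxy = joint / joint.sum()                 # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of frame a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of frame b
    nz = pxy > 0                              # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px * py)[nz])))


def select_key_frames(frames, window: int = 5, bins: int = 32) -> list:
    """Pick one key frame per window: the frame least similar to its predecessor."""
    # mi[k] is the MI between frame k and frame k + 1.
    mi = [mutual_information(frames[k], frames[k + 1], bins) for k in range(len(frames) - 1)]
    keys = []
    # Windows of `window` frames with stride `window`; indices stay in temporal order.
    for start in range(1, len(frames), window):
        segment = mi[start - 1:start - 1 + window]
        keys.append(start + int(np.argmin(segment)))
    return keys


# Hypothetical clip: three copies of one random frame, then three of another.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
b = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
clip = [a, a, a, b, b, b]
print(select_key_frames(clip, window=10))  # prints [3]
```

Because identical frames share all their information while unrelated frames share almost none, the cut at frame 3 has the lowest mutual information and is chosen as the key frame.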
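The two-stream network combines its RGB and optical flow outputs by a weighted average, and the scene results are reported as top-3 accuracy. Below is a minimal sketch of both steps, assuming each stream outputs per-class logits that are turned into probabilities with a softmax; the weight `w_rgb` and all function names are hypothetical, since the abstract gives neither the fusion weights nor the score type.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def fuse_two_stream(rgb_logits, flow_logits, w_rgb: float = 0.5) -> np.ndarray:
    """Weighted average of per-class probabilities from the RGB and flow streams."""
    return w_rgb * softmax(rgb_logits) + (1.0 - w_rgb) * softmax(flow_logits)


def top_k_accuracy(scores: np.ndarray, labels, k: int = 3) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(scores, axis=1)[:, -k:]
    return float(np.mean([label in row for label, row in zip(labels, top_k)]))


# Hypothetical logits for 2 clips over 5 action classes.
rgb = np.array([[2.0, 1.0, 0.5, 0.0, -1.0], [0.0, 0.2, 3.0, 0.1, 0.0]])
flow = np.array([[1.5, 2.5, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 2.0, 0.0]])
fused = fuse_two_stream(rgb, flow, w_rgb=0.6)
print(fused.argmax(axis=1), top_k_accuracy(fused, labels=[0, 3], k=3))
```

Averaging probabilities rather than raw logits keeps the two streams on the same scale, so the weight directly expresses how much each stream is trusted.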
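The joint feature learning step represents features sparsely over learned dictionaries, with scene features acting as context. The abstract does not name a sparse coding solver, so this sketch swaps in ISTA (iterative shrinkage-thresholding) for the problem min_a 0.5*||x - D a||^2 + lam*||a||_1 and simply concatenates the action and scene codes into one vector. The dictionaries are taken as given (in practice they would be learned, for example with K-SVD or scikit-learn's `DictionaryLearning`), and `ista`, `joint_feature` and the toy data are hypothetical.

```python
import numpy as np


def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1 (element-wise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(x: np.ndarray, D: np.ndarray, lam: float = 0.1, n_iter: int = 200) -> np.ndarray:
    """Sparse code a minimising 0.5 * ||x - D a||^2 + lam * ||a||_1 via ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / L, L = largest singular value of D squared
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)              # gradient of the quadratic data term
        a = soft_threshold(a - step * grad, lam * step)
    return a


def joint_feature(action_x, scene_x, D_action, D_scene, lam: float = 0.1) -> np.ndarray:
    """Concatenate sparse codes of an action descriptor and a scene descriptor."""
    return np.concatenate([ista(action_x, D_action, lam), ista(scene_x, D_scene, lam)])


# Hypothetical dictionaries with unit-norm atoms (columns) and random descriptors.
rng = np.random.default_rng(0)
D_action = rng.standard_normal((16, 32))
D_action /= np.linalg.norm(D_action, axis=0)
D_scene = rng.standard_normal((16, 24))
D_scene /= np.linalg.norm(D_scene, axis=0)
z = joint_feature(rng.standard_normal(16), rng.standard_normal(16), D_action, D_scene)
print(z.shape, np.count_nonzero(z))
```

The L1 penalty drives most coefficients to exactly zero, so the few nonzero entries mark the dictionary atoms that best explain each descriptor; a classifier trained on the concatenated code can then use scene evidence alongside action evidence.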
Keywords/Search Tags: scene recognition, scene deblurring, action recognition, key frame extraction, joint learning