Research And Implementation Of Video Thumbnail Recommendation Techniques Based On Visual Semantic Analysis

Posted on: 2021-06-24 | Degree: Master | Type: Thesis
Country: China | Candidate: M Q Zhang
GTID: 2518306572469354 | Subject: Computer technology
Abstract/Summary:
With the development of deep learning, computer vision and natural language processing have become increasingly intertwined. Visual understanding and language modeling are now integral parts of cross-modal information interaction, and the field of visual semantic analysis is drawing growing attention. Video captioning, which studies how to transform the visual information of a video into the semantic information of natural-language summary sentences, has become an active research direction in this field. We integrate visual semantic analysis into video thumbnail recommendation and propose a recommendation technique that analyzes the cross-modal interaction between visual and semantic information in order to recommend thumbnails that are semantically relevant to the video's summary sentences. Building on research into video captioning, this thesis therefore focuses on video thumbnail recommendation that incorporates semantic relevance. The main contributions are summarized as follows:

(1) For the video captioning task, a method that fuses multimodal attention mechanisms is proposed, providing richer multimodal features at different stages of the task. The method takes the multi-label classification attributes of the video as prior features and integrates visual and attribute attention mechanisms to select, for the decoder, the visual and attribute features of current interest when generating each word. In addition, a visual reconstruction framework incorporating a semantic attention mechanism is established to reconstruct visual features from the generated summary sentences. The experimental results show that the proposed method improves the performance of video captioning and is strongly competitive with state-of-the-art methods.

(2) To analyze the cross-modal interaction between the visual information of video frames and the semantic information of the video summary sentence, a video thumbnail recommendation framework based on deep visual semantic embedding is proposed. The framework constructs a deep visual semantic embedding model that embeds the visual information of video frames and the semantic information of the description sentence into a latent space with the same dimension and representation form. In this space, the key frame whose visual feature has the highest cosine similarity to the semantic feature is selected as the key thumbnail. The experimental results show that the framework can effectively recommend thumbnail sequences from the video that are visually representative and correlated with the semantics of a given description sentence.

(3) To express the degree of correlation between semantic and visual information more directly, a video thumbnail recommendation framework based on a visual semantic attention mechanism is proposed. The framework constructs a visual semantic attention model that uses the attention mechanism to compute the attention weights of the semantic feature of the current summary sentence over the visual features of all key frames, and selects the key frame whose visual feature receives the largest weight as the key thumbnail. The experimental results show that the framework can recommend thumbnail sequences that are both visually and semantically representative, and that it outperforms the first thumbnail recommendation framework proposed in this thesis.

To verify how the above three algorithms perform in the video thumbnail recommendation task, this thesis designs and implements a video thumbnail recommendation system that integrates visual semantic analysis. The system provides automatic video captioning, custom video thumbnail recommendation, and automatic video thumbnail recommendation that fuses the semantic information of the video summary sentence.
Keywords/Search Tags: visual semantic analysis, video captioning, video thumbnail recommendation, attention mechanism, visual semantic embedding
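The two frame-selection rules described in contributions (2) and (3) of the abstract can be illustrated with a minimal sketch. It assumes that key-frame features and the sentence feature have already been embedded as plain vectors of equal length. The function names are hypothetical, and the scaled dot-product scoring used for the attention weights is an assumption, since the abstract does not state which scoring function the thesis uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_by_cosine(frame_embeddings, sentence_embedding):
    """Contribution (2): pick the key frame closest to the sentence in the shared space."""
    scores = [cosine_similarity(f, sentence_embedding) for f in frame_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def select_by_attention(frame_features, sentence_feature):
    """Contribution (3): pick the key frame with the largest attention weight.

    Assumption: scaled dot-product scores followed by a softmax; the
    thesis may use a different (e.g. learned additive) scoring function.
    """
    scale = math.sqrt(len(sentence_feature))
    logits = [
        sum(a * b for a, b in zip(f, sentence_feature)) / scale
        for f in frame_features
    ]
    weights = softmax(logits)
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, weights


# Toy, made-up vectors: three key frames and one sentence in a 2-D space.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sentence = [1.0, 1.0]
cosine_index, cosine_scores = select_by_cosine(frames, sentence)
attention_index, attention_weights = select_by_attention(frames, sentence)
```

In both cases the recommended thumbnail is the frame at the returned index. The attention variant additionally yields a normalized weight per key frame, which is one way to read the abstract's claim that it expresses the degree of correlation more directly.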