Task-driven Visual Media Text Description Technology

Posted on:2020-06-06

Degree:Master

Type:Thesis

Country:China

Candidate:M Gao

Full Text:PDF

GTID:2438330575959491

Subject:Communication and Information System

Abstract/Summary:

As the personal data that people collect grows exponentially, the amount of image and video data grows with it. Rather than text alone, people now commonly record their lives with text accompanied by images or videos. However, because of the sheer volume of image and video data, people who upload images and videos to social software cannot quickly and accurately find the images or video clips they are interested in. To meet this need, this thesis proposes a cross-modal video diary retrieval method based on a video captioning model. By analyzing video content and automatically generating natural language descriptions, the method realizes cross-modal conversion between video and text, which helps people retrieve the clips they need from a huge video database. In addition, to address the effect of image resolution on image captioning, this thesis proposes an improved image super-resolution reconstruction algorithm based on a cascaded residual learning convolutional neural network, and applies the super-resolved images to image captioning to improve captioning accuracy.

1) A retrieval algorithm for text and video diaries based on video captioning is proposed, which consists of three processes.

Video shot segmentation. A shot segmentation method based on the wavelet transform can segment video adaptively and detects shot boundaries well, so it is the method used in this thesis. First, the brightness difference between video frames is decomposed by wavelet multi-resolution analysis; then the modulus maxima points are obtained after denoising; finally, the shot boundaries are found by tracking the modulus maxima points. The video is thus segmented into short clips, each showing a different scene.

Video captioning. This thesis uses a caption-guided visual saliency method for automatic description. The method reveals the mapping between image regions and words in modern encoder-decoder networks. This mapping is learned implicitly from the caption training data, and the method can generate temporal or spatial heatmaps for predicted captions or for arbitrary query sentences.

Vector representation of text. To represent video descriptions and diary descriptions as fixed-length vectors, this thesis uses an unsupervised algorithm that learns fixed-length feature representations from variable-length text. This overcomes two weaknesses of the bag-of-words model: its disregard for word order and its lack of semantic information.

2) An improved image super-resolution reconstruction algorithm based on a cascaded residual learning convolutional neural network is proposed. During image restoration, existing convolutional neural network (CNN) based methods cannot recover some of the high-frequency components when mapping a low-resolution image to a high-resolution image. In the proposed method, the sum of the high-resolution image restored by the first residual learning network and the residual estimation image is taken as the input of the second residual learning network, which learns the residual components that the first network failed to recover. Moreover, image super-resolution is applied to image captioning and improves its accuracy.

In this thesis, the video captioning model is applied to the practical problem of retrieving video diaries from a user's text diary. Participants in the experiment were surveyed on their satisfaction with the matched video diaries, and most of them expressed satisfaction. In addition, the image super-resolution method is applied to image captioning. The results show that when the image resolution is low, improving the resolution also clearly improves the accuracy of the generated image descriptions.
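The shot segmentation step described in the abstract (brightness differences, multi-scale wavelet decomposition, modulus maxima tracked across scales) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract names neither the wavelet nor the denoising rule, so the Mexican-hat (Ricker) wavelet, the scales `(1, 2)`, the relative threshold that stands in for denoising, and every function name below are assumptions made for illustration.

```python
import math

def brightness_differences(frames):
    """Mean absolute brightness change between consecutive grayscale frames."""
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        total, count = 0.0, 0
        for row_p, row_c in zip(prev, curr):
            for p, c in zip(row_p, row_c):
                total += abs(c - p)
                count += 1
        diffs.append(total / count)
    return diffs

def ricker_kernel(scale):
    """Unnormalised Mexican-hat wavelet sampled on [-3*scale, 3*scale] (assumed choice)."""
    return [(1 - (t / scale) ** 2) * math.exp(-((t / scale) ** 2) / 2)
            for t in range(-3 * scale, 3 * scale + 1)]

def wavelet_row(signal, scale):
    """Wavelet coefficients of the signal at one scale, same length as the signal."""
    half = 3 * scale
    kernel = ricker_kernel(scale)
    n = len(signal)
    row = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i - (k - half)          # sample paired with kernel offset t = k - half
            if 0 <= j < n:
                acc += signal[j] * weight
        row.append(acc)
    return row

def modulus_maxima(row, rel_threshold):
    """Interior local maxima of |coefficient|; the threshold is a crude denoising stand-in."""
    mod = [abs(v) for v in row]
    peak = max(mod, default=0.0)
    if peak == 0.0:
        return []
    return [i for i in range(1, len(mod) - 1)
            if mod[i] > mod[i - 1] and mod[i] >= mod[i + 1]
            and mod[i] >= rel_threshold * peak]

def detect_shot_boundaries(frames, scales=(1, 2), rel_threshold=0.5):
    """Return indices of frames that start a new shot."""
    diffs = brightness_differences(frames)
    per_scale = [(s, modulus_maxima(wavelet_row(diffs, s), rel_threshold))
                 for s in scales]
    (_, finest), coarser = per_scale[0], per_scale[1:]
    # "Tracking": keep a fine-scale maximum only if every coarser scale
    # also has a maximum within +/- that scale of the same position.
    kept = [i for i in finest
            if all(any(abs(m - i) <= s for m in maxima) for s, maxima in coarser)]
    # diffs[i] compares frame i with frame i + 1, so the new shot starts at i + 1.
    return [i + 1 for i in kept]

def flat_frame(value, size=4):
    """Synthetic frame of uniform brightness, used only for the demo."""
    return [[value] * size for _ in range(size)]

# Three synthetic shots of ten frames each, with hard cuts between them.
demo = [flat_frame(0.2)] * 10 + [flat_frame(0.8)] * 10 + [flat_frame(0.4)] * 10
boundaries = detect_shot_boundaries(demo)
print(boundaries)
```

Requiring a maximum to persist across scales is what separates a genuine cut from isolated noise, since noise tends to produce maxima only at the finest scale. Real footage would likely need more scales and a more careful denoising rule than the fixed relative threshold used here.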

Keywords/Search Tags:

Video Segmentation, Video Captioning, Vector Representation of Text, Lifelog Video, Super-Resolution

Related items

1	Research And Implementation Of Long Video Captioning Technology Based On Deep Learning
2	Reg ANR-based Image Super-resolution Algorithm Research On Its Application In Video Coding
3	Research On Sparse Representation Based Video Super-resolution Reconstruction Algorithm
4	Study On Key Technology Of Representations For Multiview Mixed Resolution Video Based On Super Resolution
5	A Theoretical Framework And Implementation For Video Segmentation
6	Research On Face Super-resolution Algorithm Based On Video Stream
7	A Gpu Accelerated Algorithm For Compressive Sensing Based Video Super-resolution
8	Video Super-resolution Reconstruction Technique And Its Application
9	Research On Video OCR
10	Research On Video Super-resolution Technology