
Research On Image And Video Description Methods Based On Deep Learning

Posted on: 2019-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: L B Cao
GTID: 2428330551958717
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of technologies such as data compression, communication and storage, the volume of image and video data grows day by day, and this data must be managed and used effectively. As a result, an emerging technology that applies data mining to image and video data, known as "video data mining", has come into being. Image description and video description are important parts of video data mining and are referred to as descriptive video data mining. They are also key and difficult problems at the intersection of computer vision and natural language processing, with wide application prospects. To address the low accuracy of current image and video description methods, this thesis aims to improve description accuracy and uses deep learning methods to design an image description framework and a video description framework. To describe images and videos in natural language, convolutional neural network models are employed to extract features from a single image and from the multiple frames of a video, and a word vector model is used to process word sequences.

An image description method based on the continuous Skip-gram model and deep learning is studied. To further improve the accuracy of image description, the continuous Skip-gram model is introduced into the framework that generates image descriptions. First, the continuous Skip-gram model learns distributed representations of words, which yields high-quality word vectors at reduced computational complexity. Then, a Region-based Convolutional Neural Network detects the objects in the image and extracts their features. Finally, the word vectors and the image features serve as the input and the bias, respectively, of a Recurrent Neural Network that generates the image description. Compared with three image description models, the framework using the continuous Skip-gram model improves both description accuracy and generalization ability.

A video description method based on deep transfer learning is studied, and a new video description model is constructed. Building on an existing video description framework, the model uses the deep domain adaptation method from transfer learning to achieve a deep fusion of semantic features from the image domain and the frame stream. By feeding the fused semantic features into the video description framework and combining them with the video input and a recurrent neural network, the proposed model generates natural language descriptions of videos. Compared with seven existing models, using deep domain adaptation to fuse semantic features from different domains further improves video description performance.
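The two ingredients of the image description framework summarized above can be illustrated with a small sketch. It is not the thesis implementation: the function names `skipgram_pairs` and `rnn_step`, the toy weights, the window size, and the choice to add the image bias only at the first time step are all assumptions made for illustration. The first function builds the (center, context) training pairs that a continuous Skip-gram model learns from; the second performs one recurrent step in which a word vector is the input and an image feature vector acts as a bias.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs within a symmetric window (illustrative)."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_step(x, h_prev, W_x, W_h, bias):
    """One recurrent step: h = tanh(W_x x + W_h h_prev + bias)."""
    zx = matvec(W_x, x)
    zh = matvec(W_h, h_prev)
    return [math.tanh(a + b + c) for a, b, c in zip(zx, zh, bias)]


# Toy caption and training pairs for the Skip-gram model.
caption = "a dog runs on grass".split()
pairs = skipgram_pairs(caption, window=1)

# Toy weights (hypothetical values, not learned parameters).
W_x = [[0.5, -0.2], [0.1, 0.3]]
W_h = [[0.0, 0.1], [0.2, 0.0]]
image_bias = [0.4, -0.1]  # stands in for a projected image feature
word_vectors = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for Skip-gram vectors

h = [0.0, 0.0]
for t, x in enumerate(word_vectors):
    # Assumption: the image feature biases only the first step.
    bias = image_bias if t == 0 else [0.0, 0.0]
    h = rnn_step(x, h, W_x, W_h, bias)
```

In a full system, the hidden state `h` would be passed through an output layer to predict the next word of the description; that part is omitted here.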
Keywords/Search Tags: Image description, Video description, Deep learning, Continuous Skip-gram, Deep domain adaptation