
Research On Text And Image Information Fusion In Cross-media Retrieval

Posted on: 2018-11-16    Degree: Master    Type: Thesis
Country: China    Candidate: J F Zhao    Full Text: PDF
GTID: 2348330518985889    Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of the Internet, multimedia technology, and compression coding, hardware storage capacity has improved greatly, and a large amount of multimedia data can now be stored on the network, providing richer information than any single medium. How can the multimedia data relevant to a user's needs be identified accurately and efficiently in this vast ocean of data? The "semantic gap" problem is widespread in current retrieval systems: two images with different visual features may belong to the same theme, while two images with similar visual features may belong to different themes. To address the semantic gap, many scholars have proposed their own cross-media retrieval models in recent years. Aiming to extract high-level semantics from multimedia data, they build fusion models that fuse features across different media, using existing text, image, and video feature extraction algorithms and dimensionality reduction methods. In feature extraction, video is treated as images after video scene segmentation and key-frame extraction, and audio is treated as text after audio scene recognition and speech recognition. This paper therefore focuses on the fusion of text and image information.

This paper first reviews the development of cross-media retrieval and its feature extraction methods, including the text and image feature extraction methods that contribute to cross-media search. It then presents the main innovations of this work: two cross-media hash retrieval models, one based on a word-word similarity matrix and a cosine-distance loss function, and one based on a convolutional neural network that uses deep learning and natural language processing to integrate text information into the image retrieval process. Experiments are simulated with the open-source framework Caffe and the natural language processing toolkit NLTK. Image convolutional features are combined with text word vectors and a word-map correspondence matrix as the input of the neural network, and a fusion hash code is designed as the training target. Using fast hash search techniques, both text and images are mapped to binary hash codes. Each of the two models has its own fusion hash code and loss function; the fusion hash code is designed so that the corresponding text and the corresponding image can both be recovered by the neural network. Experiments show that both methods complete the cross-media task, supporting text-to-image retrieval and image-to-text retrieval.
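The retrieval pipeline described above (map text and image features into a shared binary hash code, then match by Hamming distance) can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual model: the random projection stands in for the trained fusion network, the features are synthetic, and all names here are hypothetical.

```python
import numpy as np

def binary_hash(features, projection):
    """Project real-valued features and binarize to {0, 1} hash codes."""
    return (features @ projection > 0).astype(np.uint8)

def hamming_distance(query_code, db_codes):
    """Count bit mismatches between a query code and each database code."""
    return np.count_nonzero(db_codes != query_code, axis=1)

rng = np.random.default_rng(0)
n_db, feat_dim, code_bits = 100, 64, 16

# A random projection stands in for the trained fusion network that maps
# image-convolution features and text word vectors into a shared code space.
projection = rng.standard_normal((feat_dim, code_bits))

db_features = rng.standard_normal((n_db, feat_dim))   # e.g. image features
query_feature = rng.standard_normal((1, feat_dim))    # e.g. a text feature

db_codes = binary_hash(db_features, projection)
query_code = binary_hash(query_feature, projection)

# Retrieve the 5 nearest database items by Hamming distance.
dists = hamming_distance(query_code, db_codes)
top5 = np.argsort(dists)[:5]
```

Because both media are mapped through the same code space, the same `hamming_distance` lookup serves text-to-image and image-to-text queries, which is what makes hash-based cross-media retrieval fast.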
Keywords/Search Tags: Cross-media, Deep learning, Information fusion, Data retrieval