Researches on Semantic- and Interest-Based Image/Video Retrieval and Authentication

Posted on: 2021-01-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Li
GTID: 1368330602966035
Subject: Management of engineering and industrial engineering
Abstract/Summary:
With the rapid development of the Internet, social media, and mobile multimedia terminals, multimedia data such as text, images, audio, and video are increasingly integrated into people's life, work, and study. In particular, the generation, acquisition, processing, and dissemination of multimedia data, especially images and videos, have become simpler and more widespread as multimedia applications have multiplied. The media from which people derive information have shifted from traditional text to diverse modalities. In today's information society, images and videos carry more and more information, and image and video retrieval is becoming increasingly common in daily life. Since the 1970s, image and video retrieval has been a hot topic in both the theory and the practice of information retrieval. In recent years, with the vigorous development of multimedia technology and applications, users' requirements for retrieval performance and experience have risen steadily: retrieval is increasingly expected to be accurate, efficient, personalized, and trustworthy.

In a complete retrieval procedure, the user submits a query, and the retrieval system searches for information matching the query and returns the results to the user. Information loss or mismatch at any stage of this pipeline degrades retrieval performance. This thesis therefore studies three gaps in order to improve the performance of image and video retrieval, and proposes a solution for each: the "semantic gap" between search engines and data, the "intention gap" between the user and the query, and the "trust gap" between the data (or search engine) and the user. To satisfy users, the study of information retrieval should extend from the "semantic gap" to the "intention gap" and then to the "trust gap".

The "semantic gap" refers to the distance between the visual features of images and videos and users' semantic understanding. The purpose of bridging the semantic gap is to retrieve, accurately and efficiently, the results that match the query. With the development of personalized services, users' expectations of retrieval have extended to "intention": users want search results that match their search intention or interest. However, there is a gap between a user's search intention and the query, which is defined as the "intention gap". Bridging the intention gap is therefore the key to personalized retrieval. At the same time, the development of editing tools for images and videos has led to more and more incidents of image and video forgery and tampering. There is a gap between people's trust in retrieval results and the actual credibility of those results, namely the "trust gap". A retrieval system therefore needs to verify the credibility of its results.

This thesis focuses on the semantic gap, the intention gap, and the trust gap in image and video retrieval. The main work and contributions are as follows.

1. For the semantic gap, video copy detection is studied and a 3D CNN-based video copy detection method is proposed, because video copy detection methods usually have strong capabilities in semantic representation and content discrimination. In this method, a 3D convolutional neural network (CNN) captures both the spatial and the temporal features of video, while the complexity of 3D CNN training is reduced and the shortage of training data is addressed. To reduce the difficulty and computational complexity of network construction and to lower the hardware requirements, a parallel architecture composed of 3D CNNs is proposed. It decomposes a multi-class classification task into a combination of binary classification tasks. Since each 3D CNN in the parallel architecture serves only as a binary classifier, the training difficulty and the amount of data each CNN requires are greatly reduced. In addition, the parallel 3D CNN structure can classify data of unknown categories and can be extended as new categories are added. To address the shortage of data, videos are segmented by equal-interval sampling for augmentation, so that each video segment represents as much of the video content as possible. In the test phase, a high recognition rate can be achieved with only a few video segments, which greatly speeds up recognition and provides a reference for real-time video classification. Experiments show that the proposed method is effective for copy detection and that the extracted video features have strong semantic expressiveness.

2. For the intention gap, a method for computing users' interest based on movie recommendation is proposed, since movie recommendation systems depend heavily on users' intention and interest. Given that IMDB data takes the form of both images and text, a cross-media learning method is used to extract movie features from a common space. In computing user interest, a time factor is introduced on top of the user rating matrix, and the user interest is initialized according to the influence of long-term and short-term interest. The movie feature vectors are then iterated with the initial user interest vector to obtain the optimized user interest. Finally, based on the user interest, movie recommendation is implemented through a rating prediction mechanism based on user-based collaborative filtering, and the recommendation performance is used to validate the computed user interest. Experiments on the MovieLens dataset show that the proposed method improves accuracy and speeds up the convergence of the user interest. It is promising for alleviating the intention gap in image and video retrieval and recommendation.

3. For the trust gap, an authentication watermarking scheme is proposed. Given that the human visual system (HVS) has different perceptual sensitivity to different orientations in an image, a compound orientation feature map is computed. It contains vertical, horizontal, and diagonal information for directional patch extraction, and is used to construct the visual saliency map for just noticeable distortion (JND) optimization. First, the DC coefficient and three low-frequency coefficients of each block are used to calculate the luminance and texture feature maps, respectively. Then, the compound orientation feature map is constructed from the three low-frequency coefficients. Finally, a linear fusion of the three visual feature maps is taken as the visual saliency map and used to optimize the JND model. The optimized JND model is employed in watermarking to achieve better perceptual quality of the watermarked image. Experimental results show that the proposed watermarking scheme performs well in authentication.
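As a rough illustration of contribution 1, the sketch below shows one plausible reading of equal-interval sampling and of combining parallel binary classifiers. It is not the thesis implementation: the function names, the score dictionaries, the averaging over segments, and the 0.5 threshold are all assumptions made for the example, and the 3D CNNs themselves are replaced by precomputed scores.

```python
from statistics import mean


def equal_interval_segments(num_frames, num_segments):
    """Split frame indices into segments by equal-interval sampling.

    Segment j takes frames j, j + k, j + 2k, ... (k = num_segments),
    so every segment spans the whole video rather than one local chunk.
    """
    return [list(range(start, num_frames, num_segments))
            for start in range(num_segments)]


def predict_label(segment_scores, threshold=0.5):
    """Combine per-segment scores from parallel binary classifiers.

    segment_scores: list of dicts mapping label -> score in [0, 1], one
    dict per video segment; each score stands in for the output of that
    label's binary classifier (hypothetical interface).
    Returns the best label, or "unknown" if no classifier is confident.
    """
    labels = segment_scores[0].keys()
    averaged = {label: mean(s[label] for s in segment_scores)
                for label in labels}
    best = max(averaged, key=averaged.get)
    return best if averaged[best] >= threshold else "unknown"
```

Because each label has its own binary classifier, adding a category only means adding one more entry to the score dictionaries, and a video that no classifier accepts falls through to "unknown".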
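For contribution 2, the abstract states that a time factor is applied to the rating matrix when initializing user interest, but does not give the formula. The following is a minimal sketch under assumed choices: an exponential half-life decay as the time factor, and an interest vector formed as the weighted average of movie feature vectors. The names `initial_interest` and `half_life_days` are hypothetical.

```python
def initial_interest(ratings, features, now, half_life_days=30.0):
    """Initialize a user interest vector from time-weighted ratings.

    ratings:  list of (movie_id, rating, day) tuples for one user.
    features: dict mapping movie_id -> feature vector (list of floats).
    now:      current day, in the same units as the rating days.
    The weight of a rating halves every half_life_days (assumed decay),
    so recent ratings reflect short-term interest more strongly.
    """
    dim = len(next(iter(features.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for movie_id, rating, day in ratings:
        weight = rating * 0.5 ** ((now - day) / half_life_days)
        weight_sum += weight
        for i, value in enumerate(features[movie_id]):
            total[i] += weight * value
    # Normalize so the result is a weighted average of feature vectors.
    return [t / weight_sum for t in total] if weight_sum else total
```

With two equally rated movies, the one rated more recently contributes more to the initial interest vector; the iterative refinement described in the abstract would start from this vector.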
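For contribution 3, the linear fusion of the luminance, texture, and orientation feature maps into a saliency map can be sketched as below. The equal default weights and the function name `fuse_saliency` are assumptions; the abstract does not state the fusion weights or how the saliency map then modulates the JND thresholds, so that step is left out.

```python
def fuse_saliency(luminance, texture, orientation,
                  weights=(1 / 3, 1 / 3, 1 / 3)):
    """Linearly fuse three same-sized feature maps into a saliency map.

    Each map is a list of rows with one value per image block.
    weights gives the (luminance, texture, orientation) coefficients;
    the equal default is an illustrative assumption.
    """
    wl, wt, wo = weights
    return [
        [wl * l + wt * t + wo * o for l, t, o in zip(row_l, row_t, row_o)]
        for row_l, row_t, row_o in zip(luminance, texture, orientation)
    ]
```

If the input maps are normalized to the same range and the weights sum to one, the fused map stays in that range, which makes it straightforward to use as a per-block scaling factor.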
Keywords/Search Tags:Image Retrieval, Video Retrieval, Semantic Gap, Intention Gap, Trust Gap, Copy Detection, Movie Recommendation, Digital Watermark