
Research On Video-based Person-Scene Instance Search

Posted on: 2020-01-28 | Degree: Master | Type: Thesis
Country: China | Candidate: J M Lan | Full Text: PDF
GTID: 2428330590477045 | Subject: Communication and Information System
Abstract/Summary:
With the rapid advance of science and technology, computer technology has developed quickly, and digital technology has undergone qualitative changes. Driven by this wave, we have entered a new era dominated by multimedia such as audio, images, and video. With the spread of the Internet, the total amount of global data has exploded, and more than half of it is video. How to efficiently retrieve content of interest from massive video collections has long been an urgent problem. In 2010, Instance Search (INS), which aims to find video content quickly and accurately, was added to TRECVID and attracted wide attention from researchers in China and abroad. From 2010 to 2015, the TRECVID INS task focused on retrieving a single specific target, and research institutions from around the world actively took part in the evaluation and achieved very good results. In 2016, TRECVID INS introduced Person-Scene Instance Search (P-S INS), which searches for specific persons in specific locations. Since then, researchers have studied P-S INS and made some progress. However, current P-S INS methods still have several shortcomings:

(1) The visual features of persons and scenes are not robust. Most methods rely on the visual features of the query image. However, the target scene undergoes viewpoint changes, illumination changes, occlusion, and so on, while the target person undergoes pose changes, changes in clothing and appearance, and so on. Retrieving targets by visual features therefore becomes very difficult and can hardly meet the requirements of complex application scenarios.

(2) Fusing person and scene results is not effective. P-S INS results are commonly obtained by merging the results of person retrieval and scene retrieval. However, the two interfere with each other: when the camera focuses on a person, the scene becomes blurred or largely occluded; when the camera uses a wide-angle lens, the scene features are very rich but the persons become very small. This directly leads to poor retrieval performance.

(3) Current P-S INS systems are limited. The amount of video data is very large, but very few videos meet a user's requirements, and finding those targets in massive video collections is very difficult. Designing and implementing an efficient video retrieval system for P-S INS is therefore of great practical significance.

To address these problems, this thesis conducts extensive research on P-S INS. Its contents are as follows:

(1) Person retrieval and scene retrieval methods based on comprehensive feature representation. Image visual features are affected by viewpoint changes and occlusion of persons and scenes in the video dataset. To address this, the thesis builds feature representations with several different methods: it collects facial features under different poses and combines scene features from different angles, using different implementation techniques to construct a more robust representation. For scenes in particular, close-up shots are handled by retrieving the scene indirectly through scene markers based on the BoW (bag-of-words) model, while wide-angle shots are handled by retrieving the scene directly with a CNN model. Combining these two approaches effectively improves the accuracy of scene retrieval.

(2) A video-based instance search method based on nearest-neighbor weighted rank optimization. Most existing methods obtain P-S INS results by directly fusing the person and scene retrieval results, for example by multiplying their similarity scores. However, we find that persons and scenes are often in a dilemma: their similarity scores are rarely high at the same time, so search results based on direct fusion are unsatisfactory. To address this, the thesis proposes a rank optimization method based on neighboring shots, in which a ranking method based on unsupervised metric learning automatically optimizes the ranking across successive shots. Furthermore, to account for the combined effect of the re-ranked results and the original results, the two are combined with weights to achieve the greatest improvement. Experimental results show that the proposed method effectively improves P-S INS results.

(3) A P-S INS system. Based on the proposed methods, we design and implement a P-S INS system for TV series, which can efficiently search for a specific person, a specific scene, or specific persons appearing in specific scenes. In addition, search results can be displayed and saved for subsequent analysis and use, which gives the system some practical value.

In summary, this thesis studies P-S INS. By analyzing the problems of viewpoint change and occlusion in P-S INS, as well as the stability of its results, it proposes approaches based on comprehensive feature representation and on the similarity between neighboring shots: person retrieval and scene retrieval methods based on comprehensive feature representation, and a video-based instance search method based on nearest-neighbor weighted rank optimization. Furthermore, building on these algorithms and techniques, a video instance search system is designed and implemented that can effectively perform person and scene retrieval in TV series.
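The contrast between direct score fusion and neighbor-shot re-ranking can be illustrated with a small sketch. It is hypothetical and deliberately simplified: the thesis uses a ranking method based on unsupervised metric learning, which is not reproduced here. Instead, each shot keeps its own person score but borrows the best scene score from its temporal neighbors (assuming that adjacent shots in a TV series often share a location), and the re-ranked scores are blended with the directly fused ones by a weight. The function names, the window size `window`, the weight `alpha`, and the toy scores are all illustrative assumptions.

```python
from typing import List


def fuse_direct(person: List[float], scene: List[float]) -> List[float]:
    """Direct fusion baseline: multiply the person and scene similarity of each shot."""
    return [p * s for p, s in zip(person, scene)]


def neighbor_rerank(person: List[float], scene: List[float], window: int = 1) -> List[float]:
    """Hypothetical neighbor-shot re-scoring (NOT the thesis's metric-learning method).

    The person must appear in the shot itself, but scene evidence may come from
    any shot within `window` positions before or after it, on the assumption that
    consecutive shots often share the same location.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    n = len(person)
    scores = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores.append(person[i] * max(scene[lo:hi]))
    return scores


def weighted_combine(original: List[float], reranked: List[float], alpha: float = 0.5) -> List[float]:
    """Blend original and re-ranked scores; alpha is an assumed weight in [0, 1]."""
    return [alpha * o + (1 - alpha) * r for o, r in zip(original, reranked)]


def rank_shots(scores: List[float]) -> List[int]:
    """Return shot indices from most to least similar (ties keep their original order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy scores for four consecutive shots (illustrative values only):
    # shot 0 is a close-up of the person with a blurred scene,
    # shot 1 is a wide-angle view of the scene where the person is barely visible.
    person = [0.9, 0.1, 0.8, 0.2]
    scene = [0.1, 0.9, 0.2, 0.3]

    direct = fuse_direct(person, scene)
    final = weighted_combine(direct, neighbor_rerank(person, scene, window=1), alpha=0.5)
    print(rank_shots(direct))  # [2, 0, 1, 3]
    print(rank_shots(final))   # [0, 2, 1, 3]
```

With these toy scores, direct multiplication ranks the close-up shot 0 below shot 2 because its blurred scene gives a low scene score; borrowing scene evidence from the adjacent wide-angle shot moves it to the top. Keeping a weighted share of the original scores through `alpha` limits how far neighbor evidence alone can reorder the list.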
Keywords/Search Tags: Person Retrieval, Scene Retrieval, Instance Search, Comprehensive Representation, Rank Optimization, Video-based Instance Search System